Counting species - questions and meta-questions

by Kevin Thiele

Yet another paper has come out (Larsen et al. "Inordinate fondness multiplied and redistributed: The number of species on earth and a new pie of life" The Quarterly Review of Biology 92(3): 229-265, 2017) asking the perennial "how many species are there on earth" question.

This is potentially an important question, and potentially a non-question. The question (whether it's important or not a question) is in turn important for the decadal plan, but also more broadly for biology as a whole. I'll return to the decadal plan later.

The question is a problem because current estimates for the number of species on earth vary from ~2 million (see refs in the paper above) to ~1 trillion (I don't even really know how big a trillion is, but it's much bigger than 2 million). The paper above takes a stab in the dark (the authors would dispute this) and puts the figure at 1–6 billion.

Take your pick. That's our problem. The number can be almost anything you want it to be.

But I think there's a bigger problem, which is that none of the studies that make these estimates ask what I think is the most important question, which is: does the how-many-species question make any sense? (Or more precisely, is the question answerable? The studies assume that it is, without justifying this assumption.)

This is a meta-question, a question about a question. Until we can answer the meta-question, trying to answer the question is almost certainly futile. Let me explain; but first, let me digress to the late 18th Century.

At that time, some of the most influential French scientists (Jussieu, Adanson, De Candolle) had an important argument about the "shape" of nature. Jussieu (one of the leading scientists in the post-Linnaean world) argued from first principles that nature was continuous. He believed that somewhere out there (and increasingly being discovered during the great age of exploration) existed an intermediate form between every recognised taxon. There would be found organisms that would bridge the apparent gap between all species, between all genera, all families, all orders etc. Nature, in Jussieu's view, would prove to be a complete continuum, and taxonomy would eventually become an utterly arbitrary division of that continuum, just as colour terms arbitrarily divide the spectrum of visible light. Jussieu, by the way, was perfectly comfortable with this.

Adanson and De Candolle, by contrast, believed that the gaps observed between clusters of closely similar organisms were real, and that a relatively non-arbitrary ("natural") taxonomy could be based on the identification of these gaps. What's more, they believed that the cluster-and-gaps pattern could be discerned at all taxonomic levels, allowing us to create a natural classification of species, genera, families etc.

Adanson and De Candolle won. Nature was found to be inherently gappy; they had invented a (non-algorithmic) form of the phenetic method; and a century-and-a-half of a "taxonomy of the gaps" ensued.

Phylogenetics has slightly changed our views on all this, but only slightly. We're now interested in clades and all that, of course, but (at least at species level) we're still very keen on gaps. We now use terms like coalescence; the issue may take the form of working out what percentage difference between two barcodes is required to infer two species; it's still about gaps.

But - what if Jussieu was right? Not exactly right in the sense that there is a continuum of forms, but right in the sense that there's a continuum of gaps. What if the gaps run from big (between e.g. a tuatara and its nearest relatives) to small (between two closely related species) to smaller (perhaps between "cryptic species") to very small (the ones that the next generation of taxonomic splitters may use to ensure that taxonomy is a never-ending science, and that some orchid taxonomists use today - sorry, couldn't resist the dig)?

What does this mean to our question "how many species are there"? It may mean that the answer is whatever number you want it to be. Curiously, that seems to be about where we're at.
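To make the "whatever number you want" point concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the eight "individuals" are made-up numbers on a line (not real organisms or sequences), the nested gaps are built in by hand, and the hypothetical function count_clusters simply lumps together any two individuals whose distance is at or below a chosen threshold.

```python
def count_clusters(values, threshold):
    """Count groups when any two individuals closer than or equal to
    `threshold` are lumped together (single-linkage grouping)."""
    parent = list(range(len(values)))

    def find(i):
        # Follow parent links to the representative of i's group.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            if abs(values[i] - values[j]) <= threshold:
                parent[find(i)] = find(j)  # merge the two groups

    return len({find(i) for i in range(len(values))})


# Hypothetical individuals with gaps nested inside gaps (1, 9 and 89 apart).
toy_individuals = [0, 1, 10, 11, 100, 101, 110, 111]

# The "species count" for a splitter, two moderates and a lumper.
counts = {t: count_clusters(toy_individuals, t) for t in (0.5, 1, 9, 89)}
print(counts)
```

With gaps nested like this, the count is 8, 4, 2 or 1 depending only on the threshold, and nothing in the data says which threshold is the right one.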

The problem can be rephrased in modern terms: is the pattern of variation in nature (call it its shape) fractal? A fractal pattern would be one where the pattern of "gappiness" is about the same all the way down. The gaps become finer and finer, but we can discern gaps all the way. If nature is fractal in this sense, then asking the question "how many species are there" is as meaningless as the classic fractal example "how long is the coastline of Australia?" There's no answer to that question. If you measure Australia's coastline on a 1:1,000,000 map you'll come up with one estimate; if you trace around every headland and minor prominence you'll get a much larger answer; if you trace around every grain of sand on every beach you'll get a larger answer still. In a fractal system, some questions are silly.
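The coastline effect is easy to reproduce with a textbook fractal. The sketch below uses the Koch curve as a stand-in for a coastline (an assumption for illustration; no real coastline is this regular) and measures its length at successively finer levels of detail. The function names are hypothetical.

```python
import math


def koch_points(depth):
    """Vertices of a Koch curve from (0, 0) to (1, 0) after `depth` refinements."""
    cos60, sin60 = math.cos(math.pi / 3), math.sin(math.pi / 3)
    points = [(0.0, 0.0), (1.0, 0.0)]
    for _ in range(depth):
        refined = [points[0]]
        for (x1, y1), (x2, y2) in zip(points, points[1:]):
            dx, dy = (x2 - x1) / 3.0, (y2 - y1) / 3.0
            a = (x1 + dx, y1 + dy)            # end of the first third
            b = (x1 + 2 * dx, y1 + 2 * dy)    # start of the last third
            # Replace the middle third with a "headland": rotate it by 60 degrees.
            peak = (a[0] + dx * cos60 - dy * sin60,
                    a[1] + dx * sin60 + dy * cos60)
            refined.extend([a, peak, b, (x2, y2)])
        points = refined
    return points


def polyline_length(points):
    """Total length of the straight segments joining consecutive points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))


# Each extra level of detail is like switching to a ruler one third as long.
lengths = [polyline_length(koch_points(d)) for d in range(6)]
for depth, length in enumerate(lengths):
    print(depth, round(length, 3))
```

Every refinement multiplies the measured length by 4/3, so the "length of the coast" keeps growing for as long as we are willing to shrink the ruler.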

If, however, the shape of nature is non-fractal and there's a minimum observable gap, which we could use to objectively delimit species, then the question isn't silly at all (it's merely difficult).

So - I think we need to answer the meta-question ("Is the pattern of variation in nature such that the question of how many species exist is answerable?") before we try to answer the question ("how many species exist?"). An important question is - how could we go about answering the meta-question?

A thought experiment may help. Imagine that we had a full genome sequence of every individual organism on earth (no, I'm not suggesting this as a goal for the decadal plan). We could then use a super-super-computer to calculate the pairwise distances of every individual from every other individual, and plot these on a graph (increasing distance on the x-axis, frequency of that distance value on the y-axis). There would be a wide spread of pairwise distances on our plot, from close to zero to some arbitrarily large number.

If the shape of nature is fractal, we'd see a complete spread of distance values with only random troughs and peaks; if, however, there's a real "species-gap", we'd see a distinct, non-random dip in the frequency distribution at some distance value somewhere closeish to the x-origin.
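A toy version of this graph can be sketched in a few lines of Python. The assumptions are loud ones: the six "genomes" are made-up 8-letter strings, distance is a simple count of differing positions (Hamming distance) rather than any real genomic distance, and the data are contrived so that a gap exists. The function names are hypothetical, and telling a real dip from a random trough would need a proper statistical test that this sketch does not attempt.

```python
from collections import Counter
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def distance_histogram(seqs):
    """Frequency of each pairwise distance value (the y-axis of the graph)."""
    return Counter(hamming(a, b) for a, b in combinations(seqs, 2))


def empty_distances(hist):
    """Distance values between the smallest and largest observed that no pair has."""
    return [d for d in range(min(hist), max(hist) + 1) if hist[d] == 0]


# Hypothetical individuals: two tight clusters, contrived for illustration.
toy_seqs = [
    "AAAAAAAA", "AAAAAAAT", "AAAAAATT",
    "GGGGGGGG", "GGGGGGGC", "GGGGGGCC",
]

hist = distance_histogram(toy_seqs)
for d in range(min(hist), max(hist) + 1):
    print(f"{d:2d} {'#' * hist[d]}")

print("unoccupied distances:", empty_distances(hist))
```

Here the within-cluster pairs pile up at distances 1 and 2, the between-cluster pairs at 8, and nothing falls in between. That empty stretch is what a "species-gap" would look like; the fractal alternative would fill it in with no consistent trough.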

Our dataset would allow more sophisticated analyses. We could partition the data into different taxonomic groups (do we see a species-gap in, say, spiders as well as in bacteria, birds and plants - and importantly, if we do, is it in the same place?). We could also partition into different ecological niches (do rainforest taxa have a gap in the same place as arid-zone taxa? do r-strategists have a gap in the same place as K-strategists?), or breeding systems (do taxa that use sexual selection have a gap in the same place as taxa that don't?).

When you think about it, a graph like this would give us crucial insights, not only into the meta-question discussed here, but also into the utility of e.g. barcodes for species delimitation.

For what it's worth, my own guess is that we wouldn't see a magic value on a graph like this, but rather a random pattern of peaks and troughs all the way down. That is, my guess is that the question "how many species are there?" is a silly question.

Of course, as with all good thought experiments, we could never actually do this. So this opens a new question - can we approximate the graph using real-world data sets? One possibility may be to use environmental genomic data - this has the advantage that it's presumably sampling sequences from every individual in the genomic soup, with no inherent taxonomic bias or pre-assumed taxonomy. I have no idea whether this idea has merit, and would be pleased to hear from someone who actually knows what they're talking about in this space.

One final question - do we try to deal with this issue in the decadal plan? We need to be careful about admitting that we have absolutely no idea how many species are in Australia and any estimate could be out by many orders of magnitude (this is not a great starting point for asking for funding to document our biodiversity). But we could argue for a project that addresses the meta-question, if indeed there's a way to address it. Now that would be a world scoop, I reckon.

As always, thoughts and comments very welcome.

noto|biotica

Australasian taxonomy and systematics