Counting species - questions and meta-questions

Yet another paper has come out (Larsen et al. "Inordinate fondness multiplied and redistributed: The number of species on earth and a new pie of life" The Quarterly Review of Biology 92(3): 229-265, 2017) asking the perennial "how many species are there on earth" question.

This is potentially an important question, and potentially a non-question. The question (whether it's important or not a question) is in turn important for the decadal plan, but also more broadly for biology as a whole. I'll return to the decadal plan later.

This issue is a problem, because current estimates for the number of species on earth vary from ~2 million (see refs in paper above) to ~1 trillion (I don't even really know how big a trillion is, but it's much bigger than 2 million). The paper above takes a stab in the dark (the authors would dispute this) and puts the figure at 1–6 billion. 

Take your pick. That's our problem. The number can be almost anything you want it to be.

But I think there's a bigger problem, which is that none of the studies that make these estimates ask what I think is the most important question, which is: does the how-many-species question make any sense? (Or more precisely, is the question answerable? The studies assume that it is, without justifying this assumption.) 

This is a meta-question, a question about a question. Until we can answer the meta-question, trying to answer the question is almost certainly futile. Let me explain; but first, let me digress to the late 18th Century.

At that time, some of the most influential French scientists (Jussieu, Adanson, De Candolle) had an important argument about the "shape" of nature. Jussieu (one of the leading scientists in the post-Linnaean world) argued from first principles that nature was continuous. He believed that somewhere out there (and increasingly being discovered during the great age of exploration) existed an intermediate form between every recognised taxon. There would be found organisms that would bridge the apparent gap between all species, between all genera, all families, all orders etc. Nature, in Jussieu's view, would prove to be a complete continuum, and taxonomy would eventually become an utterly arbitrary division of that continuum, just as colour terms arbitrarily divide the spectrum of visible light. Jussieu, by the way, was perfectly comfortable with this.

Adanson and De Candolle, by contrast, believed that the gaps observed between clusters of closely similar organisms were real, and that a relatively non-arbitrary ("natural") taxonomy could be based on the identification of these gaps. What's more, they believed that the cluster-and-gaps pattern could be discerned at all taxonomic levels, allowing us to create a natural classification of species, genera, families etc.

Adanson and De Candolle won. Nature was found to be inherently gappy; they had invented a (non-algorithmic) form of the phenetic method; and a century-and-a-half of a "taxonomy of the gaps" ensued.

Phylogenetics has slightly changed our views on all this, but only slightly. We're now interested in clades and all that, of course, but (at least at species level) we're still very keen on gaps. We now use terms like coalescence; the issue may take the form of working out what percentage difference between two barcodes is required to infer two species; it's still about gaps.

But - what if Jussieu was right? Not exactly right in the sense that there is a continuum of forms, but right in the sense that there's a continuum of gaps. What if there are big gaps (between e.g. a tuatara and its nearest relatives) down to small gaps (between two closely related species) to smaller gaps (perhaps between "cryptic species") to very small gaps (the ones that the next generation of taxonomic splitters may use to ensure that taxonomy is a never-ending science, and that some orchid taxonomists use today - sorry, couldn't resist the dig)?

What does this mean to our question "how many species are there"? It may mean that the answer is whatever number you want it to be. Curiously, that seems to be about where we're at.

The problem can be rephrased in modern terms: is the pattern of variation in nature (call it its shape) fractal? A fractal pattern would be one where the pattern of "gappiness" is about the same all the way down. The gaps become finer and finer, but we can discern gaps all the way. If nature is fractal in this sense, then asking the question "how many species are there" is as meaningless as the classic fractal example "how long is the coastline of Australia?" There's no answer to that question. If you measure Australia's coastline on a 1:1,000,000 map you'll come up with one estimate; if you trace around every headland and minor prominence you'll get a much larger answer; if you trace around every grain of sand on every beach you'll get a larger answer still. In a fractal system, some questions are silly.
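The coastline point can be made concrete with a small sketch. It uses the Koch curve, a textbook fractal, as a stand-in for a coastline (my choice for illustration only, nothing to do with Australia's actual coast). The function name and the ruler sizes are assumptions made for this example.

```python
def koch_length(level: int) -> float:
    """Measured length of a Koch curve spanning one unit, using a ruler
    of length 3**-level. At that resolution the curve is made of
    4**level straight segments, each one ruler long."""
    segments = 4 ** level
    ruler = 3.0 ** -level
    return segments * ruler


# Each time the ruler shrinks to a third, the measured length grows by 4/3.
for level in range(6):
    print(f"ruler = {3.0 ** -level:.4f}  measured length = {koch_length(level):.4f}")
```

The measured length never settles down as the ruler shrinks, so "how long is it?" has no answer independent of the ruler. If species gaps behave the same way, "how many species?" has no answer independent of the gap size we choose.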

If, however, the shape of nature is non-fractal and there's a minimum observable gap, which we could use to objectively delimit species, then the question isn't silly at all (it's merely difficult).

So - I think we need to answer the meta-question ("Is the pattern of variation in nature such that the question of how many species exist is answerable?") before we try to answer the question ("how many species exist?"). An important question is - how could we go about answering the meta-question?

A thought experiment may help. Imagine that we had a full genome sequence of every individual organism on earth (no, I'm not suggesting this as a goal for the decadal plan). We could then use a super-super-computer to calculate the pairwise distances of every individual from every other individual, and plot these on a graph (increasing distance on the x-axis, frequency of that distance value on the y-axis). There would be a wide spread of pairwise distances on our plot, from close to zero to some arbitrarily large number.

If the shape of nature is fractal, we'd see a complete spread of distance values with only random troughs and peaks; if, however, there's a real "species-gap", we'd see a distinct, non-random dip in the frequency distribution at some distance value somewhere closeish to the x-origin.
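A toy version of this graph can be sketched in a few lines. Everything here is an assumption made for illustration: "genomes" are reduced to single numbers on a line, the two data sets are invented, and the helper names are hypothetical. It shows only what a gap looks like in a histogram of pairwise distances. It is not a test for fractal structure, and real data would need a proper statistical test to tell a real dip from a random trough.

```python
from itertools import combinations


def distance_histogram(points, bin_width=1.0):
    """Count pairwise distances |a - b| in bins of the given width."""
    counts = {}
    for a, b in combinations(points, 2):
        bin_index = int(abs(a - b) // bin_width)
        counts[bin_index] = counts.get(bin_index, 0) + 1
    n_bins = max(counts) + 1
    return [counts.get(i, 0) for i in range(n_bins)]


def empty_interior_bins(hist):
    """Indices of empty bins lying between the first and last occupied bins."""
    occupied = [i for i, c in enumerate(hist) if c > 0]
    first, last = occupied[0], occupied[-1]
    return [i for i in range(first, last) if hist[i] == 0]


# Invented "gappy" world: three tight clusters centred at 0, 10 and 20.
gappy = [centre + j * 0.1 for centre in (0.0, 10.0, 20.0) for j in range(-10, 11)]

# Invented "gapless" world: individuals spread evenly from 0 to 22.
gapless = [i * 22.0 / 199 for i in range(200)]

print("gappy, empty bins:  ", empty_interior_bins(distance_histogram(gappy)))
print("gapless, empty bins:", empty_interior_bins(distance_histogram(gapless)))
```

In the clustered data set some distance values never occur (the "species-gap"), while in the evenly spread one every bin is occupied. The partitioning ideas that follow would amount to running the same histogram on subsets of the individuals and comparing where, if anywhere, the empty bins fall.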

Our dataset would allow more sophisticated analyses. We could partition the data into different taxonomic groups (do we see a species-gap in, say, spiders as well as in bacteria, birds and plants - and importantly, if we do is it in the same place?). We could also partition into different ecological niches (do rainforest taxa have a gap in the same place as arid-zone taxa?; do r-strategists have a gap in the same place as K-strategists?), or breeding systems (do taxa that use sexual selection have a gap in the same place as taxa that don't?).

When you think about it, a graph like this would give us crucial insights, not only into the meta-question discussed here, but also into the utility of e.g. barcodes for species delimitation.

For what it's worth, my own guess is that we wouldn't see a magic value on a graph like this, but rather a random pattern of peaks and troughs all the way down. That is, my guess is that the question "how many species are there?" is a silly question.

Of course, as with all good thought experiments, we could never do this. So this opens a new question - can we approximate the graph using real-world data sets? One possibility may be to use environmental genomic data - this has the advantage that it's presumably sampling sequences from every individual in the genomic soup, with no inherent taxonomic bias or pre-assumed taxonomy. I have no idea whether this idea has merit, and would be pleased to hear from someone who actually knows what they're talking about in this space.

One final question - do we try to deal with this issue in the decadal plan? We need to be careful about admitting that we have absolutely no idea how many species are in Australia and any estimate could be out by many orders of magnitude (this is not a great starting point for asking for funding to document our biodiversity). But we could argue for a project that addresses the meta-question, if indeed there's a way to address it. Now that would be a world scoop, I reckon.

As always, thoughts and comments very welcome.

11 responses
From my Facebook comment: As an 'older-style' taxonomist, I believe that the question "how many species are there?" is a relevant and meaningful question, if not one that is easy to answer. With the avowed intent of stepping on a few toes (metaphorically at least), I post the following thought. Part of the problem with some at least of the recent discussions on taxonomy is the conflation of the two often separate procedures of species delimitation and species relationships. I like the idea that phylogenetic studies, using a range of species characteristics, will inform our study of species relationships. It has long been that way, only our ability to study phylogeny has improved. Whether this is still a quest for the 'rainbow's end' is a topic for another discussion. However, I also firmly believe that species delimitation is a separate exercise from species relationships (i.e. phylogeny). I have seen it argued that phylogenetic studies have transformed taxonomy, and have become supreme. I cannot follow this, as phylogeny comes in once (biologically) meaningful taxa have been recognised. Thus species delimitation and species relationships need to be kept separate, although they may be two parts of a broader study. Perhaps more importantly, any study needs to clearly separate the study of characteristics (information), the study of their relationships (analysis), and the resultant taxonomy. So, yes, there are gaps of all sizes in nature, and which ones separate species needs to be studied in each case; there is no 'gap' which equates to a 'species gap'. Nature is just not that accommodating.
Thanks Trevor. These are very valid points and I completely agree that species delimitation and determination of species relationships are conceptually different (though clearly linked), and often get conflated. I don't think the estimates of global species numbers are using phylogenetic methods, though - if anything, there's a retreat from phylogenetic methods towards phenetic ones when people are addressing these types of questions. And I like your point that nature isn't "that accommodating". My response when people complain that the new understandings of relationships provided by modern phylogenetics leads to inconvenient taxonomies is "so you expect evolution to be convenient?"
Great piece Kevin. In my view, taxonomy cannot be measured or have goals based on numbers. I totally agree with you that the total number of species in the world, or even in a well-studied group, will be whatever number you want it to be. If we know it is unrealistic and non-achievable to set a target of 'getting to list all Australian species', then there is no need to fall into that trap and appear confused, having to resort to philosophical arguments on why we want to know the number but can't. It's a great discussion topic but we should keep it out of the decadal plan. Also, this common view of taxonomy as numbers is part of the reason why the science of taxonomy is misunderstood and underestimated even by peers in related fields ("we don't care about names" etc). I'd say the goals for the decadal plan should be around increasing knowledge and understanding of the fauna and flora, and ensuring that experts in particular groups (ideally Australia should have at least one expert in every possible animal/plant, say, Order or Family present in Australia) are supported so they can produce high-quality work and there is continuity. Keeping professional, well-trained taxonomists in Australia would ensure not only appropriate knowledge applied to industry needs, but also expanding collaborations and impact in other fields of research. In my view, it's not about numbers, it's about expanding knowledge and documenting it; if this comes together with describing new species, or perhaps lumping species, this is just part of the process of increasing our understanding of that particular group of organisms. I agree with your part on 'numbers don't matter' and agree that the meta-question is more important than the question. While we figure out how to answer the meta-question we could calculate as many possible estimates as necessary, but knowing that the number doesn't really matter.
Hi Kevin, interesting post. My sense for animals is that there is a distinct gap, which is why barcoding works so effectively for animals (and why BOLD is so successful). Hence one could argue that for animals at least, one could quantify the number of species. Whether such a number is a useful thing to know is another question, but if you are going to ask for funding, surely having some estimate of the scale of the task is going to be required? Perhaps the real danger is that if the goal of taxonomy is to discover and describe all the species, then the task may well be finite (and what happens to taxonomists once they’ve finished?). Shades of Arthur C. Clarke’s Nine Billion Names of God. But my bigger concern is whether a “decadal plan” for taxonomy by itself is really the best way forward. Would it not make sense to embed taxonomy in a bigger framework? Knowing what something is and what to call it is not, by itself, terribly useful. Several different countries have tried to increase funding for taxonomy, with limited success. Why is that? Maybe part of the problem is that selling taxonomy as an end in itself is not tenable. How does the decadal plan fit in with other initiatives such as ALA and BHL? Not trying to be negative, just genuinely interested in what is the best way forward to generate support for the study of biodiversity.
Hi Rod - thanks for these comments. I'll start with the first para. I'm interested that your sense is that there is a distinct gap in animals, and this is why BOLD is successful. I suppose I'm keen to explore whether we can do better than having a "sense" of this, and actually ask (and answer) the (meta)question. I'd be keen to get your sense also as to whether BOLD actually addresses this or not. A thought experiment - if BOLD had only one sample per "species" and *assumed* that a gap was there (hence assigned samples to bins based on distance, without actually testing the limits of the bins), then it would seem successful, but only because of an untested assumption. Of course, BOLD has multiple samples per species - so do you think the data in BOLD is fit-for-purpose for answering the meta-question? On your second point - this is very interesting and important. I've often thought about how far we could go with a taxonomy-free biodiversity study. But to answer some of your specific questions: "Would it not make sense to embed taxonomy in a bigger framework?" - I'd say that taxonomy is indeed embedded in a bigger framework - it's foundational to phylogeny, and just about the whole of the rest of biology. (Perhaps I'm misunderstanding your question.) "Knowing what something is and what to call it is not, by itself, terribly useful" - hmm, I wonder what the world would be like if we had no names, and no concept of evolutionary entities. Linking back to one of my earlier posts about namespaces, such a world would be like an internet without a DNS system. "Several different countries have tried to increase funding for taxonomy, with limited success. Why is that?" Excellent question, and one we need to address. "How does the decadal plan fit in with other initiatives such as ALA and BHL?" It fits closely (at least with the ALA), as a large part of the Decadal Plan will address where we go next in Australia for making biodiversity information universally available.
I'm really keen to know more about the thoughts you hint at in the second para. You're a good blogger - would you be willing to contribute a blog on what you think Australasia should do in the taxonomy space in the next decade? Good on you and cheers - Kevin
Hi Claudia - I agree that taxonomy-as-numbers-game is pretty sterile, and the decadal plan needs to focus on increasing knowledge rather than just adding numbers towards some invisible target. A good analogy is that the astronomers don't say "we've explored x% of the Universe and now need to do the rest" - they say "we're finding really cool stuff and would like to find more".
There's a big literature on delimiting species using barcodes, e.g. Čandek, K., & Kuntner, M. (2014). DNA barcoding gap: reliable species identification over morphological and geographical scales. Molecular Ecology Resources, 15(2), 268–277. Meyer, C. P., & Paulay, G. (2005). DNA Barcoding: Error Rates Based on Comprehensive Sampling. PLoS Biology, 3(12), e422. The "barcode gap" (difference between intra- and interspecific divergence) is a simplistic notion, but effective in recognising species, and can be made more so using more sophisticated methods (e.g., coalescent-based tools). Indeed, the whole barcoding enterprise would have failed if it wasn't possible to delimit species with reasonable accuracy based on distinguishing between intra- and interspecific divergence (we could invert that result and ask what is it about animal population genetics that makes barcoding so effective). Of course, there are errors, but the fact that these can be quantified is a big step forward. So, the question isn't so much "does it work", but what is its accuracy and precision? Regarding names not being useful, I guess I was trying to say that what we really want to know is what something is and what it does. Many biodiversity databases don't answer that question (EOL's "Traitbase" is one attempt to do this on a global scale). By "bigger framework" I was thinking more about fundable projects. To say that taxonomy is basic to all biology isn't very helpful, other disciplines could no doubt claim the same thing, and saying that taxonomy is vital doesn't seem to have helped funding much over the last few decades. I guess I'm asking whether funding taxonomy as a separate thing makes sense. Is it sustainable in that form? Is it better to think of it as infrastructure? Is it better to identify what things are considered vital and fundable and embed taxonomy within that? For example, Google Maps invests a lot of money in basic mapping of the physical world. 
By itself I doubt this is commercially viable, but as part of delivering more accurate maps to support navigation, search, and ultimately driverless cars, it is clearly vital. But Google isn't funding making maps, they're funding the ability to be able to tell users where they are, what they can do there, and how to get somewhere else, as well as building a massive database of human spatial and temporal activity. I'm suggesting that maybe a way forward is to identify what are the actual things that are vital and fundable, how taxonomy relates to those, then make the case that if you want these things then taxonomy is part of what you need to fund. Put another way, should you be thinking at the level of a discipline (taxonomy), at the level of institutions (museums and herbaria as "macroscopes"?), thematically (taxonomy as a "biological search engine"?), or nationally (country-level biodiversity monitoring?). I don't have any answers, but I think the discussion needs to be about a lot more than just taxonomy.
Hi Rod - thanks for the response. I realised (after my response to your original second para) that I perhaps wasn't clear enough as to the scope of the decadal plan. It includes not only taxonomy per se (species discovery, naming and identification) but also systematics (phylogenetics), biodiversity informatics (the ALA etc.) etc. The scope is quite broad. I agree completely that we need to think of all these things as infrastructure - one way of looking at the current problem is that we're drawing down on past investment in this infrastructure rather than building it with new investment. I agree also that we need to consider what we offer as a service rather than as a thing-in-itself (the Google Maps analogy). The trick here is to balance the value of the services we provide (for biosecurity, conservation, biodiversity monitoring etc.) with the value of the science we provide *as science*. Both are equally important. And as to your last para above - the answer is yes to all of the above.
I find the question an interesting one, i.e. do we really have to know how many species are on Earth? Maybe not at that scale (although the Earth is only the sum of all its continents, and those are only the sum of their countries, and for many countries we now have a very good idea of the described species in many taxa, just visit Europe...). For me, leaning out of the window and estimating species numbers (and we have done that in a recent book on Australian spiders by, again, atomising the group into estimates of numbers in their families and genera etc.) is simply a practical matter. If you know what you are aiming for, it might be easier to source some money for it, i.e. via grant applications or fundraising in whatever way. But I completely agree: if you put a system in place that simply documents our biota (i.e. an online registration system for species based on images or barcodes/molecules, or both), it is irrelevant what you are aiming for; it will just happen by adding to it.