Question

Metagenomics: read-based diversity analysis


5.9 years ago

biobiu ▴ 150

I'm interested in alpha- and beta-diversity analysis using read-based and marker-gene methods. Are there any recommendations for tools or pipelines that (1) calculate abundance, (2) perform rarefaction, and (3) calculate diversity?

*I'm familiar with several tools that calculate relative abundance, but moving from relative to absolute abundance for the diversity analysis is not straightforward...

metagenomics • 5.0k views


score 2 · Answer 1 · 2019-10-31


5.8 years ago

antonioggsousa 3.4k

Hi @biobiu,

Yes, the phyloseq R package computes alpha- and beta-diversity metrics from absolute read abundances, such as Shannon diversity (alpha) and Bray-Curtis dissimilarity (beta). Another useful R package is vegan, which builds rarefaction curves and performs several alpha- and beta-diversity analyses (you can find the link on this page). Several amplicon marker-gene pipelines, such as mothur, UPARSE and QIIME2, also implement their own diversity and statistical methods. The advantage of these pipelines is that you can run both the upstream analysis, i.e., processing your sequence reads until you get an OTU/ASV table, and the downstream analysis, i.e., alpha- and beta-diversity.
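To make the two metrics named above concrete, here is a minimal sketch in plain Python of the textbook formulas for Shannon diversity and Bray-Curtis dissimilarity. It is not the phyloseq or vegan implementation; the function names and the example count vectors are hypothetical and only meant for illustration.

```python
import math


def shannon(counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum(|a_i - b_i|) / sum(a_i + b_i)."""
    if len(a) != len(b):
        raise ValueError("samples must list the same taxa in the same order")
    denominator = sum(a) + sum(b)
    if denominator == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denominator


# Hypothetical read counts for the same four taxa in two samples.
sample_a = [10, 10, 10, 10]  # perfectly even community
sample_b = [37, 1, 1, 1]     # dominated by one taxon

print(shannon(sample_a), shannon(sample_b))
print(bray_curtis(sample_a, sample_b))
```

The even sample reaches the maximum Shannon value for four taxa, ln(4), while the dominated sample scores lower. Bray-Curtis is 0 for identical samples and 1 for samples that share no taxa.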

My general recommendation would be: (1) for beta-diversity, transform your data (z-scores, clr and so on) rather than rarefying to an even sampling depth; (2) use phylogenetic metrics, such as Phylogenetic Diversity (for alpha) and UniFrac (for beta), instead of traditional macro-ecology metrics such as Shannon and Bray-Curtis; (3) keep in mind that sequencing data is compositional, i.e., you can't measure the real number of molecules, only their relative contribution, and therefore it is useless to apply metrics that rely on absolute abundances, such as Shannon and Bray-Curtis (please read some papers about this, e.g., https://www.frontiersin.org/articles/10.3389/fmicb.2017.02224/full ).
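As an illustration of point (1), below is a minimal sketch of the centered log-ratio (clr) transform in plain Python. The pseudocount of 0.5 used to handle zero counts is an assumption made for this example, not a recommendation from the paper linked above; other zero-replacement strategies exist, and the function name is hypothetical.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio: log of each value minus the mean of the logs.

    The pseudocount (an assumption of this sketch) is added to every
    count so that zeros do not produce log(0).
    """
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical read counts for three taxa in one sample.
print(clr([5, 0, 20]))
```

The transformed values of a sample always sum to zero. Without a pseudocount, multiplying every count by the same factor leaves the result unchanged, which is why clr suits data where only the relative contributions are informative.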

I hope this helps.

Sincerely,

António



Hi @antonioggsousa,

I am using the estimateR function from the vegan package in R to compute various Chao diversity estimates. However, it returns 0 for se.chao1 in all samples and NaN for S.ace and se.ace. The S.obs and S.chao1 columns also have the same values.

I found it weird to get 0 and NaN; would you be able to help me with this? Is this correct, or is it an error?

Thank you in advance

Reply • 4.4 years ago by pdhrati02 ▴ 30


Hi @pdhrati02,

Are you running that function with absolute counts? (I'm not familiar with that function.)

If you're running the function with absolute counts, do you have singletons and doubletons (OTUs/ASVs that appear once or twice in your data set)?

António

Reply • 4.4 years ago by antonioggsousa 3.4k


Hi @antonioggsousa, yes, I am using absolute counts. This is the MetaPhlAn2 output table, which I converted to absolute counts.

I am not sure what you mean by singletons and doubletons, but this table is at the genus level and each genus name is present only once.

If not this function, which other function would you suggest for calculating Chao indices (apart from using the phyloseq package)?

Thank you

Reply • 4.4 years ago by pdhrati02 ▴ 30


Well, I'm not sure that a table of counts aggregated at genus level is recommended for estimating the Chao metric. I don't think so, because this index relies on species abundance. It is common in microbial ecology to use taxonomic units (OTUs/ASVs) as an approximation for species, which is why people use the index. However, estimating the true richness or alpha-diversity of a sample is very hard, and many researchers suggest that it is not possible or adequate to do so from next-generation sequencing data.

By singletons or doubletons I mean a taxonomic unit that appears only once or twice in your data set. [Struck through by the author as incorrect:] So, in your case, considering "taxonomic units" as genera, it would be any genera that appears only across one sample (= singleton) or two samples (= doubleton). So, let's say that Escherichia is a singleton in your data set. It means that only appears/was detected in one sample across 6 samples that you might have (let's say that you've 6 samples). So, in the sample that it appears has a value of abundance different than zero and in the remaining ones has zero. [End of struck-through passage.]

This is very important because the Chao metric relies on singletons and doubletons to estimate the true richness of a sample.

Check the following mothur wiki page that explains the index: https://mothur.org/wiki/chao/
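To show how the estimate depends on singletons and doubletons, here is a minimal sketch of the bias-corrected Chao1 formula, S_chao1 = S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the numbers of taxa with exactly one and exactly two reads in the sample. This is a plain Python illustration with hypothetical counts. I believe vegan's estimateR uses a bias-corrected form of this kind, but please check its documentation rather than rely on this sketch.

```python
def chao1(counts):
    """Bias-corrected Chao1 richness estimate for one sample.

    counts: integer read counts per taxon in a single sample.
    """
    observed = sum(1 for c in counts if c > 0)    # S_obs
    singletons = sum(1 for c in counts if c == 1)  # F1
    doubletons = sum(1 for c in counts if c == 2)  # F2
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Hypothetical sample with two singletons and one doubleton.
print(chao1([10, 5, 1, 1, 2, 0]))

# Hypothetical sample with no taxon at a count of 1 or 2.
print(chao1([120, 45, 30, 8]))
```

When a sample has no singletons, the correction term is zero and the estimate equals the observed richness. That would be consistent with S.obs and S.chao1 being identical in a table converted from relative abundances, if the conversion produced no counts of exactly 1.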

I hope this helps,

António

Reply • 4.4 years ago by antonioggsousa 3.4k


Thank you very much, I will make sure to use species-level data. I will also have a look at the link, and cheers for the explanation. It's a great help.

Reply • 4.4 years ago by pdhrati02 ▴ 30


You're welcome.

Let me clarify something, because what I said above is actually not correct. A singleton is a taxonomic unit that appears only once, i.e., with only one read/sequence in a sample. A doubleton is a taxonomic unit that appears twice, i.e., with only two reads/sequences in a sample. You can count singletons per sample or per data set. A singleton in a sample is a taxonomic unit whose absolute abundance in that sample is one read/sequence. A singleton in a data set is a taxonomic unit that has only one sequence across all the samples (the same logic applies to doubletons).
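The difference between per-sample and per-data-set singletons can be sketched as follows. The table, taxon names and function names are hypothetical and only serve to illustrate the definitions above.

```python
# Hypothetical count table: taxon -> read counts in samples 0, 1 and 2.
table = {
    "taxon_A": [1, 0, 0],  # 1 read in total
    "taxon_B": [1, 1, 0],  # 2 reads in total, 1 read in each of two samples
    "taxon_C": [2, 5, 0],  # 2 reads in sample 0
    "taxon_D": [0, 0, 9],
}


def per_sample(table, n):
    """For each sample, list the taxa with exactly n reads in that sample."""
    n_samples = len(next(iter(table.values())))
    return [
        [taxon for taxon, row in table.items() if row[j] == n]
        for j in range(n_samples)
    ]


def per_dataset(table, n):
    """List the taxa with exactly n reads summed across all samples."""
    return [taxon for taxon, row in table.items() if sum(row) == n]


print(per_sample(table, 1))   # singletons per sample
print(per_sample(table, 2))   # doubletons per sample
print(per_dataset(table, 1))  # singletons in the data set
print(per_dataset(table, 2))  # doubletons in the data set
```

Here taxon_B is a singleton in samples 0 and 1 but a doubleton in the data set as a whole, so the two ways of counting can disagree for the same taxon.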

Sorry for my mistake and the confusion.

António

(I'll edit it above)

Reply • 4.4 years ago by antonioggsousa 3.4k

score 1 · Answer 2 · 2019-10-31


5.8 years ago

DriesB ▴ 110

I expect that phyloseq will be able to analyze rarefaction and calculate diversity, based on literature studies, its website and its Bioconductor workflow. I'm still investigating how to apply this framework myself, however; I've asked this question about it.
