Shotgun rarefactions - metagenomics (microbiome), MetaPhlAn2
1
2
7.6 years ago

Hi there community!

For some time I worked on 16S rRNA gene survey data. For this type of analysis one can use rarefaction to bring every sample to the same depth. Comparing samples of different depths is sometimes likened to searching 1 square meter of Amazon jungle and 1 square kilometer of Mojave desert and then comparing OTUs, taxa, etc. Rarefaction is relatively easy to apply, as it is implemented in many software packages, e.g. QIIME and mothur.
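As a rough illustration of what rarefaction does to a count table, here is a minimal Python sketch that subsamples each sample to the depth of the shallowest one, without replacement. The OTU names and counts are made up, and this is not how QIIME or mothur implement it internally.

```python
import random

def rarefy(counts, depth, seed=0):
    """Draw `depth` observations without replacement from a {taxon: count} dict."""
    total = sum(counts.values())
    if depth > total:
        raise ValueError("depth exceeds the number of observations in the sample")
    # Expand the counts into one label per observation, then sample from that pool.
    pool = [taxon for taxon, n in counts.items() for _ in range(n)]
    rng = random.Random(seed)
    rarefied = {}
    for taxon in rng.sample(pool, depth):
        rarefied[taxon] = rarefied.get(taxon, 0) + 1
    return rarefied

# Hypothetical samples with very different depths (805 vs 100 observations).
sample_a = {"OTU1": 500, "OTU2": 300, "OTU3": 5}
sample_b = {"OTU1": 50, "OTU2": 40, "OTU3": 10}

depth = min(sum(sample_a.values()), sum(sample_b.values()))
rarefied_a = rarefy(sample_a, depth)
rarefied_b = rarefy(sample_b, depth)
```

After this step both samples hold the same number of observations, so richness and similar measures are compared at equal sampling effort.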

I now have a shotgun dataset: whole genome sequencing of a microbiome. To start, I am using the Microbiome Helper SOP. For taxonomy assignment I use the MetaPhlAn2 approach. The MetaPhlAn2 wiki doesn't even mention rarefaction. Since this step might be crucial for comparative analyses, where I have two groups/categories, each containing around 30 samples, I want each sample to be as "standardized" as possible. Are there any approaches to rarefy WGS data? Is there a reason why it has not yet been implemented in, for example, MetaPhlAn2?

I'd be grateful for any insight, comments and suggestions.

metagenomics whole genome sequencing shotgun • 6.2k views
1

Hi, did you find any solution to this problem? Any suggestions on how to compute diversity after running MetaPhlAn2?

0

Thanks for this post robert.kwapich, this is a critical step if you wanna compare groups of samples that have been shotgun-metagenome sequenced! My initial instinct was to rarefy based on single-copy housekeeping bacterial genes or the eukaryotic contamination, but I don't wanna reinvent the wheel if there is already a method available! Cheers!

3
5.2 years ago

I followed some methods from the paper: "Unexplored diversity and strain-level structure of the skin microbiome associated with psoriasis", https://www.nature.com/articles/s41522-017-0022-5.pdf.

I remember also checking the "Nonpareil" software to estimate the saturation/redundancy of my samples. Almost all samples reached a nice high percentage; the one or two that didn't were discarded.

See Nonpareil: http://enve-omics.ce.gatech.edu/nonpareil/

What I did later was to convert relative abundances (i.e. percentages) to pseudo-counts, i.e. multiply percentages by the number of reads per sample.
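A minimal sketch of that conversion in Python, assuming the profile is given in percent (so it is divided by 100 before multiplying) and that "number of reads" means the total reads in the sample. The species names and the read count below are placeholders, not values from my data.

```python
def to_pseudo_counts(rel_abundance_pct, n_reads):
    """Turn percentages into pseudo-counts: fraction of the sample times read count."""
    return {
        taxon: round(pct / 100.0 * n_reads)
        for taxon, pct in rel_abundance_pct.items()
    }

# Hypothetical species-level profile (percentages sum to 100).
profile = {
    "s__Cutibacterium_acnes": 60.0,
    "s__Staphylococcus_epidermidis": 30.0,
    "s__Malassezia_globosa": 10.0,
}
n_reads = 2_000_000  # placeholder read count for this sample

pseudo = to_pseudo_counts(profile, n_reads)
```

Whether to multiply by the total reads or only by the reads that actually hit markers is a choice you'd have to make yourself; the sketch does not settle that.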

This produces microbial profiles with different numbers of observations (i.e. counts), reflecting the different sequencing depths. For taxonomic abundance analysis you could then use the edgeR implementation of GLMs. This method can account for different numbers of observations.

For alpha and beta diversity I normalized the counts/observations to the same total number of observations, e.g. the maximum. Since all my samples had comparable numbers of sequences and reached comparable saturation, perhaps this wouldn't introduce many errors.
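Roughly, the normalization looks like the Python sketch below, assuming simple proportional scaling of every sample to the largest sample total. The sample and species names are invented for illustration.

```python
def scale_to_total(counts, target_total):
    """Rescale a {taxon: count} dict proportionally so it sums to target_total."""
    total = sum(counts.values())
    return {taxon: n * target_total / total for taxon, n in counts.items()}

# Hypothetical pseudo-count tables for two samples of different depth.
samples = {
    "A": {"sp1": 800, "sp2": 200},
    "B": {"sp1": 300, "sp2": 100},
}

# Use the largest sample total as the common target.
target = max(sum(c.values()) for c in samples.values())
scaled = {name: scale_to_total(c, target) for name, c in samples.items()}
```

Note that this is scaling, not rarefaction: the proportions within each sample stay the same and no taxa are dropped, which is why it only makes sense when the samples are already similar in depth and saturation.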

Nevertheless, the paper linked above uses the unique species count for each sample as a measure of richness, and for this, if every sample has reached similar and high saturation, we'd not expect much difference. But evenness, with inverse Simpson for example, needs these normalized pseudo-counts stratified at some level, e.g. species.
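To make the two measures concrete, here is a small Python sketch using the textbook definitions (richness as the number of taxa with a non-zero count, inverse Simpson as 1 / sum(p_i^2)). The species counts are invented, and this is not taken from the paper's code.

```python
def richness(counts):
    """Number of taxa observed at least once."""
    return sum(1 for n in counts.values() if n > 0)

def inverse_simpson(counts):
    """Inverse Simpson index, 1 / sum(p_i^2), over taxa with non-zero counts."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no observations")
    return 1.0 / sum((n / total) ** 2 for n in counts.values() if n > 0)

# Two hypothetical species-level samples with the same richness but different evenness.
even = {"sp1": 250, "sp2": 250, "sp3": 250, "sp4": 250}
skewed = {"sp1": 970, "sp2": 10, "sp3": 10, "sp4": 10}

even_richness, even_invsimpson = richness(even), inverse_simpson(even)
skewed_richness, skewed_invsimpson = richness(skewed), inverse_simpson(skewed)
```

Both samples have four species, but the inverse Simpson index of the skewed one is close to 1 because a single species dominates.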

But it has been some time, and many papers have been published since then that I haven't followed. So, that is it. If you find something better, please let me know.

0

Thanks! I'll update... I'm really stuck with rarefying MetaPhlAn2 results. I can subsample reads, but subsampling to different cutoffs with repeats means re-running MetaPhlAn2 every time, which will take forever. As for diversity: observed OTUs (species) and Shannon can be calculated easily from relative abundances, and so can Jaccard distance, so I believe I'll start with them... Did you use MetaPhlAn2 to get the relative abundances? The problem with pseudo-counts is that the number of reads is different (the main reason I want to rarefy). Thanks again for your detailed answer.
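For reference, a minimal Python sketch of Shannon diversity and Jaccard distance computed directly from relative abundances. It assumes each profile is a {species: abundance} dict; the profiles are toy values, not real MetaPhlAn2 output, and no MetaPhlAn2 file parsing is attempted.

```python
import math

def shannon(rel_abundance):
    """Shannon index (natural log) from relative abundances; input need not sum to 1."""
    total = sum(rel_abundance.values())
    return -sum(
        (v / total) * math.log(v / total)
        for v in rel_abundance.values()
        if v > 0
    )

def jaccard_distance(profile_a, profile_b):
    """1 - |A and B| / |A or B| on the sets of species with non-zero abundance."""
    present_a = {t for t, v in profile_a.items() if v > 0}
    present_b = {t for t, v in profile_b.items() if v > 0}
    union = present_a | present_b
    if not union:
        return 0.0
    return 1.0 - len(present_a & present_b) / len(union)

# Toy profiles in percent.
a = {"sp1": 50.0, "sp2": 50.0, "sp3": 0.0}
b = {"sp2": 70.0, "sp3": 30.0}

h_a = shannon(a)
d_ab = jaccard_distance(a, b)
```

Observed species and Jaccard depend on presence/absence, so rare species that only show up at higher depth can still affect them even though no counts are involved.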

0

Hi Robert, I want to rarefy my metagenomic datasets before profiling them for taxonomy using MetaPhlAn3, but I am not able to find any methods. Also, can you please explain why it is a good idea to rarefy the datasets before doing the taxonomic assignment with MetaPhlAn3? How would it affect the results? My datasets range from 1 GB to 39 GB in size, and in terms of DNA yield they range from 1.2 Mbp to 10.3 Mbp.

Thanks in advance, Saraswati
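One way to rarefy at the read level before profiling is to draw a fixed number of reads from each FASTQ file. Below is a minimal Python sketch using reservoir sampling, so the file is read only once and only the chosen reads are held in memory. It assumes plain four-line FASTQ records; `subsample_fastq` is a hypothetical helper, not part of MetaPhlAn, and the reads are generated in memory for the demo.

```python
import io
import random

def subsample_fastq(handle, n_reads, seed=0):
    """Reservoir-sample up to n_reads four-line FASTQ records from a text handle."""
    rng = random.Random(seed)
    reservoir = []
    seen = 0
    while True:
        record = [handle.readline() for _ in range(4)]
        if not record[0]:  # end of file
            break
        seen += 1
        if len(reservoir) < n_reads:
            reservoir.append(record)
        else:
            # Replace a kept record with probability n_reads / seen.
            j = rng.randrange(seen)
            if j < n_reads:
                reservoir[j] = record
    return reservoir

# Demo on ten made-up reads held in memory.
demo = "".join(f"@read{i}\nACGT\n+\nIIII\n" for i in range(10))
picked = subsample_fastq(io.StringIO(demo), 3, seed=1)
subsampled_text = "".join("".join(rec) for rec in picked)
```

For paired-end data the same records would have to be picked from both files; with this sketch, using the same seed on two files with identical read order and read count gives the same positions, but that is worth checking on your own data.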

