Question

Relative abundance from metagenomics samples

7.9 years ago

David ▴ 240

Hi, I have a gut metagenomics sample (WGS, Illumina 2x150 bp). I applied the following custom pipeline:

1. Filtered my reads (to remove human contaminants and phiX)
2. Assembled the reads with MEGAHIT to get contigs
3. Binned the MEGAHIT contigs with MetaBAT
4. Predicted genes on the contigs with Prodigal (got genes and proteins)
5. Assigned taxonomy to the bins (or genes) with Kaiju

My question is how to get the relative abundance of the species present in the sample. Should I map my reads back to each of my genes and simply count the mapped reads, then divide the number of mapped reads by the total number of reads to get the relative abundance? For example, if 100,000 reads map to one gene and my sample has 1M reads, can I assume that the relative abundance of that species is 10%? Am I correct, or how would you get the relative abundance?
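The calculation proposed here is simply the fraction of all reads that map to one gene. A minimal Python sketch with the numbers from the question (the function name `read_fraction` is hypothetical):

```python
def read_fraction(mapped_reads: int, total_reads: int) -> float:
    """Fraction of all sequenced reads that mapped to a single gene."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads

# Numbers from the question: 100,000 of 1,000,000 reads map to one gene.
print(f"{read_fraction(100_000, 1_000_000):.0%}")  # prints "10%"
```

This fraction depends on the gene's length as well as on the species' abundance, because a longer gene collects more reads at the same coverage.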

Thanks for your comments.

updated 5.7 years ago by lagartija ▴ 160 • written 7.9 years ago by David ▴ 240

score 0 · Answer 1 · 2017-01-19

You should map your reads to the assembly. The abundance is the average coverage of a gene, not the number of reads mapping to it. For example, using BBMap:

bbmap.sh in1=r1.fastq in2=r2.fastq ref=genes.fasta out=mapped.sam covstats=covstats.txt

covstats.txt will tell you the average coverage of each gene, which is proportional to the abundance (ignoring bias).
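A rough sketch of why read counts and coverage differ. It assumes every read is 150 bp (as in the 2x150 bp data above) and aligns entirely within the gene, so that coverage is simply mapped bases divided by gene length. BBMap's covstats computes coverage from the actual alignments, so real values will differ slightly; the read counts and gene lengths here are hypothetical.

```python
READ_LENGTH = 150  # bp per read (2x150 bp Illumina, as in the question)

def average_coverage(mapped_reads: int, gene_length: int,
                     read_length: int = READ_LENGTH) -> float:
    """Rough mean fold coverage: total mapped bases divided by gene length.
    Assumes every read aligns fully within the gene."""
    return mapped_reads * read_length / gene_length

# Two hypothetical genes from genomes at the same abundance:
# the 3 kb gene collects three times as many reads as the 1 kb gene...
cov_a = average_coverage(100, gene_length=1_000)
cov_b = average_coverage(300, gene_length=3_000)
# ...but both have the same average coverage.
print(cov_a, cov_b)  # prints "15.0 15.0"
```

Under these assumptions the read count scales with gene length at a fixed abundance, while the average coverage does not.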

score 0 · Answer 2 · 2017-01-19

7.9 years ago

David ▴ 240

Thanks Brian, my confusion came from mixing up the number of mapped reads and coverage.

The output gives a mean Avg_fold of 16.895 (see the attached picture summarizing the covstats.txt output). If I do this for every group of genes (one group per bin), I will end up with several coverages.

Then, if we have mapped 10 different bins from the same sample, should the overall coverage add up to 100?

Also, what happens if one gene reference file maps to two different species?

https://dl.dropboxusercontent.com/u/24466146/mapped_reads_to_reference_genes.png
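To collect the per-gene (or per-contig) Avg_fold values from covstats.txt, a minimal Python sketch. It assumes the file is tab-separated, with a header line starting with `#ID` and containing an `Avg_fold` column, and it looks columns up by name rather than by position; check the header of your own file. The function name and example rows are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO

def read_avg_fold(handle: TextIO) -> Dict[str, float]:
    """Return {reference ID: Avg_fold} from a BBMap covstats table.
    Assumes a tab-separated file whose header has '#ID' and 'Avg_fold'."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    id_col = header.index("#ID")
    fold_col = header.index("Avg_fold")
    coverage = {}
    for row in reader:
        if row:  # skip blank lines
            coverage[row[id_col]] = float(row[fold_col])
    return coverage

# Hypothetical two-gene table standing in for covstats.txt
example = "#ID\tAvg_fold\tLength\ngene_1\t16.9\t1200\ngene_2\t4.2\t900\n"
print(read_avg_fold(io.StringIO(example)))  # {'gene_1': 16.9, 'gene_2': 4.2}
```

For a real file, open it with `open("covstats.txt", newline="")` and pass the handle to `read_avg_fold`.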

Please use ADD COMMENT/ADD REPLY to keep threads logically organized when responding to existing posts. This belongs under @Brian's post.

It may be best to host the image on postimage.org or another free image provider. Clicking on unknown Dropbox links is inherently risky.

7.9 years ago by GenoMax 147k
Hi David,

Unfortunately I'm not completely clear on what you are asking. Can you clarify? Nothing should necessarily add up to 100...

And I'm not sure what you mean by "one gene reference file mapps to two different species". You're mapping reads to genes, not genes to species. But certainly, it is possible for the same gene to occur in two different species...
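On the question of adding up to 100: the coverages themselves are fold values and need not sum to anything. If percentages are wanted, one option is to divide each coverage by the sum over all references that were mapped to. A sketch with hypothetical bin names and coverages:

```python
from typing import Dict

def relative_abundance(coverage: Dict[str, float]) -> Dict[str, float]:
    """Scale coverages so that they sum to 100 (percent)."""
    total = sum(coverage.values())
    if total == 0:
        raise ValueError("all coverages are zero")
    return {name: 100 * cov / total for name, cov in coverage.items()}

# Hypothetical mean coverages for three bins
bins = {"bin_1": 30.0, "bin_2": 15.0, "bin_3": 5.0}
print(relative_abundance(bins))  # {'bin_1': 60.0, 'bin_2': 30.0, 'bin_3': 10.0}
```

These percentages are relative only to the references included (for example the bins), so reads that did not map to any of them are not represented.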

7.9 years ago by Brian Bushnell 20k
Sorry for not being clear.

The end goal is to obtain an OTU table from the metagenomics sample. Say your sample contains 10 species. If you follow the pipeline, you end up with a list of genes (or bins) corresponding to each species. I want to know the relative abundance of each species.

Programs like Kraken or Kaiju do this directly from the raw reads, but I wanted to do it from the final bins (or from the genes predicted for each bin). Does that make sense?

The problem is that OTU tables are normally built from 16S data, and I am not sure how to build such a table from WGS data, for instance with bins. (In my case the sample contains 15 bins, although there are only 10 species.)
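For a species-level table built from bins, one possible approach (not one given in this thread) is to sum the percentages of bins that share a taxonomic label, such as the species Kaiju assigned to each bin. This assumes each bin is a separate genome: if one genome was split across two bins, both bins carry roughly that genome's coverage and summing would overcount it. Bin IDs, labels, and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict

def species_table(bin_pct: Dict[str, float],
                  bin_species: Dict[str, str]) -> Dict[str, float]:
    """Sum bin relative abundances per species label.
    Assumes each bin is a distinct genome; bins without a label
    are grouped as 'unclassified'."""
    table: Dict[str, float] = defaultdict(float)
    for bin_id, pct in bin_pct.items():
        table[bin_species.get(bin_id, "unclassified")] += pct
    return dict(table)

# Hypothetical: three bins, two of which carry the same species label
bin_pct = {"bin_1": 60.0, "bin_2": 30.0, "bin_3": 10.0}
labels = {"bin_1": "Species X", "bin_2": "Species Y", "bin_3": "Species Y"}
print(species_table(bin_pct, labels))  # {'Species X': 60.0, 'Species Y': 40.0}
```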

thanks,

7.9 years ago by David ▴ 240
Did you ever figure this out David? I am actually trying to do the same thing.

5.9 years ago by Longshotx ▴ 70
What I did was map the reads back to each of the bins to get the bin coverage. Assuming each bin corresponds to one genome, its coverage approximates that genome's number of copies.
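One way to get a bin's coverage from per-contig coverages is the length-weighted mean over its contigs, which equals the total mapped bases divided by the total bin length. A sketch assuming you already have each contig's length and mean coverage (for example from covstats); the function name and values are hypothetical:

```python
from typing import List, Tuple

def bin_coverage(contigs: List[Tuple[int, float]]) -> float:
    """Length-weighted mean coverage of a bin.
    `contigs` holds (length_bp, mean_coverage) for each contig in the bin;
    the result equals total mapped bases divided by total bin length."""
    total_len = sum(length for length, _ in contigs)
    if total_len == 0:
        raise ValueError("bin has no sequence")
    return sum(length * cov for length, cov in contigs) / total_len

# Hypothetical bin with three contigs
contigs = [(50_000, 12.0), (30_000, 10.0), (20_000, 20.0)]
print(bin_coverage(contigs))  # prints 13.0
```

Weighting by length keeps short, possibly misbinned contigs from dominating the bin's coverage as much as they would in an unweighted mean.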

5.9 years ago by David ▴ 240

score 0 · Answer 3 · 2019-03-04

5.7 years ago

lagartija ▴ 160

Hi, I found this post useful because I have the same task (except that I prefer to work on contigs rather than bins). I also mapped the reads to my contigs to get the mean coverage. Is that a sufficient estimate of the (relative) abundance, or are there more steps needed to reduce biases?
