Hi, I have a gut metagenomics sample (WGS, Illumina 2x150 bp). I applied the following custom pipeline:
1 - Filtered my reads (to remove human contaminants and phiX)
2 - Assembly with MEGAHIT to get contigs
3 - Binning of the MEGAHIT contigs with MetaBAT
4 - Gene prediction on the contigs with Prodigal (got genes and proteins)
5 - Assigned taxonomy to the bins or genes with Kaiju
My question is: how can I get the relative abundance of the species present in the sample? Should I map my reads back to each of my genes, count the mapped reads, and then divide the number of mapped reads by the total number of reads? For example, if I get 100,000 reads mapped to one gene and my sample has 1M reads, can I assume that the relative abundance of that species is 10%? Is that correct, or how would you get the relative abundance?
Thanks for your comments.
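For reference, the arithmetic in the question can be written as a tiny sketch. The function name `read_fraction` is made up for illustration, and the numbers are the ones from the example above. Note that this gives the fraction of reads that map to a single gene; it does not account for gene length or genome size, so on its own it is probably not a species-level abundance.

```python
def read_fraction(mapped_reads: int, total_reads: int) -> float:
    """Fraction of all reads in the sample that mapped to one reference (e.g. one gene)."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return mapped_reads / total_reads


# Numbers from the example in the question: 100,000 mapped reads out of 1M total.
fraction = read_fraction(100_000, 1_000_000)
print(f"{fraction:.1%}")
```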
Please use ADD COMMENT/ADD REPLY to keep threads logically organized when responding to existing posts. This belongs up against @Brian's post. It may be best to put the image up (at postimage.org or other free image providers). Clicking on unknown dropbox links is an inherent risk.
Hi David,
Unfortunately I'm not completely clear on what you are asking. Can you clarify? Nothing should necessarily add up to 100...
And I'm not sure what you mean by "one gene reference file mapps to two different species". You're mapping reads to genes, not genes to species. But certainly, it is possible for the same gene to occur in two different species...
Sorry for not being clear.
The idea in the end is to obtain an OTU table from the metagenomics sample. Say your sample contains 10 species. If you follow the pipeline, you end up with a list of genes (or bins) corresponding to each species. I want to know the relative abundance of each of those species.
Programs like Kraken or Kaiju do this directly from the raw reads, but I wanted to do it from the final bins (or the genes predicted for each bin). Does that make sense?
The problem is that an OTU table is normally built from 16S data, and I am not sure how to build such a table from WGS data, with bins for instance. (In my case the sample has 15 bins, although there are only 10 species.)
thanks,
Did you ever figure this out David? I am actually trying to do the same thing.
What I did was map the reads back to each of the bins to get the coverage per bin. Assuming each bin corresponds to one genome, the coverage gives you an approximate number of copies of that genome.
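As a rough sketch of that idea (not necessarily exactly what I ran): coverage per bin can be approximated as mapped bases divided by bin length, and the coverages can then be normalized to sum to 1. The bin names, read counts and bin lengths below are made-up numbers for illustration, the function names are hypothetical, and the read length of 150 is taken from the 2x150 bp run described in the question. It also assumes one bin equals one genome, which will not hold if a genome is split across several bins.

```python
READ_LENGTH = 150  # bp, from the 2x150 bp run; assumes untrimmed reads


def bin_coverage(mapped_reads: int, bin_length: int, read_length: int = READ_LENGTH) -> float:
    """Approximate mean depth of a bin: mapped bases divided by total bin length."""
    if bin_length <= 0:
        raise ValueError("bin_length must be positive")
    return mapped_reads * read_length / bin_length


def relative_abundance(coverages: dict[str, float]) -> dict[str, float]:
    """Normalize per-bin coverages so that they sum to 1."""
    total = sum(coverages.values())
    if total <= 0:
        raise ValueError("total coverage must be positive")
    return {name: cov / total for name, cov in coverages.items()}


# Hypothetical bins: name -> (reads mapped to the bin, total bin length in bp)
bins = {
    "bin_1": (200_000, 3_000_000),
    "bin_2": (100_000, 1_500_000),
}

coverages = {name: bin_coverage(reads, length) for name, (reads, length) in bins.items()}
abundances = relative_abundance(coverages)

for name in bins:
    print(f"{name}\tcoverage={coverages[name]:.1f}x\trelative_abundance={abundances[name]:.2f}")
```

In this made-up example bin_1 gets twice as many reads as bin_2 but is also twice as long, so both end up with the same coverage and the same relative abundance. That is the reason for dividing by bin length instead of just counting reads.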