I have paired-end metagenomic data (2 x 100 bp) from several samples, generated on an Illumina HiSeq 2000. Human sequences have been removed, and I aligned the remaining reads to bacterial reference genomes. I want to calculate the organism content (namely, the relative abundance) of each sample so that I can use this information in further comparative metagenomic analyses.
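Since the reads are already aligned to reference genomes, the first step is usually to count how many reads landed on each reference. A minimal sketch, assuming the alignments are available as SAM text (for example via `samtools view -h`); the function name `count_reads_per_reference` is hypothetical. It keeps only primary, mapped alignments so that multi-mapping and split reads are not counted twice.

```python
from collections import Counter
from typing import Iterable

# SAM FLAG bits, as defined in the SAM format specification
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def count_reads_per_reference(sam_lines: Iterable[str]) -> Counter:
    """Count primary, mapped alignments per reference sequence (RNAME).

    Each mate of a pair is counted separately, so a pair with both
    mates mapped contributes 2 to its reference.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:  # not a complete alignment line
            continue
        flag = int(fields[1])
        rname = fields[2]
        # drop unmapped reads and non-primary (secondary/supplementary) records
        if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
            continue
        if rname == "*":
            continue
        counts[rname] += 1
    return counts
```

Usage would look like `with open("sample1.sam") as fh: counts = count_reads_per_reference(fh)` (the file name is hypothetical). Note that RNAME is a contig or chromosome name, so a genome with several contigs or plasmids needs a contig-to-genome lookup before summing.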
The question is how to calculate organism relative abundance from the remaining reads. Note that different samples may have different sequencing levels, so normalization may be needed: I need relative abundances that correct for the different sequencing levels of these samples. Is there software that can handle this?
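A minimal sketch of two common ways to correct for depth, assuming you already have, for each sample, a table of mapped read counts per genome (and genome lengths in bp). The function names are hypothetical and not from any particular package: `to_proportions` and `length_normalized` turn counts into fractions that sum to 1, and `rarefy` randomly subsamples every sample down to the same number of reads.

```python
import random
from collections import Counter
from typing import Dict, Mapping


def to_proportions(counts: Mapping[str, float]) -> Dict[str, float]:
    """Divide each value by the sample total so the results sum to 1.

    This removes the effect of total sequencing depth: a sample sequenced
    twice as deeply has roughly twice the counts but the same proportions.
    """
    total = sum(counts.values())
    if total == 0:
        return {name: 0.0 for name in counts}
    return {name: value / total for name, value in counts.items()}


def length_normalized(counts: Mapping[str, int],
                      lengths: Mapping[str, int]) -> Dict[str, float]:
    """Divide read counts by reference length (bp), then rescale to sum to 1.

    Longer genomes attract more reads at the same cell abundance, so this
    corrects for genome size before computing proportions.
    """
    per_base = {name: counts[name] / lengths[name] for name in counts}
    return to_proportions(per_base)


def rarefy(counts: Mapping[str, int], depth: int, seed: int = 0) -> Counter:
    """Randomly subsample `depth` reads without replacement.

    Builds one list entry per read, so it is only practical for modest
    totals; it illustrates the idea rather than being efficient.
    """
    pool = [name for name, n in counts.items() for _ in range(n)]
    if depth > len(pool):
        raise ValueError("depth exceeds the number of reads in the sample")
    rng = random.Random(seed)
    return Counter(rng.sample(pool, depth))
```

For rarefaction, the depth is typically the smallest sample total, e.g. `min(sum(c.values()) for c in samples)`. Proportions keep all the data; rarefaction throws reads away but gives every sample the same depth, which matters mainly for depth-sensitive measures such as observed species richness.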
P.S.: Qin et al.'s paper 'A human gut microbial gene catalogue established by metagenomic sequencing' describes how relative abundance was calculated, but it gives few details, which I find quite confusing.
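One common way to write down a length-normalized relative abundance is given below. It is a general sketch, not a claim about the exact formula in Qin et al.'s methods, so check their supplementary material. Here x_i is the number of reads mapped to reference i and L_i is its length; the second line shows why this also corrects for sequencing depth: multiplying every x_j by the same factor c (deeper sequencing) cancels out.

```latex
b_i = \frac{x_i}{L_i}, \qquad
a_i = \frac{b_i}{\sum_{j} b_j} = \frac{x_i / L_i}{\sum_{j} x_j / L_j}

\frac{c\,x_i / L_i}{\sum_{j} c\,x_j / L_j} = \frac{x_i / L_i}{\sum_{j} x_j / L_j} = a_i
```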
As I am building my own analysis pipeline, online services might not suit me.
You can try MEGAN or MG-RAST.
Thanks for your reply.
I see. Maybe you can use QIIME or Kraken.
Thanks, I will have a try.
Both Kraken and MetaPhlAn seem good. I will try them out.
MetaPhlAn seems to cover only Bacteria and Archaea, which limits its use for viral and fungal metagenomes.
Kraken is likely to be resource-intensive, but it is quite fast.
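If you go the Kraken route, each sample's report can be turned into a relative-abundance table. A minimal sketch, assuming the standard six-column tab-separated report (percentage, reads in clade, reads assigned directly, rank code, NCBI taxonomy ID, indented name) as written by `kraken-report` in Kraken 1 or `--report` in Kraken 2; the function name `rank_abundance` is hypothetical.

```python
from typing import Dict, Iterable


def rank_abundance(report_lines: Iterable[str], rank: str = "S") -> Dict[str, float]:
    """Relative abundance at one taxonomic rank from a Kraken-style report.

    Takes column 2 (reads in the clade) for rows whose rank code
    (column 4) equals `rank`, then rescales so the values sum to 1.
    Unclassified reads are therefore excluded from the denominator.
    """
    clade_reads: Dict[str, int] = {}
    for line in report_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 6:  # assumes the six-column report layout
            continue
        if fields[3] != rank:
            continue
        name = fields[5].strip()  # names are indented with leading spaces
        clade_reads[name] = clade_reads.get(name, 0) + int(fields[1])
    total = sum(clade_reads.values())
    if total == 0:
        return {}
    return {name: n / total for name, n in clade_reads.items()}
```

Because only rows at the chosen rank are kept, reads that could be classified only to a higher rank (genus, family, and so on) are left out. These values are read proportions, without any genome-length correction.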
Hi,
I am also doing the same kind of analysis. What do you mean by "different samples may have different sequencing levels"?
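As I read the original question, "different sequencing levels" refers to sequencing depth: one sample may yield 1 million usable reads and another 2 million. Raw read counts then differ even when the communities are identical, which is why abundances are compared as proportions. A toy illustration with made-up numbers (the genus names and counts are hypothetical):

```python
from typing import Dict


def proportions(counts: Dict[str, int]) -> Dict[str, float]:
    """Fraction of the sample's reads assigned to each taxon."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Hypothetical numbers: the same community sequenced twice,
# once to 1 million mapped reads and once to 2 million.
shallow = {"Bacteroides": 400_000, "Escherichia": 100_000, "Prevotella": 500_000}
deep = {"Bacteroides": 800_000, "Escherichia": 200_000, "Prevotella": 1_000_000}

# Raw counts differ twofold only because of sequencing depth...
print(deep["Escherichia"] / shallow["Escherichia"])  # 2.0
# ...but the proportions are identical.
print(proportions(shallow) == proportions(deep))  # True
```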
Thanks,
D