How to calculate organism relative abundance from Illumina metagenomic data?
10.6 years ago
nkuyfq ▴ 70

I have paired-end metagenomic data (2 x 100 bp) generated on an Illumina HiSeq 2000 for several samples. Human sequences have been removed, and I aligned the remaining reads to bacterial reference genomes. I want to calculate the organism content (namely, the relative abundance) of each sample so that I can use it in further comparative metagenomic analysis.

The question is how to calculate organism relative abundance from the remaining reads. Note that different samples may have different sequencing levels (depths), so normalization may be needed. I need relative abundances that correct for these differences between samples. Is there software that can handle this?
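To make the normalization concrete, here is a minimal Python sketch, assuming you already have a per-organism count of mapped reads for each sample. The function name and the counts are hypothetical and only for illustration. Dividing each count by the sample total removes the effect of sequencing depth, so samples sequenced to different depths become comparable.

```python
def relative_abundance(read_counts):
    """Convert per-organism mapped read counts into fractions that sum to 1."""
    total = sum(read_counts.values())
    if total == 0:
        raise ValueError("no mapped reads in this sample")
    return {organism: n / total for organism, n in read_counts.items()}


# Hypothetical counts: sample_b was sequenced 10x deeper than sample_a,
# but the community composition is the same.
sample_a = {"Bacteroides": 600, "Escherichia": 300, "Prevotella": 100}
sample_b = {"Bacteroides": 6000, "Escherichia": 3000, "Prevotella": 1000}

abund_a = relative_abundance(sample_a)
abund_b = relative_abundance(sample_b)

for organism in sample_a:
    print(organism, abund_a[organism], abund_b[organism])
```

Both samples give the same fractions despite the tenfold difference in depth. This only corrects for depth; it does not account for genome length or for reads that map to more than one reference.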

P.S.: Qin's paper, 'A human gut microbial gene catalogue established by metagenomic sequencing', describes how relative abundance was calculated, but it does not provide much detail, which is quite confusing.

As I am building my own analysis pipeline, online resources might not suit me.

metagenome relative abundance normalization • 13k views

You can try MEGAN or MG-RAST.


Thanks for your reply.

  1. MEGAN seems to give me absolute read counts, and I don't know whether it corrects for the different sequencing levels of my samples.
  2. As I am building my own analysis pipeline, online resources might not suit me.

I see, maybe you can use QIIME or Kraken.


Thanks, I will have a try.


Both Kraken and MetaPhlAn seem good. I will try them out.


MetaPhlAn seems to cover only Bacteria and Archaea, which limits its use for viral and fungal metagenomes.

Kraken is likely to be resource-intensive, but it's quite fast.


Hi,

I am also doing the same kind of analysis. What do you mean by "different samples may have different sequencing levels"?

Thanks,
D

10.6 years ago

Check out Kraken:

http://genomebiology.com/2014/15/3/R46

https://github.com/DerrickWood/kraken

Recently published out of the Salzberg lab. It's easy to run and install, if a bit resource intensive. I would be skeptical of anything mapping below species, but to the genus level it's reasonably accurate. Not to mention blazing fast.

Edit:

Kraken won't report relative abundance directly. It assigns a taxonomic classification to each read, and the per-taxon read counts, taken together, will likely recapitulate the relative abundance.
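As a rough illustration of turning per-read classifications into abundances, here is a Python sketch. It assumes Kraken's per-read output is tab-delimited, with a `C`/`U` flag in the first column, the read ID in the second and the assigned taxonomy ID in the third; check this against the output of your Kraken version. The function name and the example lines are made up for illustration.

```python
from collections import Counter


def taxon_fractions(kraken_lines):
    """Tally classified reads per taxonomy ID and return each taxon's share."""
    counts = Counter()
    for line in kraken_lines:
        fields = line.rstrip("\n").split("\t")
        # Assumed layout: flag, read ID, taxonomy ID, read length, k-mer hits.
        if len(fields) < 3 or fields[0] != "C":
            continue  # skip unclassified or malformed lines
        counts[fields[2]] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {taxid: n / total for taxid, n in counts.items()}


# Made-up example lines in the assumed format.
example_lines = [
    "C\tread1\t562\t100\t562:70",
    "C\tread2\t562\t100\t562:70",
    "U\tread3\t0\t100\t0:70",
    "C\tread4\t816\t100\t816:70",
]

fractions = taxon_fractions(example_lines)
print(fractions)
```

These are fractions of classified reads only. They are not corrected for genome length, and reads assigned to higher taxonomic ranks are counted at that rank rather than redistributed to species.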

MetaPhlAn is another tool that is designed specifically for relative abundance and is worth a look.

http://huttenhower.sph.harvard.edu/metaphlan


+1 for metaphlan

9.8 years ago
Len Trigg ★ 1.6k

You can also try the metagenomics functions from Real Time Genomics. The RTG species tool takes SAM/BAM files mapped to a reference species database and calculates abundances either by abundance of DNA or by organism (taking genome lengths into account). If your reference includes taxonomic information (RTG provides pre-built ones that include bacteria and viruses), this is also taken into account, and results are reported both as TSV and as a Krona visualization. (RTG Core is available free for non-commercial academic use.)
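To show the difference between abundance by DNA and abundance by organism, here is a small Python sketch. It is not RTG's actual algorithm, only the basic idea of dividing read counts by genome length before normalizing. The function name, organism names, counts and genome lengths are all hypothetical.

```python
def dna_and_organism_abundance(read_counts, genome_lengths):
    """Return (DNA abundance, organism abundance), each summing to 1.

    DNA abundance is the share of mapped reads. Organism abundance divides
    each count by genome length first, so large genomes are not over-counted.
    """
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("no mapped reads")
    dna = {org: n / total_reads for org, n in read_counts.items()}

    # Reads per base is proportional to the number of genome copies present.
    per_base = {org: n / genome_lengths[org] for org, n in read_counts.items()}
    total_per_base = sum(per_base.values())
    organism = {org: v / total_per_base for org, v in per_base.items()}
    return dna, organism


# Hypothetical example: species_A has a genome four times larger than species_B.
counts = {"species_A": 800, "species_B": 200}
lengths = {"species_A": 4_000_000, "species_B": 1_000_000}

dna, organism = dna_and_organism_abundance(counts, lengths)
print(dna)
print(organism)
```

Here species_A accounts for 80% of the DNA, but after correcting for genome length the two species turn out to be equally abundant as organisms.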
