Question

Reference Sequence For Illumina Metagenomics, KEGG Annotation

12.5 years ago

marc.noguera.julian • 0

Hi all, I am trying to get into metagenomics analysis. I am already collecting samples for gut metagenomics sequencing using Illumina technology (and also 454). I have worked my way through mothur and QIIME for taxonomic composition analysis using 454 data, and also MetaPhlAn for Illumina data. For the functional annotation and analysis of the Illumina data, I am finding a bunch of different software and publications. As far as I understand, the first step is to obtain a reference sequence dataset that gives a balanced representation of the bacteria expected to be in the gut. This reference then has to be linked to orthologous groups in a functional database such as KEGG in order to extract abundance data on functional modules or categories. How can I obtain such a reference sequence dataset that can be linked to functional categories? How should I proceed to establish such a linkage? My apologies if my questions are already documented. I have not been able to find comprehensive information.

Thanks marc

metagenomics reference annotation • 3.4k views

updated 12.5 years ago by Josh Herr • written 12.5 years ago by marc.noguera.julian

score 0 · Answer 1 · 2013-02-28

There's a ton of information here on metagenomics. You're in a pretty lucky situation: gut metagenomes are well characterized relative to all other microbiomes, and most gut samples have (again, relatively) little diversity to deal with.

I don't quite know what you mean by a reference dataset. You should probably be sequencing both a control and an experimental treatment; is this what you did?

Perhaps you mean a reference database: you compare your data to a reference sequence database to identify your reads. Which database to use will vary with the type of gut sample you are investigating (you didn't tell us: human, cow, termite, chicken, etc.?). It also depends largely on the type of sequencing you have done, not on the platform, and you did not tell us this either: did you sequence whole genome shotgun, or did you use a marker for amplicon sequencing? There are specific databases for taxonomic identification of gut microbes using 16S and 18S rRNA sequences, as well as numerous databases for whole genome shotgun data. The shotgun databases cover both taxonomy and function, although they are generally not as taxonomically informative as the rRNA databases, and they may be specialized for gut microbiomes or cover microbes in general. The M5NR database from Argonne National Laboratory is quite good for both taxonomic and functional annotation of reads, and gut microbes are well represented in it.
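On the linkage you asked about: once each read has been matched to a gene in whichever reference database you choose, linking to functional categories is essentially a table lookup followed by a sum. Below is a minimal sketch of that step in Python, not a definitive pipeline. The function name `ko_abundance`, the input shapes (one best hit per read, a gene-to-ortholog dictionary), the even split across multiple orthologs, and the identifiers in the toy data are all assumptions made for illustration; a real annotation table from your database would replace them.

```python
from collections import Counter


def ko_abundance(read_hits, gene_to_ko):
    """Sum read hits per orthologous group.

    read_hits:  iterable of (read_id, gene_id) pairs, one best hit per read
                (hypothetical format; adapt to your aligner's output).
    gene_to_ko: dict mapping gene_id -> list of ortholog identifiers.
    Reads whose gene has no ortholog are counted under "unassigned".
    """
    counts = Counter()
    for _read_id, gene_id in read_hits:
        kos = gene_to_ko.get(gene_id)
        if not kos:
            counts["unassigned"] += 1
            continue
        # Assumption: a read is split evenly when its gene maps to several orthologs.
        share = 1.0 / len(kos)
        for ko in kos:
            counts[ko] += share
    return dict(counts)


# Toy data; gene names and ortholog identifiers are placeholders.
hits = [("r1", "geneA"), ("r2", "geneA"), ("r3", "geneB"), ("r4", "geneX")]
mapping = {"geneA": ["K00001"], "geneB": ["K00001", "K00002"]}
abundance = ko_abundance(hits, mapping)
```

With real data you would usually also normalize these counts (for example by gene length and sequencing depth) before comparing samples; that is left out here to keep the lookup-and-sum step visible.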

Let me know if you have any questions on the analysis; I am here to help. Good luck!