Hi, I'm wondering where I can find the OTU IDs used in the Greengenes database. Is this information included in the files that can be downloaded from the Greengenes website? Thanks!
Hi,
Yes, you can find the sequence ids of the Greengenes database here: https://greengenes.secondgenome.com/?prefix=downloads/greengenes_database/gg_13_5/. This link points to version 13.5 of the Greengenes database repository; other versions can be chosen from the same repository.
If you download the file gg_13_5_otus.tar.gz and decompress it, you'll find several folders, including:
otus: OTU ids clustered at a specific identity threshold
taxonomy: taxonomy of the OTU ids at a specific identity threshold
For a given identity threshold, check the corresponding file in both folders. The otus folder contains the correspondence between the OTU ids (the representative sequences of the database) and the sequence ids from the whole, unclustered (redundant) database that were clustered into each OTU.
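If it helps, here is a minimal Python sketch for reading one of those OTU map files into a dictionary. It assumes the file is tab-separated, with one OTU per line: the first field is the OTU id and the remaining fields are the sequence ids clustered into it. That layout, the function name parse_otu_map and the example path in the comment are my assumptions, so open the file and check its columns before relying on this.

```python
from typing import Dict, Iterable, List


def parse_otu_map(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each OTU id to the list of sequence ids clustered into it.

    Assumed layout (verify against your file): tab-separated fields,
    first field = OTU id, remaining fields = member sequence ids.
    """
    otu_to_seqs: Dict[str, List[str]] = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if not fields[0]:
            continue  # skip blank lines
        otu_to_seqs[fields[0]] = fields[1:]
    return otu_to_seqs


# Small in-memory example with made-up ids, in the assumed layout.
sample = ["otuA\tseq1\tseq2\tseq3\n", "otuB\tseq4\n"]
otu_map = parse_otu_map(sample)

# With a real file (hypothetical path, adjust to your download):
# with open("gg_13_5_otus/otus/97_otu_map.txt") as handle:
#     otu_map = parse_otu_map(handle)
```

The keys of the resulting dictionary are then the OTU ids for that identity threshold, which you can look up in the matching taxonomy file.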
I hope this helps,
António
Just commenting that Greengenes is effectively dead, as it hasn't been updated in nearly a decade. RDP and SILVA provide much more up-to-date 16S reference data sets.