Question

How to map Greengenes IDs to taxonomy locally

10.9 years ago

susan.klein ▴ 80

Hi,

I think I'm missing something simple. I'm trying to use the Greengenes FASTA database to match a small number of reads and get their taxonomy. I'm doing this locally with my own scripts. But I cannot find anywhere how to map the greengenes IDs to the taxonomy. There must be a file somewhere?

Thanks,
Theo

taxonomy metagenomics greengenes • 5.2k views

ADD COMMENT • link updated 3.5 years ago by Ram 45k • written 10.9 years ago by susan.klein ▴ 80

Hi,

Sorry, I thought this was a question: "But I cannot find anywhere how to map the greengenes IDs to the taxonomy. There must be a file somewhere?". I'll rephrase it: where can I find the Greengenes taxonomy file that I should use to look up the taxonomy for each Greengenes ID?

Thanks.

ADD REPLY • link 10.8 years ago by susan.klein ▴ 80

When you unzip the Greengenes release, there is a directory called taxonomy. The files in it correspond to the different files in the fasta directory.
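For anyone scripting this locally, here is a minimal sketch of loading such a taxonomy file into a lookup table. It assumes each line holds a Greengenes ID, a tab, and a lineage string with rank prefixes (k__, p__, ... s__); check that against the file in your release. The IDs and lineages below are made-up examples, and `load_taxonomy` is just an illustrative helper name.

```python
import io


def load_taxonomy(handle):
    """Build {greengenes_id: lineage} from a tab-separated taxonomy file.

    Assumed layout (verify against your release): ID<TAB>lineage per line.
    """
    id_to_lineage = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        gg_id, lineage = line.split("\t", 1)
        id_to_lineage[gg_id] = lineage
    return id_to_lineage


# Made-up excerpt standing in for a real taxonomy file; with a real file,
# pass open("path/to/taxonomy_file.txt") instead (path is hypothetical).
example = io.StringIO(
    "1111\tk__Bacteria; p__Firmicutes; c__Bacilli; o__; f__; g__; s__\n"
    "2222\tk__Archaea; p__Euryarchaeota; c__; o__; f__; g__; s__\n"
)
taxonomy = load_taxonomy(example)

# IDs of the reference sequences your reads matched (hypothetical values).
hit_ids = ["2222", "1111", "9999"]
for gg_id in hit_ids:
    print(gg_id, taxonomy.get(gg_id, "not in taxonomy file"))
```

Keeping the IDs as strings avoids surprises if the FASTA headers and the taxonomy file format them differently from integers.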

ADD REPLY • link 10.8 years ago by 5heikki 11k

ok,

I downloaded from here:

http://greengenes.secondgenome.com/downloads/database/13_5

and now I see the 'taxonomy' file at the bottom!

Thanks

ADD REPLY • link updated 3.5 years ago by Ram 45k • written 10.8 years ago by susan.klein ▴ 80

Ram · Answer 1 · 2014-08-01

I'm having a hard time understanding what you are doing and where you are getting stuck. In the future, it would help us if you gave a little more detail about what you have tried and what errors you are getting.

First, what is your script and what is it trying to parse? We have no idea what type of error you are getting and why your match is not providing you the taxonomy if you do not provide us with your "own scripts".

You'll need both the reference alignment file and the reference taxonomy file -- do you have both files?

Also, keep in mind that as of the most recent Greengenes release (August 2013) only about 10% of the sequences have names to the species level. Are you trying to parse at the species level?
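Since many entries stop short of species, it can help to check how deep each lineage actually goes before parsing at a fixed rank. A rough sketch, assuming lineage strings use semicolon-separated fields with rank prefixes such as k__ and s__ and leave the name empty for unassigned ranks (`deepest_named_rank` is an illustrative helper, and the lineage is a made-up example):

```python
def deepest_named_rank(lineage):
    """Return (rank_prefix, name) for the most specific rank with a name.

    Assumes fields like "g__Lactobacillus"; an empty name ("g__") means
    the rank is unassigned. Returns None if no rank has a name.
    """
    deepest = None
    for field in lineage.split(";"):
        prefix, sep, name = field.strip().partition("__")
        if sep and name:
            deepest = (prefix, name)
    return deepest


# Made-up lineage that is only resolved down to order.
lineage = "k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; f__; g__; s__"
rank = deepest_named_rank(lineage)
print(rank)

# A species-level parse only makes sense when the deepest rank is "s".
has_species = rank is not None and rank[0] == "s"
print(has_species)
```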