I think I'm missing something simple. I'm trying to use the Greengenes FASTA database to match a small number of reads and get their taxonomy. I'm doing this locally with my own scripts, but I cannot find anywhere how to map the Greengenes IDs to the taxonomy. There must be a file somewhere?
I'm having a hard time understanding what you are doing and where you are getting stuck. In the future, it would help us if you gave a little more detail about what you have tried and what errors you are seeing.
First, what does your script do, and what is it trying to parse? Without seeing your "own scripts", we can't tell what error you are getting or why your match isn't returning the taxonomy.
You'll need both the reference alignment file and the reference taxonomy file -- do you have both files?
Also, keep in mind that as of the most recent Greengenes release (August 2013), only about 10% of the sequences are named to the species level. Are you trying to parse at the species level?
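One way to check how deep your own assignments go is to split each taxonomy string into its ranks and find the deepest one that actually has a name. This is a minimal sketch, assuming the semicolon-separated `k__...; p__...; ...; s__...` format, where an unnamed rank appears as a bare prefix such as `s__`. The function names are hypothetical, and the example string is illustrative, not a real Greengenes entry.

```python
# Rank prefixes in order, as assumed for Greengenes-style taxonomy strings.
RANKS = ["k", "p", "c", "o", "f", "g", "s"]
RANK_NAMES = {"k": "kingdom", "p": "phylum", "c": "class", "o": "order",
              "f": "family", "g": "genus", "s": "species"}


def parse_taxonomy(tax_string):
    """Split 'k__Bacteria; p__Firmicutes; ...' into {prefix: name}.
    Unnamed ranks such as 's__' map to an empty string."""
    ranks = {}
    for field in tax_string.split(";"):
        field = field.strip()
        if "__" not in field:
            continue  # skip anything that is not an 'x__name' field
        prefix, name = field.split("__", 1)
        ranks[prefix] = name
    return ranks


def deepest_named_rank(tax_string):
    """Return the most specific rank that has a name, or None if none do."""
    ranks = parse_taxonomy(tax_string)
    deepest = None
    for prefix in RANKS:
        if ranks.get(prefix):  # empty string means the rank is unnamed
            deepest = RANK_NAMES[prefix]
    return deepest


# Illustrative string only: named down to genus, species left blank.
example = ("k__Bacteria; p__Firmicutes; c__Bacilli; o__Lactobacillales; "
           "f__Lactobacillaceae; g__Lactobacillus; s__")
print(deepest_named_rank(example))  # genus
```

If most of your hits stop at genus or above, parsing at the species level will mostly give you empty fields.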
Hi,
sorry, I thought this was a question: "But I cannot find anywhere how to map the greengenes IDs to the taxonomy. There must be a file somewhere?". I'll rephrase it. Where can I find the greengenes taxonomy file that is appropriate to use to retrieve a taxonomy for each Greengenes ID number?
Thanks.
When you unzip the gg release, there's a directory called taxonomy. It contains files that correspond to the different files in the fasta directory.
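Once you have the taxonomy file, mapping an ID to its lineage is just a dictionary lookup. This is a minimal sketch, assuming the file is tab-separated with the Greengenes ID in the first column and the taxonomy string in the second (check your copy before relying on this). The file name in the comment, the `hits` mapping from read names to best-hit IDs, and the demo IDs are all hypothetical placeholders for your own data.

```python
import gzip


def load_taxonomy(lines):
    """Build {greengenes_id: taxonomy_string} from tab-separated lines.
    Assumes two columns: the ID, then the semicolon-separated taxonomy."""
    taxonomy = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        gg_id, tax_string = line.rstrip("\n").split("\t", 1)
        taxonomy[gg_id.strip()] = tax_string.strip()
    return taxonomy


def open_taxonomy_file(path):
    """Open a plain or gzip-compressed taxonomy file in text mode."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path)


def assign_taxonomy(hits, taxonomy):
    """Map each read to the taxonomy of its best-hit Greengenes ID.
    Reads whose ID is not in the taxonomy file map to None."""
    return {read: taxonomy.get(gg_id) for read, gg_id in hits.items()}


# Real usage (placeholder path for the file in the taxonomy directory):
# with open_taxonomy_file("gg_13_5_taxonomy.txt.gz") as fh:
#     taxonomy = load_taxonomy(fh)

# Small in-memory demo with made-up IDs.
demo_lines = [
    "1111\tk__Bacteria; p__Firmicutes; c__Bacilli; o__; f__; g__; s__\n",
    "2222\tk__Archaea; p__; c__; o__; f__; g__; s__\n",
]
taxonomy = load_taxonomy(demo_lines)
hits = {"read_1": "1111", "read_2": "9999"}  # read_2's ID is not in the file
print(assign_taxonomy(hits, taxonomy))
```

Make sure the taxonomy file you load matches the FASTA file you searched against, since the IDs only line up within the same release and clustering level.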
ok,
I downloaded from here:
http://greengenes.secondgenome.com/downloads/database/13_5
...and now I see the 'taxonomy' file at the bottom!
Thanks