Question

Alternatives to EUtils to get Taxa from GenBank

8.1 years ago

irazoqui.matias ▴ 10

Hello.

I have several files (>200), each with more than 100 protein sequences, and I want to get the taxonomy of each sequence. My first thought was to use BLAST. Since there are quite a few sequences, I used a Perl script to "remote BLAST" them and then extracted the accession of the best hit. I've tried using EUtils to retrieve the taxonomy, but it's really slow. I've also tried the Bio::LITE::Taxonomy package, but it only works with GIs (not accessions), and NCBI doesn't use GIs anymore. My other thought was a standalone BLAST, but the nr database is really big and building it takes a long time.

Does anyone know a better way to get the taxonomy of a protein, either from the sequence itself or from its accession? If not, I'll stick to EUtils.

Thanks!!

taxonomy ncbi EUtils genbank

updated 8.1 years ago by Sej Modha 5.3k • written 8.1 years ago by irazoqui.matias ▴ 10

You can download pre-formatted nr database files from NCBI, although it is still a big download. I am not sure whether NCBI actually includes taxonomy info in their pre-formatted BLAST indexes.

Have you looked at this solution on Stack Overflow?

8.1 years ago by GenoMax 147k

score 1 · Accepted Answer · 2016-10-28

8.1 years ago

Sej Modha 5.3k

You can try converting each accession number to a taxonomy ID first, and then using that taxonomy ID to fetch the full taxonomy from the taxdump files.
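
A minimal sketch of that two-step lookup in Python, in case it helps. It assumes you have downloaded NCBI's prot.accession2taxid table and the nodes.dmp and names.dmp files from the taxdump archive, and that they use the layout described in the comments (check the README that ships with each file). The function names and the tiny in-memory sample rows (including the TOY_0001.1 accession and the shortened lineage) are made up for illustration.

```python
import io

def split_dmp(line):
    """Split one taxdump row; fields are separated by TAB|TAB and rows end in TAB|."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]
    return line.split("\t|\t")

def load_acc2taxid(lines, wanted):
    """Map accession.version -> taxid, keeping only accessions listed in `wanted`.

    Assumed columns: accession, accession.version, taxid, gi (with a header row).
    """
    lines = iter(lines)
    next(lines, None)  # skip the header row
    mapping = {}
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[1] in wanted:
            mapping[cols[1]] = cols[2]
    return mapping

def load_parents(lines):
    """Read nodes.dmp rows into {taxid: parent_taxid}."""
    parents = {}
    for line in lines:
        fields = split_dmp(line)
        if len(fields) >= 2:
            parents[fields[0]] = fields[1]
    return parents

def load_names(lines):
    """Read names.dmp rows into {taxid: scientific name}, ignoring synonyms."""
    names = {}
    for line in lines:
        fields = split_dmp(line)
        if len(fields) >= 4 and fields[3] == "scientific name":
            names[fields[0]] = fields[1]
    return names

def lineage(taxid, parents, names):
    """Walk parent links up to the root and return the names root-first."""
    path = []
    while taxid in parents:
        path.append(names.get(taxid, taxid))
        if parents[taxid] == taxid:  # the root node is its own parent
            break
        taxid = parents[taxid]
    return path[::-1]

# Toy stand-ins for the real files (shortened, NOT the real lineage).
ACC2TAXID = "accession\taccession.version\ttaxid\tgi\nTOY_0001\tTOY_0001.1\t9606\t0\n"
NODES = ("1\t|\t1\t|\tno rank\t|\n"
         "2759\t|\t1\t|\tsuperkingdom\t|\n"
         "9606\t|\t2759\t|\tspecies\t|\n")
NAMES = ("1\t|\troot\t|\t\t|\tscientific name\t|\n"
         "2759\t|\tEukaryota\t|\t\t|\tscientific name\t|\n"
         "9606\t|\tHomo sapiens\t|\t\t|\tscientific name\t|\n"
         "9606\t|\thuman\t|\t\t|\tgenbank common name\t|\n")

wanted = {"TOY_0001.1"}  # e.g. the best-hit accessions from your BLAST output
taxids = load_acc2taxid(io.StringIO(ACC2TAXID), wanted)
parents = load_parents(io.StringIO(NODES))
names = load_names(io.StringIO(NAMES))
result = {acc: lineage(taxid, parents, names) for acc, taxid in taxids.items()}
print(result)
```

With the real files you would pass open file handles instead of the `io.StringIO` objects (for example `gzip.open(path, "rt")` for the compressed accession table). Filtering on `wanted` while streaming keeps memory use small, since the full accession table is very large.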