I have a file of BLAST protein results. It is over 100000 lines long. I want to get the protein/gene title associated with each accession number. I have been looking for a file I could search through similar to gene2refseq that would have the protein title mapped to the GI or accession with no luck.
I have tried to figure out how to use Bio::DB::EUtilities to retrieve what I need but I don't know what method to use that will get what I want back.
For example, using protein accession WP_009600323.1 I want to get back tRNA uridine(34) 5-carboxymethylaminomethyl synthesis enzyme MnmG [Vibrio caribbeanicus].
How can I get this information? Thank you.
In case you're blasting against some prebuilt NCBI blast db, you could have:
or
But only suckers read things like manuals or output of:
Right?
Anyway, you don't have to re-blast (assuming prebuilt NCBI blast db):
Let us give @Tawny the benefit of doubt.
As @5heikki said you can use blastdbcmd (a command line utility from blast+ package) to get the info you need.
blastdbcmd -db /path_to/nr -entry WP_009600323.1 -outfmt "%t"
Check
blastdbcmd -help
to investigate-entry_batch
option that would allow you to provide a file with these ID's and get all results back at once.Both @5heikki and @genomax2 are correct. If the -outfmt stitle had been used my problem would have already been solved. BLAST+ can give me what I need. But in this instance I do not have the database that was used to generate this BLAST output. I am running the blastdbcmd using a database that I have that should be close to what was used to generate my data but it is not 100% complete. I am getting a very large number of OID not found errors. What other method can be used to get the protein title?
Did you try nr? If ID's are not there in nr then perhaps it some may have been retired (perhaps duplicates).
With Entrez Direct something like:
Not sure it's the same elements for all accessions. It's easier with gi number:
It's sad how the NCBI is phasing out gi numbers, while so many of their services (including standalone blast) still can't provide the same level of convenience with accessions..
@5heikki thank you for the efetch command. I am just getting back to this problem after working on some other projects. I was able to include the second efetch command you posted into a Perl script and it is working nicely. It is slow but does the job.