Restricting the NCBI nr database: from accession numbers to database. Problem with blastdbcmd: strange FASTA headers and incomplete output
8.1 years ago

Hi everyone,

I want to build a BLAST database of insect proteins so that I can BLAST my transcriptome assembly locally. I downloaded all the accession numbers associated with insects from the NCBI website. Next, I used the following command to retrieve the corresponding FASTA sequences from my locally installed NCBI nr database.

blastdbcmd -db /home/db/ncbi/nr -entry_batch protein_result.txt -out insects_seq.fa

However, this gives me incomplete output - many accession numbers were not found, e.g. Error: CAB42201.1: OID not found
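To see exactly which identifiers failed, the error messages can be collected from the saved stderr of the blastdbcmd run. The following is a minimal Python sketch, not part of BLAST+: it assumes the messages look like the one quoted above ("Error: <id>: OID not found"), and the function name `missing_accessions` is hypothetical. Other BLAST+ versions may word the message differently, in which case the pattern would need adjusting.

```python
import re

# Assumed message format, taken from the error quoted in this post:
#   Error: CAB42201.1: OID not found
_OID_ERROR = re.compile(r"^Error:\s*(\S+?):\s*OID not found\s*$")


def missing_accessions(stderr_text):
    """Return identifiers reported as 'OID not found', in order, without duplicates."""
    found = []
    seen = set()
    for line in stderr_text.splitlines():
        match = _OID_ERROR.match(line.strip())
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            found.append(match.group(1))
    return found
```

The resulting list can then be compared against the original accession file to decide which entries to look up some other way.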

Moreover, many entries in the output file have multiple headers, e.g.

>gi|1080121958|gb|AOW70003.1| arginine kinase, partial [Remella rita] >gi|1080122062|gb|AOW70055.1| arginine kinase, partial [Xenophanes tryxus]
EEKVSSTLSGLEGELKGTFYPLTGMSKQTQQQLIDDHFLFKEGDRFLQAANACRFWPTGRGIYHNENKTFLVWCNEEDHL
RLISMQMGGDLKTVYKRLVTAVNDIEKRIPFSHNDRLGFLTFCPTNLGTTVRASVHIKLPKLAADKAKLEEVASKYHLQV
RGTRGEHTEAEGGVYDISNKRRMGLTEYDAVKEMYDG

Is there a way to avoid both issues?

Thanks a lot in advance! Janne

blast nr accession number blastdbcmd ncbi • 2.7k views

I can reproduce the second example posted above (with BLAST+ v2.5.0), and I can recover the same sequence entry with blastdbcmd using either of those accession numbers on its own.

Edit: Examining those two individual entries (at NCBI) confirms that their sequences are identical. So NCBI is perhaps saving space by including both headers and a single copy of the sequence? That seems to be the only logical explanation.

Edit 2: Having two headers like that in a single entry is going to further mess up FASTA format.
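If one header per record is needed downstream, the merged definition lines can be split after extraction. Below is a rough Python sketch under stated assumptions: merged headers are separated by " >" as in the example in the question (or by a Ctrl-A character), and no sequence title legitimately contains " >". The names `split_deflines` and `expand_merged_fasta` are hypothetical helpers, not part of BLAST+.

```python
import re

# Assumed separators between merged headers: Ctrl-A, or " >" as in the example above.
_DEFLINE_SEP = re.compile(r"\x01| >")


def split_deflines(defline):
    """Split one merged FASTA definition line into its individual headers."""
    body = defline[1:] if defline.startswith(">") else defline
    return [part.strip() for part in _DEFLINE_SEP.split(body) if part.strip()]


def expand_merged_fasta(lines, first_only=False):
    """Return (header, sequence) pairs, one per header.

    With first_only=True, only the first header of each merged entry is kept.
    """
    records = []
    headers, chunks = None, []

    def flush():
        # Emit the entry collected so far, once per header it carries.
        if headers is not None:
            sequence = "".join(chunks)
            records.extend((h, sequence) for h in headers)

    for raw in lines:
        line = raw.strip()
        if line.startswith(">"):
            flush()
            headers = split_deflines(line)
            if first_only:
                headers = headers[:1]
            chunks = []
        elif line and headers is not None:
            chunks.append(line)
    flush()
    return records
```

Keeping every header duplicates the sequence once per accession, whereas `first_only=True` keeps the file non-redundant; which is preferable depends on how the insect database will be used.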

You may want to confirm by emailing BLAST support.


Hi Janne,

Have you solved this issue?

8.0 years ago
blanca ▴ 10

It seems to be solved in this other post: [solved] Retrieve fasta from balst db using blastdbcmd: Error: gi|742519789: OID not found
