Question

Restricting ncbi nr database: from accession numbers to database. Problem with blastdbcmd: strange fasta headers and incomplete output

0

8.7 years ago

Janne.Swaegers • 0

Hi everyone,

I want to build a BLAST database of insect proteins so that I can BLAST my transcriptome assembly against it locally. I downloaded all the accession numbers associated with insects from the NCBI website. Next, I used this command to retrieve the corresponding FASTA sequences from my locally installed NCBI nr database:

blastdbcmd -db /home/db/ncbi/nr -entry_batch protein_result.txt -out insects_seq.fa

This, however, gives me incomplete output: many accession numbers were not found, e.g. Error: CAB42201.1: OID not found
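To see exactly which accessions were skipped, one option is to save blastdbcmd's error messages and pull the identifiers out of them. The sketch below assumes the errors are written to stderr and that every line has the same shape as the one quoted above; `missing_accessions`, `blastdbcmd_errors.log` and `missing.txt` are hypothetical names chosen for illustration.

```shell
# Hypothetical helper: list the accessions that blastdbcmd reported as missing.
# Assumes each error line looks like "Error: CAB42201.1: OID not found".
missing_accessions() {
    # $1 = file holding the captured error messages
    # Keep only the identifier between "Error: " and ": OID not found",
    # then sort and drop duplicates.
    sed -n 's/^Error: \(.*\): OID not found$/\1/p' "$1" | sort -u
}

# Intended use (assumes blastdbcmd writes its errors to stderr):
#   blastdbcmd -db /home/db/ncbi/nr -entry_batch protein_result.txt \
#       -out insects_seq.fa 2> blastdbcmd_errors.log
#   missing_accessions blastdbcmd_errors.log > missing.txt
```

The resulting list can then be checked by hand at NCBI to find out whether those records are absent from the local copy of nr.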

Moreover, I get a lot of multi-header entries in the output file, e.g.:

>gi|1080121958|gb|AOW70003.1| arginine kinase, partial [Remella rita] >gi|1080122062|gb|AOW70055.1| arginine kinase, partial [Xenophanes tryxus]
EEKVSSTLSGLEGELKGTFYPLTGMSKQTQQQLIDDHFLFKEGDRFLQAANACRFWPTGRGIYHNENKTFLVWCNEEDHL
RLISMQMGGDLKTVYKRLVTAVNDIEKRIPFSHNDRLGFLTFCPTNLGTTVRASVHIKLPKLAADKAKLEEVASKYHLQV
RGTRGEHTEAEGGVYDISNKRRMGLTEYDAVKEMYDG

Is there a way to avoid both issues?

Thanks a lot in advance! Janne

blast nr accession number blastdbcmd ncbi • 3.0k views

ADD COMMENT • link updated 8.7 years ago by blanca ▴ 10 • written 8.7 years ago by Janne.Swaegers • 0

0

I can reproduce the second example posted above (with BLAST+ v2.5.0), and I can recover the same sequence entry with blastdbcmd using either of those accession numbers on its own.

Edit: Examining those two individual entries at NCBI confirms that their sequences are identical. So NCBI is perhaps saving space by storing both headers with a single copy of the sequence? That seems to be the only logical explanation.

Edit 2: Having two headers like that in a single entry is going to further mess up FASTA format.

You may want to confirm by emailing BLAST support.
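If the merged headers get in the way downstream, the output can be post-processed so that each record keeps only its first header. This is a minimal sketch, assuming the extra headers are joined on the same line by " >" exactly as in the example in the question; `first_header_only` and the output file name are hypothetical. blastdbcmd's help text also lists a `-target_only` option that may be worth trying, but whether it behaves as needed in a given BLAST+ version should be checked with `blastdbcmd -help`.

```shell
# Hypothetical helper: keep only the first header of each merged record.
# Assumes additional headers follow on the same line, separated by " >".
first_header_only() {
    # On header lines, delete everything from the first " >" onwards;
    # sequence lines are printed unchanged.
    awk '/^>/ { sub(/ >.*/, "") } { print }' "$@"
}

# Intended use:
#   first_header_only insects_seq.fa > insects_seq.single_header.fa
```

Note that this discards the other accessions and species names attached to the same sequence, and that a title which itself contains " >" would be cut short.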

ADD REPLY • link 8.7 years ago by GenoMax 152k

0

Hi Janne,

Have you solved this issue?

ADD REPLY • link 8.7 years ago by blanca ▴ 10

score 0 · Answer 1 · 2016-11-17

0

8.7 years ago

blanca ▴ 10

It seems to be solved in this other post: [solved] Retrieve fasta from balst db using blastdbcmd: Error: gi|742519789: OID not found