Hi all,
I'm occasionally encountering the following error when retrieving sequences with blastdbcmd using the -target_only option:
(base) 16:34:24 marco@blast:~$ blastdbcmd -entry WBM69675.1 -target_only -db nr
Error: [blastdbcmd] Error: oid headers do not contain target gi/seq_id.
Without -target_only everything works fine:
(base) 16:34:28 marco@blast:~$ blastdbcmd -entry WBM69675.1 -db nr
>WP_183271186.1 MULTISPECIES: lysine decarboxylase CadA [unclassified Buttiauxella] >WBM69675.1 lysine decarboxylase CadA [Buttiauxella sp. WJP83] >GDX05976.1 lysine decarboxylase CadA [Buttiauxella sp. A111]
MNVIAIMNHMGVYFKEEPIRELHRALERLDFRIVYPNDREDLLKLIENNARLCGVIFDWDKYNLELCEEISKCNEYMPLY
AFANTYSTLDVSLNDLRLQVRFFEYALGAAEDIANKIKQNTDEYIDTILPPLTKALFKYVREGKYTFCTPGHMGGTAFQK
SPVGSIFYDFFGSNTMKSDISISVSELGSLLDHSGPHKEAEEYIARVFNAERSYMVTNGTSTANKIVGMYSAPAGSTVLI
DRNCHKSLTHLMMMSNITPIYFRPTRNAYGILGGIPQSEFQRATIAKRVKETPNATWPVHAVITNSTYDGLLYNTDFIKK
TLDVKSIHFDSAWVPYTNFSPIYAGKCGMSGGRVEGKVIYETQSTHKLLAAFSQASMIHVKGDINEETFNEAYMMHTTTS
PHYGVVASTETAAAMMKGNSGKRLIDGSIERSIKFRKEIKRLKGESEGWFFDVWQPEHIDGAECWPLRSDSAWHGFKNID
NEHMYLDPIKVTMLTPGMKKDGTMDEFGIPASIVSKYLDEHGIIVEKTGPYNLLFLFSIGIDKTKALSLLRALTDFKRSF
DLNLRVKNMLPSLYREDPEFYENMRIQELAQNIHKLIAHHNLPDLMFRAFEVLPSMMVTPFVAFQKELHGQTEEVYLDEM
VGRVNANMILPYPPGVPLVMPGEMITEESRPVLEFLQMLCEIGAHYPGFETDIHGAYRQADGRYTVKVLKEENNK
I recently updated my local nr database, and I'm wondering whether it's corrupted, since I'm pretty sure I've never seen this error before. The most annoying part is that when -entry_batch is used along with -target_only, the program terminates as soon as this error occurs: the problematic entry is not just skipped, and the whole run dies.
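One possible workaround for the -entry_batch problem is to query the accessions one at a time, so that a single failing entry cannot abort the rest. The sketch below is only an illustration under some assumptions: the function name fetch_each and the file names are made up here, the ID file is assumed to hold one accession per line, and blastdbcmd is assumed to exit with a non-zero status when it prints the "oid headers do not contain target gi/seq_id" error. It will also be slower than a real batch call, since the database is opened once per entry.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch entries one by one with -target_only.
# Sequences go to stdout; accessions that fail are appended to a separate file.
# Assumption: blastdbcmd returns a non-zero exit status on the error above.
fetch_each() {
    local db="$1" ids="$2" failed="$3" id
    : > "$failed"                                # start with an empty failure list
    while IFS= read -r id || [ -n "$id" ]; do    # also handle a missing final newline
        [ -z "$id" ] && continue                 # skip blank lines
        if ! blastdbcmd -db "$db" -entry "$id" -target_only </dev/null 2>/dev/null; then
            printf '%s\n' "$id" >> "$failed"     # remember the problematic accession
        fi
    done < "$ids"
}

# Example call (file names are placeholders):
#   fetch_each nr ids.txt failed_ids.txt > sequences.fasta
```

The accessions collected in the failure file can then be retried without -target_only, or sent along with a bug report.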
Can someone try to reproduce the problem and let me know if I have a broken nr database or something? Many thanks!
UPDATE! This error occurs with GenBank entries whose accessions start with W, such as the following: WAH52037.1 WBL74272.1 WDB51475.1 WDB43112.1 WCZ02214.1 WCP79122.1 WAG26413.1 WAH53327.1
I reported this problem to NCBI and will keep this post updated.
-target_only is meant to write just 1 out of many possible headers for the target sequence. For instance, in the example below I successfully use it for a WP* entry. Note that in the first case, with -target_only, I obtain only 1 header. If this is not working on some entries, then this may be a question best sent to the NCBI help desk with examples.
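Until the affected entries are fixed, a rough stand-in for -target_only is to run blastdbcmd without it and trim the merged header afterwards. The following is a minimal sketch, not an official NCBI method: pick_defline is a made-up name, and it assumes the individual deflines are joined by " >" exactly as in the output shown in the question, so a description that itself contains " >" would be split incorrectly.

```shell
#!/usr/bin/env bash
# Hypothetical filter: keep only the defline that belongs to the given accession.
# Assumption: merged deflines are separated by " >" as in the output above.
pick_defline() {
    awk -v acc="$1" '
        /^>/ {
            n = split(substr($0, 2), parts, " >")    # drop the leading ">" and split
            for (i = 1; i <= n; i++) {
                # match "ACC description" or a bare "ACC"
                if (index(parts[i], acc " ") == 1 || parts[i] == acc) {
                    print ">" parts[i]
                    next
                }
            }
        }
        { print }    # sequence lines, and headers with no match, pass through unchanged
    '
}

# Example call:
#   blastdbcmd -entry WBM69675.1 -db nr | pick_defline WBM69675.1
```

If the accession is not found in a header, that header is printed unchanged, so nothing is silently lost.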