[Fastacmd] How To Retrieve Sequence From Blast Db
13.1 years ago
Leszek 4.2k

Let's say I have a multi-FASTA file of protein sequences whose headers are internal integer IDs:

>1234
MGKL...*

I build a BLAST database using:

formatdb -i infile.fa -pF -n someDB

But then I'm unable to retrieve a sequence from the database using the plain protein ID:

fastacmd -d someDB -s 1234

How should I define the FASTA header so that I can retrieve sequences easily?
I have noticed that formatdb assigns internal identifiers (an incrementing integer) to my sequences, and the original ID appears after it:

>gnl|BL_ORD_ID|12 1234

Why is that?

I then defined headers as:

>gnl|dbname|1234

but that had no effect. Do I have to define the headers as >gi|1234 to be able to retrieve a sequence? Or is there another way of retrieving sequences from a BLAST database?

blast fasta • 7.3k views
13.1 years ago
Naga ▴ 450

With the formatdb command, use the "-o" option to parse the protein IDs and create indexes from them (the IDs must be unique for the indexes to be created):

-o  Parse options
    T - True: Parse SeqId and create indexes.
    F - False: Do not parse SeqId. Do not create indexes.
    [T/F]  Optional
    default = F

formatdb -i infile.fa -pF -n someDB -o T

Then you can use the "-s" option in fastacmd to retrieve a single sequence.
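
Put together, the whole round trip could look roughly like the sketch below. It is only an illustration: the function names `duplicate_ids` and `build_and_fetch` are made up for this example, and it assumes the legacy `formatdb` and `fastacmd` binaries are on the PATH. Because indexing needs unique IDs, it checks for duplicates first. The `-p` value is left as a parameter; formatdb documents `-p T` as protein and `-p F` as nucleotide, so the sketch defaults to `T` for protein input like that described in the question.

```shell
#!/bin/sh
# Print every sequence ID (first word of a FASTA header, without ">")
# that occurs more than once. Empty output means all IDs are unique.
duplicate_ids() {
    awk '/^>/ { sub(/^>/, "", $1); print $1 }' "$1" | sort | uniq -d
}

# Hypothetical helper: build an indexed database, then fetch one record.
# Usage: build_and_fetch infile.fa someDB 1234 [T|F]
build_and_fetch() {
    fasta=$1
    db=$2
    id=$3
    ptype=${4:-T}   # formatdb -p: T = protein, F = nucleotide

    dups=$(duplicate_ids "$fasta")
    if [ -n "$dups" ]; then
        echo "duplicate IDs found, not indexing: $dups" >&2
        return 1
    fi

    # -o T makes formatdb parse the IDs and create the index files.
    formatdb -i "$fasta" -p "$ptype" -n "$db" -o T || return 1

    # -s retrieves a single sequence by the ID taken from the header.
    fastacmd -d "$db" -s "$id"
}
```

For example, `build_and_fetch infile.fa someDB 1234` would rebuild someDB with indexes and then ask fastacmd for the record whose header ID is 1234.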

Oh, stupid me! I didn't notice that parameter. Thanks a lot, Nagarajan.
