Question

Formatdb From A Batch Of Fasta Sequences

3

14.3 years ago

Bdv ▴ 320

Hi, I have a batch of FASTA files, each one containing ESTs from a different plant. I want to create a single database covering all the plants. How can I use formatdb (or another tool) to create one database that I can BLAST against? Thank you!

est blast fasta database • 11k views

ADD COMMENT • link updated 14.3 years ago by Alex ★ 1.5k • written 14.3 years ago by Bdv ▴ 320

Ram · Answer 1 · 2011-01-18

5

14.3 years ago

Pierre Lindenbaum 166k

formatdb accepts stdin for the -i argument. From the manual:

This parameter is mandatory. It requires the full file name with extension. The input file should have sequences in FASTA or ASN.1 format, except when converting a gi list to binary form. To format multiple input files, quote the input file names as in -i "db1 db2". The FASTA output from other programs can be piped to this option using "-i stdin". Renaming of database is recommended (mandatory in the first case).

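So, using Unix find, you can do something like this (-p F marks the input as nucleotide, and -n names the database, since there is no input file name to inherit when reading from stdin; plant_ests is only a placeholder name):

find dir1/ dir2 dir4/subdir/ -name "*.fasta" -exec cat '{}' ';' |\
    formatdb -i stdin -p F -n plant_ests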

ADD COMMENT • link updated 5.7 years ago by Ram 45k • written 14.3 years ago by Pierre Lindenbaum 166k

0

very nice command-foo

ADD REPLY • link 14.3 years ago by Will 4.6k

Ram · Answer 2 · 2011-01-18

3

14.3 years ago

Daniel ★ 4.0k

formatdb is simple enough to call:

formatdb -i *your_input_file* -p F

(For ESTs you'll want -p F, which tells formatdb that the input is nucleotide sequence, not protein.)

I have always found it easier to concatenate all the ESTs into one FASTA file with cat beforehand, which also makes the data easier to work with in general.
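A minimal sketch of that concatenation step, assuming one FASTA file per plant in a single directory. The file names, the temporary directory and the toy sequences are all hypothetical stand-ins. The record count is compared before and after merging because a file that lacks a final newline would glue the next header onto the previous sequence line. The formatdb call only runs if the legacy tool is installed.

```shell
#!/bin/sh
# Sketch only: toy inputs stand in for the real per-plant EST files.
set -eu
workdir=$(mktemp -d)
printf '>plantA_est1\nACGTACGT\n' > "$workdir/plantA.fasta"
printf '>plantB_est1\nGGCCTTAA\n>plantB_est2\nTTGACCAA\n' > "$workdir/plantB.fasta"

# Count the records in the individual files.
expected_count=0
for fasta in "$workdir"/*.fasta; do
    n=$(grep -c '^>' "$fasta")
    expected_count=$((expected_count + n))
done

# Merge into a file whose extension does not match the *.fasta glob.
cat "$workdir"/*.fasta > "$workdir/all_ests.fa"
merged_count=$(grep -c '^>' "$workdir/all_ests.fa")
echo "records before merge: $expected_count, after merge: $merged_count"

# Build the nucleotide database only when formatdb is available.
if command -v formatdb >/dev/null 2>&1; then
    (cd "$workdir" && formatdb -i all_ests.fa -p F -n all_ests)
fi
```

If the two counts differ, one of the input files probably ends without a newline and should be fixed before formatting.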

ADD COMMENT • link updated 5.7 years ago by Ram 45k • written 14.3 years ago by Daniel ★ 4.0k

Ram · Answer 3 · 2011-01-19

With the current NCBI BLAST+ you can do something like this:

Build a separate BLAST database for each FASTA file (-dbtype is required, and the database name goes after -out):

makeblastdb -in mus_ref37_chr1.fa -dbtype nucl -out mus_ref37_chr1 -title "Mouse chromosome 1, Ref B37.1"
makeblastdb -in mus_ref37_chr2.fa -dbtype nucl -out mus_ref37_chr2 -title "Mouse chromosome 2, Ref B37.1"
makeblastdb -in mus_ref37_chr3.fa -dbtype nucl -out mus_ref37_chr3 -title "Mouse chromosome 3, Ref B37.1"

Then join them in one DB:

blastdb_aliastool -dblist "mus_ref37_chr1 mus_ref37_chr2 mus_ref37_chr3" -dbtype nucl -out mus_genomes_three_chrs -title "Mouse chromosomes 1-3"

More here (descriptions pp. 12-14 and usage example pp. 26-28)

With this approach you can search either all of the FASTA files at once (through the alias) or a specific one (through its own database).
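For a larger batch, the per-file commands can be generated in a loop. The following is a rough sketch under a few assumptions: every input ends in .fa, the database name is simply the file name without that extension, and all_dbs is a made-up alias name. The toy input files are placeholders, and the BLAST+ tools are only invoked if they are on the PATH.

```shell
#!/bin/sh
# Sketch only: one database per FASTA file, then one alias over all of them.
set -eu
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in inputs.
printf '>chr1_seq\nACGTACGTAC\n' > mus_ref37_chr1.fa
printf '>chr2_seq\nGGCCTTAAGG\n' > mus_ref37_chr2.fa

dblist=""
for fasta in *.fa; do
    name=${fasta%.fa}                      # strip the extension
    dblist="${dblist:+$dblist }$name"      # space-separated list for -dblist
    if command -v makeblastdb >/dev/null 2>&1; then
        makeblastdb -in "$fasta" -dbtype nucl -out "$name" -title "$name"
    fi
done
echo "dblist: $dblist"

if command -v blastdb_aliastool >/dev/null 2>&1; then
    blastdb_aliastool -dblist "$dblist" -dbtype nucl \
        -out all_dbs -title "All databases"
fi
```

Searching with -db all_dbs would then cover every file, while -db mus_ref37_chr1 restricts the search to one of them.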

Ram · Answer 4 · 2011-01-19

1

14.3 years ago

Anuraj Nayarisseri ▴ 750

Put all your FASTA sequences into a single text file, remembering to leave a single gap between consecutive sequences, and give the file a .txt extension. Suppose your input file is named abc.txt and contains all the ESTs; then use the following command to format it (-p F for nucleotide input, -o T to parse the sequence IDs and build an index):

formatdb -i abc.txt -p F -o T

Hope this will help you.
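One thing to check before using -o T: as far as I know, parsing the sequence IDs expects every identifier in the file to be unique, and ESTs pooled from different plants can easily share names. Below is a small sketch of such a check. The contents of abc.txt are invented for the example, and the identifier is assumed to be the first word of each header line.

```shell
#!/bin/sh
# Sketch only: list identifiers that occur more than once in a merged FASTA file.
set -eu
workdir=$(mktemp -d)

# Hypothetical merged file in which est1 appears for two different plants.
printf '>est1 plant A\nACGT\n>est2 plant A\nGGCC\n' > "$workdir/abc.txt"
printf '>est1 plant B\nTTAA\n' >> "$workdir/abc.txt"

# Take the first word of each header, drop the leading ">", and print an
# identifier the second time it is seen.
duplicates=$(awk '/^>/ { id = substr($1, 2); if (seen[id]++ == 1) print id }' "$workdir/abc.txt")
echo "duplicate IDs: ${duplicates:-none}"
```

If any duplicates are reported, prefixing each header with a plant-specific tag before merging is one way to keep the identifiers distinct.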

ADD COMMENT • link updated 5.7 years ago by Ram 45k • written 14.3 years ago by Anuraj Nayarisseri ▴ 750

0

Yes, I recommend this way too.

ADD REPLY • link 14.3 years ago by Dejian ★ 1.3k