Formatdb From A Batch Of Fasta Sequences
13.9 years ago
Bdv ▴ 320

Hi, I have a batch of FASTA files, each one containing ESTs from a different plant. I want to create a single database covering all the plants. How can I use formatdb (or another tool) to create one database that I can BLAST against? Thank you!

est blast fasta database • 10k views
13.9 years ago

formatdb accepts stdin for the -i argument. From the manual:

This parameter is mandatory. It requires the full file name with extension. The input file should have sequences in FASTA or ASN.1 format, except when converting a gi list to binary form. To format multiple input files, quote the input file names as in -i "db1 db2". The FASTA output from other programs can be piped to this option using "-i stdin". Renaming of database is recommended (mandatory in the first case).

So, using unix find, you can do something like this:

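# all_plants_est is a placeholder database name; -p F marks the input as nucleotide
find dir1/ dir2 dir4/subdir/ -name "*.fasta" -exec cat '{}' ';' |\
    formatdb -i stdin -p F -n all_plants_est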
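
A quick sanity check before formatting can catch an empty or partial merge. The sketch below is only illustrative: it builds two tiny FASTA files in a temporary directory (the directory names, file names and sequences are made up), merges them with the same find/cat pattern, and counts the header lines. The formatdb call runs only if the legacy tool is installed, and the database name `demo_est` is a placeholder.

```shell
#!/bin/sh
# Illustrative only: the directory layout, file names and sequences are made up.
workdir=$(mktemp -d)
mkdir -p "$workdir/plantA" "$workdir/plantB"
printf '>estA1\nACGTACGT\n>estA2\nTTGGCCAA\n' > "$workdir/plantA/a.fasta"
printf '>estB1\nGGGTTTAA\n' > "$workdir/plantB/b.fasta"

# Merge every *.fasta file found under the two directories into one file.
merged="$workdir/all_est.fa"
find "$workdir/plantA" "$workdir/plantB" -name "*.fasta" -exec cat '{}' ';' > "$merged"

# Count FASTA records (header lines start with ">").
n_records=$(grep -c '^>' "$merged")
echo "merged records: $n_records"

# Format the merged file only if legacy formatdb is available.
if command -v formatdb >/dev/null 2>&1; then
    formatdb -i "$merged" -p F -n "$workdir/demo_est"
fi
```

If the record count matches what you expect from the individual files, the merged file is probably safe to format.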

very nice command-foo

13.9 years ago
Daniel ★ 4.0k

formatdb is simple enough to call:

formatdb -i *your_input_file* -p F

(The -p F flag tells formatdb that the input is nucleotide sequence, not protein, which is what you want for ESTs.)

I have always found it easier to put all the ESTs into one FASTA file using cat beforehand too, since a single file is simpler to work with in general.

13.9 years ago
Alex ★ 1.5k

With the current NCBI BLAST+ you can do something like this:

  1. Build a separate BLAST database for each FASTA file:

    makeblastdb -in mus_ref37_chr1.fa -dbtype nucl -out mus_ref37_chr1 -title "Mouse chromosome 1, Ref B37.1"
    makeblastdb -in mus_ref37_chr2.fa -dbtype nucl -out mus_ref37_chr2 -title "Mouse chromosome 2, Ref B37.1"
    makeblastdb -in mus_ref37_chr3.fa -dbtype nucl -out mus_ref37_chr3 -title "Mouse chromosome 3, Ref B37.1"

  2. Then join them into one database:

    blastdb_aliastool -dblist "mus_ref37_chr1 mus_ref37_chr2 mus_ref37_chr3" -dbtype nucl -out mus_genomes_three_chrs -title "Mouse chromosomes 1-3"

More here (descriptions pp. 12-14 and usage example pp. 26-28)

With this approach you can search either all of the FASTA files at once or one specific file.
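
For a larger batch of files, the two steps could be wrapped in a loop. This is a rough sketch rather than a tested pipeline: it assumes the BLAST+ tools are on your PATH, that the FASTA files are passed as arguments, and that each file name without its extension makes a usable database name. The helper names `db_name` and `build_all` and the alias name `all_plants_est` are placeholders.

```shell
#!/bin/sh
# Sketch: one BLAST database per FASTA file, then a single alias over all of them.
# The FASTA files passed in and the alias name "all_plants_est" are placeholders.

# Strip the directory and the extension to get a database name per file.
db_name() {
    base=$(basename "$1")
    printf '%s' "${base%.*}"
}

build_all() {
    dblist=""
    for fasta in "$@"; do
        name=$(db_name "$fasta")
        makeblastdb -in "$fasta" -dbtype nucl -out "$name" -title "ESTs from $name"
        dblist="$dblist $name"
    done
    # -dblist takes the database names as one space-separated string.
    blastdb_aliastool -dblist "${dblist# }" -dbtype nucl \
        -out all_plants_est -title "ESTs from all plants"
}

# Run only when BLAST+ is installed and FASTA files were given.
if command -v makeblastdb >/dev/null 2>&1 && [ "$#" -gt 0 ]; then
    build_all "$@"
fi
```

Searching one plant then means passing its own database name to -db, while -db all_plants_est searches everything.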

13.9 years ago

Put all your FASTA sequences into a single text file, keeping just a single line break between consecutive sequences, and save it with a .txt extension. Suppose your input file, containing all the ESTs, is named abc.txt. Then use the following command to format it:

formatdb -i abc.txt -p F -o T

Hope this will help you.
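
Since -o T asks formatdb to parse the sequence identifiers and build indexes, identifiers that repeat across plants may cause problems once the files are merged. Below is a minimal sketch of a pre-check, assuming the identifier is the first whitespace-delimited word of each header line. The function name `duplicate_ids` and the demo sequences are invented for illustration.

```shell
#!/bin/sh
# List identifiers that occur more than once in a merged FASTA file.
# Assumption: the identifier is the first word after ">" on each header line.
duplicate_ids() {
    grep '^>' "$1" | sed 's/^>//' | awk '{print $1}' | sort | uniq -d
}

# Demo on a small made-up file in which est1 appears twice.
demo=$(mktemp)
printf '>est1 plantA\nACGT\n>est2 plantA\nGGCC\n>est1 plantB\nTTAA\n' > "$demo"
dups=$(duplicate_ids "$demo")
echo "duplicated ids: $dups"
rm -f "$demo"
```

If the function prints nothing for abc.txt, every identifier in the merged file is unique.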


Yes, I recommend this way too.
