Hi, I have a batch of FASTA files, each containing ESTs from a different plant. I want to create one database covering all the plants. How can I use formatdb (or another tool) to build a single database that I can BLAST against? Thank you!!
formatdb accepts stdin for the -i argument. See the manual:
This parameter is mandatory. It requires the full file name with extension. The input file should have sequences in FASTA or ASN.1 format, except when converting a gi list to binary form. To format multiple input files, quote the input file names as in -i "db1 db2". The FASTA output from other programs can be pipe to this option using "-i stdin". Renaming of database is recommended (mandatory in the first case).
So, using Unix find, you can do something like this:
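find dir1/ dir2 dir4/subdir/ -name "*.fasta" -exec cat '{}' ';' |\
formatdb -i stdin -p F -n plant_ests
(-p F marks the input as nucleotide, and -n names the resulting database, since there is no input file name to derive it from; plant_ests is only an example name.)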
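The manual excerpt above also describes a second route: passing several files as one quoted -i argument. Here is a minimal sketch of that, assuming the EST files sit in one directory and have no spaces in their names. The file names and the database name all_plant_ests are made up for illustration, and the formatdb call is skipped if the legacy tool is not installed.

```shell
#!/bin/sh
# Demo input: two tiny FASTA files in a scratch directory (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>est1\nACGTACGT\n' > plant_a.fasta
printf '>est2\nTTGACCAA\n' > plant_b.fasta

# Build the space-separated list that goes inside the single quoted -i argument.
# Assumption: no file name contains a space.
inputs=$(echo *.fasta)
echo "inputs: $inputs"

# -p F: nucleotide input; -n: base name for the combined database (example name).
if command -v formatdb >/dev/null 2>&1; then
    formatdb -i "$inputs" -p F -n all_plant_ests
else
    echo "formatdb not found; would run: formatdb -i \"$inputs\" -p F -n all_plant_ests"
fi
```

Quoting "$inputs" matters here: it hands formatdb the whole list as one argument, which is what the manual's -i "db1 db2" form asks for.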
formatdb is simple enough to call:
formatdb -i your_input_file -p F
(For ESTs you'll want to add -p F to tell it the input is nucleotide sequence, not protein.)
I always found it easier to put all the ESTs into one FASTA file with cat beforehand too, since a single file is simpler to work with in general.
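The concatenate-first approach might look like the sketch below. It uses awk '1' rather than plain cat so that a file missing its final newline does not get its last sequence line glued to the next file's header. The file names and the output name all_ests.fasta are hypothetical.

```shell
#!/bin/sh
# Demo input in a scratch directory (hypothetical file names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>a1\nACGT\n>a2\nGGCC\n' > plant_a.fasta
printf '>b1\nTTAA' > plant_b.fasta        # deliberately has no trailing newline
printf '>c1\nCCGG\n' > plant_c.fasta

# awk '1' prints every line followed by a newline, so each file ends cleanly
# before the next one starts.
awk '1' plant_*.fasta > all_ests.fasta

# Sanity check: count the FASTA headers in the combined file.
n_headers=$(grep -c '^>' all_ests.fasta)
echo "sequences in combined file: $n_headers"

# Format the combined file as a nucleotide database, if formatdb is available.
if command -v formatdb >/dev/null 2>&1; then
    formatdb -i all_ests.fasta -p F
fi
```

Comparing the header count against the sum of the per-file counts is a cheap way to catch a merged record before building the database.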
With the current NCBI BLAST+ you can do something like this:
Build a separate BLAST database for each FASTA file:
makeblastdb -in mus_ref37_chr1.fa -dbtype nucl -out mus_ref37_chr1 -title "Mouse chromosomes 1, Ref B37.1"
makeblastdb -in mus_ref37_chr2.fa -dbtype nucl -out mus_ref37_chr2 -title "Mouse chromosomes 2, Ref B37.1"
makeblastdb -in mus_ref37_chr3.fa -dbtype nucl -out mus_ref37_chr3 -title "Mouse chromosomes 3, Ref B37.1"
Then combine them under a single alias database:
blastdb_aliastool -dblist "mus_ref37_chr1 mus_ref37_chr2 mus_ref37_chr3" -dbtype nucl -out mus_genomes_three_chrs -title "Mouse chromosomes 1-3"
More here (descriptions pp. 12-14 and usage example pp. 26-28)
With this approach you can search either all of the FASTA files at once (through the alias) or one specific database.
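For a batch of plant EST files, the same two steps can be scripted as a loop. This is only a sketch under assumptions: the .fa file names, the titles, and the alias name all_plants are invented for illustration, and the BLAST+ calls are skipped when makeblastdb is not on the PATH.

```shell
#!/bin/sh
# Demo input in a scratch directory (hypothetical file names).
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '>a1\nACGTACGTAC\n' > plant_a.fa
printf '>b1\nTTGACCAATG\n' > plant_b.fa

have_blast=no
command -v makeblastdb >/dev/null 2>&1 && have_blast=yes

# Step 1: one nucleotide database per FASTA file, collecting the database names.
dblist=""
for f in *.fa; do
    name=${f%.fa}                      # strip the extension to get the db name
    if [ "$have_blast" = yes ]; then
        makeblastdb -in "$f" -dbtype nucl -out "$name" -title "$name ESTs"
    fi
    dblist="${dblist:+$dblist }$name"  # append, space-separated
done
echo "dblist: $dblist"

# Step 2: one alias database that points at all of them.
if [ "$have_blast" = yes ]; then
    blastdb_aliastool -dblist "$dblist" -dbtype nucl -out all_plants -title "All plant ESTs"
fi
```

A search across every plant would then use blastn -db all_plants -query query.fa, while blastn -db plant_a -query query.fa restricts it to one plant (query.fa being whatever query file you have).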
Put all your FASTA sequences into a single text file, with a single line break between consecutive sequences. Save the file with a .txt extension. Suppose the input file containing all the ESTs is named abc.txt; then use the following command to format it:
formatdb -i abc.txt -p F -o T
Hope this will help you.
very nice command-foo