Question

full sequences from command line tblastn



8.5 years ago

peachila • 0

Hello,

I am new to command-line BLAST. I am using a protein query to search a nucleotide database I created with makeblastdb. The results look right, but I cannot get a FASTA file with the complete sequences of the hits.

To be clear, I want one file with information such as the e-value and score in tabular format (which I can already get) and, in addition, a FASTA file with the complete sequences of the matching accession numbers. If possible, I'd like the translated sequences in amino acids rather than DNA.

My command looks like this:

tblastn -query query.fasta -db blastdatabase -outfmt 6 -num_threads 3 -max_target_seqs 2000 -out tblastn_DB.tab

I know it's a simple question, but I have not been able to solve it by looking through the NCBI BLAST command line cookbook.

Thank you very much!

blast tblastn command-line blast output format • 6.6k views

updated 8.5 years ago by cschu181 ★ 2.8k • written 8.5 years ago by peachila • 0



I like @cschu1981's answer. Translating will be a little more difficult unless the hits are ORFs, since you won't know which frame to translate in. However, you can look into EMBOSS transeq for translating your sequences. Did you get your DNA db from a public source? If so, there may already be a protein file you can cross-reference with your db IDs.
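If the aligned region is enough, tblastn can report the frame itself: the custom -outfmt 6 fields sframe (subject frame) and sseq (aligned part of the subject, which for tblastn is the translated protein) avoid guessing the frame. Below is a minimal sketch, assuming the field list shown in the comment; the file name tblastn_frames.tab, the function name and the FASTA header layout are made up for illustration. Note that this gives only the aligned segment of each hit, not the full-length subject.

```shell
#!/usr/bin/env bash
# Sketch: turn a tblastn table that includes sframe and sseq into a protein
# FASTA of the aligned (translated) subject segments.
# Assumed search command (the field list is an example, not the only option):
#   tblastn -query query.fasta -db blastdatabase -num_threads 3 \
#     -max_target_seqs 2000 -out tblastn_frames.tab \
#     -outfmt "6 qseqid sseqid evalue bitscore sstart send sframe sseq"
# With that field order: qseqid=$1, sseqid=$2, sstart=$5, send=$6,
# sframe=$7, sseq=$8.

hits_to_protein_fasta() {
  awk -F '\t' '{
    seq = $8
    gsub(/-/, "", seq)   # drop alignment gap characters
    # header: subject id, nucleotide coordinates, frame, and the query id
    printf(">%s_%s-%s_frame%s query=%s\n%s\n", $2, $5, $6, $7, $1, seq)
  }' "$1"
}

# Example:
# hits_to_protein_fasta tblastn_frames.tab > hits_protein.fa
```

Each HSP becomes its own record, so a subject with several HSPs appears several times with different coordinates.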

8.5 years ago by st.ph.n ★ 2.7k

score 2 · Answer 1 · 2017-01-31

If you created your BLAST database with the -parse_seqids option, this is quite easy:

for id in $(cut -f 2 tblastn_DB.tab | sort -u); do
  # fetch each subject once, even if it has several HSPs or matches several queries
  blastdbcmd -entry "$id" -db blastdatabase >> results.fa
done

This assumes plain -outfmt 6 without custom fields (i.e. the subject id is in column 2), as in the command in your question.
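A variant worth considering, under the same assumptions (a -parse_seqids database, subject ids in column 2): blastdbcmd also accepts a file of identifiers through -entry_batch, so all hits can be fetched in a single call instead of starting one process per id. This is a sketch only; the function names and the `.ids` side file are illustrative choices, not part of BLAST+.

```shell
#!/usr/bin/env bash
# Sketch: fetch every hit once with a single blastdbcmd call.
# Assumes the database was built with -parse_seqids and that column 2 of the
# -outfmt 6 table holds the subject id.

# Print the unique subject ids (column 2), keeping first-seen order.
unique_subject_ids() {
  cut -f 2 "$1" | awk '!seen[$0]++'
}

# Write the unique ids to a side file, then retrieve them all at once.
fetch_hits() {
  local table=$1 db=$2 out=$3
  local ids="${out}.ids"
  command -v blastdbcmd >/dev/null || { echo "blastdbcmd not found" >&2; return 1; }
  unique_subject_ids "$table" > "$ids"
  blastdbcmd -db "$db" -entry_batch "$ids" -out "$out"
}

# Example (requires BLAST+ on PATH):
# fetch_hits tblastn_DB.tab blastdatabase results.fa
```

The result is still nucleotide FASTA; translating it needs a frame, as discussed in the comment above.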