Dear all,
I'm new to this forum and new to computational analysis, I'm using standalone NCBI Blast+ (blastp) for the first time, and I have the results file in the following format:
Query= Y
Length=6
Subject= X
Length=739
Score = 15.4 bits (28), Expect = 0.044, Method: Composition-based stats.
Identities = 5/6 (83%), Positives = 6/6 (100%), Gaps = 0/6 (0%)
Query 1 DDDIPF 6
D+DIPF
Sbjct 244 DNDIPF 250
But I want to do a multiple alignments of all the hits, for that, I need to extract the sequences in the following fasts format:
>Subject= X
Sbjct 244 DNDIPF
Is there any tool which is helpful to do direct multiple alignments from blast results files or a tool/tutorial to extract the sequences in fasta format to process further. Thanks.
Dear Siva, my final goal is to take the "subject" sequences from the blastp report (results) and do multiple alignment to the see their evolution patterns.
I'm trying to do blast+ against my own database and wanted to take these sequences only.
If you only take the aligned portion of the subject sequences from the BLAST result, you might end up aligning different segments of the same sequence against one another. To avoid that you also need to take in to account the aligning regions of the query sequence.
The simpler way would be to align the full length sequences of the BLAST hits. You can get the list of all Subject sequence IDs by using any of the following options for
-outfmt
sallgi
means All subject GIssallseqid
means All subject Seq-id(s), separated by a ';'sallacc
means All subject accessionsOnce you have the list of sequence IDs or GIs, you can use
blastdbcmd
tool to extract the full length sequences from your BLASTdb, provided it was created with the option-parse_seqids
. Then, you can continue with multiple alignment and other evolutionary analyses.If you don't know already, you will find answers for most of your BLAST related questions here: BLAST Command Line Applications User Manual.
If you still want to go with getting multiple alignment from BLAST results, there is a tool called Blammer from HHsuite that converts BLAST/PSI-BLAST result in to multiple alignments. Though I have never used it.
Thanks Siva, this worked exactly the way I anticipated but
{ print ">"$2, $3 }
failed to write the data into the file, ($2
,$3
) remain empty, any ideas?As Ram suggested, please post the full BLAST and awk commands you used. If you have multiple HSPs from one sequence (different regions of the same sequence), using the awk command I posted above will result in multiple sequences with same FASTA headers. Multiple sequence alignment programs expect unique sequence identifiers.
Dear Siva & Ram,
I have used the following commends:
Thanks for the help, Still, I have one more question, I want to restrict my blastp results to evalue<0.001, and I tried to do in with
-outfmt
(-outfmt '6 qseqid sseqid sseq evalue<0.001
), but this filter doesn't seem to work, any idea?For the awk part, try specifying the separator explicitly using
awk -F "\t" 'BEGIN ...
In the second part, where you're trying to filter, you're adding the filter to the formatting option, which is incorrect. You might wanna add the evalue filter option with
-evalue 0.00099
to theblastp
commandThank you RamRS, This is what I say looking for. Now my sequences are ready for MSA.
Could you maybe give you the exact full command you used? That might help!