Tool for aligning short protein sequences
2
Hi,
I have a file that looks like:
> ref_frame= 1
XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA
> ref_frame= 2
FLKKIWPSYKKRPKNFLQSRPEPTAPPEESFRSGVETTTPPQKQ
> ref_frame= 3
F*KKSGLPTKKGQRIFFRADQSQQPHQKRASGLG*RQQLPLRSR
> read1_frame= 1
FFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA
> read1_frame= 2
FLKKIWPSYKKRPKNFLQSRPEPTAPPEESFRSGVETTTPPQKQ
> read1_frame= 3
F*KKSGLPTKKGQRIFFRADQSQQPHQKRASGLG*RQQLPLRSR
I want to do a protein alignment where I align each read
frame against each ref
frame. What tool can I use to do the alignment on the command line? I have considered using blastp and diamond however other answers to similar questions here on Biostars have suggested those are better used as a search tool against a database.
protein
alignment
• 1.1k views
You can use needleall from the EMBOSS package. The tool performs all-versus-all pairwise global alignments.
needleall -asequence seqs.fa -bsequence seqs.fa -gapopen 10.0 -gapextend 0.5 -aformat srspair -outfile seqs.needleall
Output:
ref_frame= 1 1 XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA 44
|| || || || || || || || || || || || || || || || || || || || || |
ref_frame= 1 1 XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA 44
ref_frame= 2 1 FLKKIWPSYKKRPKNFLQSRPEPTAPPEESFRS-GVETTTPPQKQ 44
.. || .. .. .:| :.| . | .. .:.. .. :| .. .. .:. | .:.. :| .:.
ref_frame= 1 1 XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA- 44
.. .
You can also change the output format.
needleall -asequence seqs.fa -bsequence seqs.fa -gapopen 10.0 -gapextend 0.5 -aformat score -outfile seqs.needleall
Output:
ref_frame= 1 ref_frame= 1 44 ( 220.0)
ref_frame= 2 ref_frame= 1 45 ( 24.0)
ref_frame= 3 ref_frame= 1 45 ( 14.0)
read1_frame= 1 ref_frame= 1 44 ( 220.0)
read1_frame= 2 ref_frame= 1 45 ( 24.0)
read1_frame= 3 ref_frame= 1 45 ( 14.0)
ref_frame= 1 ref_frame= 2 45 ( 24.0)
ref_frame= 2 ref_frame= 2 44 ( 240.0)
ref_frame= 3 ref_frame= 2 59 ( 21.0)
read1_frame= 1 ref_frame= 2 45 ( 31.0)
read1_frame= 2 ref_frame= 2 44 ( 240.0)
read1_frame= 3 ref_frame= 2 59 ( 21.0)
ref_frame= 1 ref_frame= 3 45 ( 14.0)
ref_frame= 2 ref_frame= 3 59 ( 21.0)
ref_frame= 3 ref_frame= 3 44 ( 218.0)
read1_frame= 1 ref_frame= 3 45 ( 14.0)
read1_frame= 2 ref_frame= 3 59 ( 21.0)
read1_frame= 3 ref_frame= 3 44 ( 218.0)
ref_frame= 1 read1_frame= 1 44 ( 220.0)
ref_frame= 2 read1_frame= 1 45 ( 31.0)
ref_frame= 3 read1_frame= 1 45 ( 14.0)
read1_frame= 1 read1_frame= 1 44 ( 227.0)
read1_frame= 2 read1_frame= 1 45 ( 31.0)
read1_frame= 3 read1_frame= 1 45 ( 14.0)
ref_frame= 1 read1_frame= 2 45 ( 24.0)
ref_frame= 2 read1_frame= 2 44 ( 240.0)
ref_frame= 3 read1_frame= 2 59 ( 21.0)
read1_frame= 1 read1_frame= 2 45 ( 31.0)
read1_frame= 2 read1_frame= 2 44 ( 240.0)
read1_frame= 3 read1_frame= 2 59 ( 21.0)
ref_frame= 1 read1_frame= 3 45 ( 14.0)
ref_frame= 2 read1_frame= 3 59 ( 21.0)
ref_frame= 3 read1_frame= 3 44 ( 218.0)
read1_frame= 1 read1_frame= 3 45 ( 14.0)
read1_frame= 2 read1_frame= 3 59 ( 21.0)
read1_frame= 3 read1_frame= 3 44 ( 218.0)
cat input.fa | paste - - | grep read1_ | tr "\t" "\n" > input1.fa && cat input.fa | paste - - | grep ref_ | tr "\t" "\n" > input2.fa && blastp -query input1.fa -subject input2.fa
Login before adding your answer.
Traffic: 3451 users visited in the last hour
Thanks Pierre, If my next step was to identify from the blastp output which alignment produces the best result, is there a way of doing that with a command?