Tool for aligning short protein sequences
2
Hi,
I have a file that looks like:
>ref_frame=1
XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA
>ref_frame=2
FLKKIWPSYKKRPKNFLQSRPEPTAPPEESFRSGVETTTPPQKQ
>ref_frame=3
F*KKSGLPTKKGQRIFFRADQSQQPHQKRASGLG*RQQLPLRSR
>read1_frame=1
FFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA
>read1_frame=2
FLKKIWPSYKKRPKNFLQSRPEPTAPPEESFRSGVETTTPPQKQ
>read1_frame=3
F*KKSGLPTKKGQRIFFRADQSQQPHQKRASGLG*RQQLPLRSR
I want to do a protein alignment where I align each read
frame against each ref
frame. What tool can I use to do the alignment on the command line? I have considered using blastp and diamond however other answers to similar questions here on Biostars have suggested those are better used as a search tool against a database.
protein
alignment
• 940 views
You can use needleall from the EMBOSS package. The tool performs all-versus-all pairwise global alignments.
needleall -asequence seqs.fa -bsequence seqs.fa -gapopen 10.0 -gapextend 0.5 -aformat srspair -outfile seqs.needleall
Output:
########################################
# Program: needleall
# Rundate: Wed 25 Jan 2023 12:33:06
# Commandline: needleall
# -asequence seqs.fa
# -bsequence seqs.fa
# -gapopen 10.0
# -gapextend 0.5
# -aformat srspair
# -outfile seqs.needleall
# Align_format: srspair
# Report_file: seqs.needleall
########################################
#=======================================
#
# Aligned_sequences: 2
# 1: ref_frame=1
# 2: ref_frame=1
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 44
# Identity: 44/44 (100.0%)
# Similarity: 43/44 (97.7%)
# Gaps: 0/44 ( 0.0%)
# Score: 220.0
#
#
#=======================================
ref_frame=1 1 XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA 44
|||||||||||||||||||||||||||||||||||||||||||
ref_frame=1 1 XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA 44
#=======================================
#
# Aligned_sequences: 2
# 1: ref_frame=2
# 2: ref_frame=1
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 45
# Identity: 8/45 (17.8%)
# Similarity: 16/45 (35.6%)
# Gaps: 2/45 ( 4.4%)
# Score: 24.0
#
#
#=======================================
ref_frame=2 1 FLKKIWPSYKKRPKNFLQSRPEPTAPPEESFRS-GVETTTPPQKQ 44
..||.....:|:.|.|...:....:|.....:. |.:..:|.:.
ref_frame=1 1 XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA- 44
#=======================================
...
You can also change the output format.
needleall -asequence seqs.fa -bsequence seqs.fa -gapopen 10.0 -gapextend 0.5 -aformat score -outfile seqs.needleall
Output:
ref_frame=1 ref_frame=1 44 (220.0)
ref_frame=2 ref_frame=1 45 (24.0)
ref_frame=3 ref_frame=1 45 (14.0)
read1_frame=1 ref_frame=1 44 (220.0)
read1_frame=2 ref_frame=1 45 (24.0)
read1_frame=3 ref_frame=1 45 (14.0)
ref_frame=1 ref_frame=2 45 (24.0)
ref_frame=2 ref_frame=2 44 (240.0)
ref_frame=3 ref_frame=2 59 (21.0)
read1_frame=1 ref_frame=2 45 (31.0)
read1_frame=2 ref_frame=2 44 (240.0)
read1_frame=3 ref_frame=2 59 (21.0)
ref_frame=1 ref_frame=3 45 (14.0)
ref_frame=2 ref_frame=3 59 (21.0)
ref_frame=3 ref_frame=3 44 (218.0)
read1_frame=1 ref_frame=3 45 (14.0)
read1_frame=2 ref_frame=3 59 (21.0)
read1_frame=3 ref_frame=3 44 (218.0)
ref_frame=1 read1_frame=1 44 (220.0)
ref_frame=2 read1_frame=1 45 (31.0)
ref_frame=3 read1_frame=1 45 (14.0)
read1_frame=1 read1_frame=1 44 (227.0)
read1_frame=2 read1_frame=1 45 (31.0)
read1_frame=3 read1_frame=1 45 (14.0)
ref_frame=1 read1_frame=2 45 (24.0)
ref_frame=2 read1_frame=2 44 (240.0)
ref_frame=3 read1_frame=2 59 (21.0)
read1_frame=1 read1_frame=2 45 (31.0)
read1_frame=2 read1_frame=2 44 (240.0)
read1_frame=3 read1_frame=2 59 (21.0)
ref_frame=1 read1_frame=3 45 (14.0)
ref_frame=2 read1_frame=3 59 (21.0)
ref_frame=3 read1_frame=3 44 (218.0)
read1_frame=1 read1_frame=3 45 (14.0)
read1_frame=2 read1_frame=3 59 (21.0)
read1_frame=3 read1_frame=3 44 (218.0)
#---------------------------------------
#---------------------------------------
cat input.fa | paste - - | grep read1_ | tr "\t" "\n" > input1.fa && cat input.fa | paste - - | grep ref_ | tr "\t" "\n" > input2.fa && blastp -query input1.fa -subject input2.fa
Login before adding your answer.
Traffic: 1725 users visited in the last hour
Thanks Pierre, If my next step was to identify from the blastp output which alignment produces the best result, is there a way of doing that with a command?