Tool for aligning short protein sequences
2
0
Entering edit mode
22 months ago
SaltedPork ▴ 170

Hi,

I have a file that looks like:

>ref_frame=1 
XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA
>ref_frame=2 
FLKKIWPSYKKRPKNFLQSRPEPTAPPEESFRSGVETTTPPQKQ
>ref_frame=3 
F*KKSGLPTKKGQRIFFRADQSQQPHQKRASGLG*RQQLPLRSR
>read1_frame=1 
FFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA
>read1_frame=2 
FLKKIWPSYKKRPKNFLQSRPEPTAPPEESFRSGVETTTPPQKQ
>read1_frame=3
F*KKSGLPTKKGQRIFFRADQSQQPHQKRASGLG*RQQLPLRSR

I want to do a protein alignment where I align each read frame against each ref frame. What tool can I use to do the alignment on the command line? I have considered using blastp and diamond however other answers to similar questions here on Biostars have suggested those are better used as a search tool against a database.

protein alignment • 941 views
ADD COMMENT
4
Entering edit mode
22 months ago

You can use needleall from the EMBOSS package. The tool performs all-versus-all pairwise global alignments.

needleall -asequence seqs.fa -bsequence seqs.fa -gapopen 10.0 -gapextend 0.5 -aformat srspair -outfile seqs.needleall

Output:

########################################
# Program: needleall
# Rundate: Wed 25 Jan 2023 12:33:06
# Commandline: needleall
#    -asequence seqs.fa
#    -bsequence seqs.fa
#    -gapopen 10.0
#    -gapextend 0.5
#    -aformat srspair
#    -outfile seqs.needleall
# Align_format: srspair
# Report_file: seqs.needleall
########################################

#=======================================
#
# Aligned_sequences: 2
# 1: ref_frame=1
# 2: ref_frame=1
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 44
# Identity:      44/44 (100.0%)
# Similarity:    43/44 (97.7%)
# Gaps:           0/44 ( 0.0%)
# Score: 220.0
# 
#
#=======================================

ref_frame=1        1 XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA     44
                      |||||||||||||||||||||||||||||||||||||||||||
ref_frame=1        1 XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA     44


#=======================================
#
# Aligned_sequences: 2
# 1: ref_frame=2
# 2: ref_frame=1
# Matrix: EBLOSUM62
# Gap_penalty: 10.0
# Extend_penalty: 0.5
#
# Length: 45
# Identity:       8/45 (17.8%)
# Similarity:    16/45 (35.6%)
# Gaps:           2/45 ( 4.4%)
# Score: 24.0
# 
#
#=======================================

ref_frame=2        1 FLKKIWPSYKKRPKNFLQSRPEPTAPPEESFRS-GVETTTPPQKQ     44
                     ..||.....:|:.|.|...:....:|.....:. |.:..:|.:. 
ref_frame=1        1 XFKKNLAFLQKKAKEFSSEQTRANSPTRRELQVWGRDNNSPSEA-     44


#=======================================
...

You can also change the output format.

needleall -asequence seqs.fa -bsequence seqs.fa -gapopen 10.0 -gapextend 0.5 -aformat score -outfile seqs.needleall

Output:

ref_frame=1 ref_frame=1 44 (220.0)
ref_frame=2 ref_frame=1 45 (24.0)
ref_frame=3 ref_frame=1 45 (14.0)
read1_frame=1 ref_frame=1 44 (220.0)
read1_frame=2 ref_frame=1 45 (24.0)
read1_frame=3 ref_frame=1 45 (14.0)
ref_frame=1 ref_frame=2 45 (24.0)
ref_frame=2 ref_frame=2 44 (240.0)
ref_frame=3 ref_frame=2 59 (21.0)
read1_frame=1 ref_frame=2 45 (31.0)
read1_frame=2 ref_frame=2 44 (240.0)
read1_frame=3 ref_frame=2 59 (21.0)
ref_frame=1 ref_frame=3 45 (14.0)
ref_frame=2 ref_frame=3 59 (21.0)
ref_frame=3 ref_frame=3 44 (218.0)
read1_frame=1 ref_frame=3 45 (14.0)
read1_frame=2 ref_frame=3 59 (21.0)
read1_frame=3 ref_frame=3 44 (218.0)
ref_frame=1 read1_frame=1 44 (220.0)
ref_frame=2 read1_frame=1 45 (31.0)
ref_frame=3 read1_frame=1 45 (14.0)
read1_frame=1 read1_frame=1 44 (227.0)
read1_frame=2 read1_frame=1 45 (31.0)
read1_frame=3 read1_frame=1 45 (14.0)
ref_frame=1 read1_frame=2 45 (24.0)
ref_frame=2 read1_frame=2 44 (240.0)
ref_frame=3 read1_frame=2 59 (21.0)
read1_frame=1 read1_frame=2 45 (31.0)
read1_frame=2 read1_frame=2 44 (240.0)
read1_frame=3 read1_frame=2 59 (21.0)
ref_frame=1 read1_frame=3 45 (14.0)
ref_frame=2 read1_frame=3 59 (21.0)
ref_frame=3 read1_frame=3 44 (218.0)
read1_frame=1 read1_frame=3 45 (14.0)
read1_frame=2 read1_frame=3 59 (21.0)
read1_frame=3 read1_frame=3 44 (218.0)

#---------------------------------------
#---------------------------------------
ADD COMMENT
1
Entering edit mode
22 months ago
cat input.fa | paste - - | grep read1_ | tr "\t" "\n" > input1.fa && cat input.fa | paste - - | grep ref_ | tr "\t" "\n" > input2.fa && blastp -query input1.fa  -subject input2.fa 
ADD COMMENT
0
Entering edit mode

Thanks Pierre, If my next step was to identify from the blastp output which alignment produces the best result, is there a way of doing that with a command?

ADD REPLY

Login before adding your answer.

Traffic: 1695 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6