If your sequences are short and you only want to find simple differences (e.g. substitutions only, no gaps), you can use the diffseq tool from the EMBOSS package.
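In the ungapped case, finding the differences is just a position-by-position walk over two sequences of equal length. Below is a minimal bash sketch. It assumes both sequences have the same length and contain no insertions or deletions. diff_subs is a hypothetical helper name, not part of EMBOSS, and it reports each substitution as old residue, 1-based position, new residue (e.g. K4R).

```shell
#!/usr/bin/env bash
# Sketch: list substitutions between two ungapped, equal-length sequences.
# diff_subs is a hypothetical helper; it does NOT handle indels (use an aligner for those).
diff_subs() {
  # Upper-case both inputs so case differences are not reported as mutations.
  local a b
  a=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  b=$(printf '%s' "$2" | tr '[:lower:]' '[:upper:]')
  if (( ${#a} != ${#b} )); then
    echo "sequences differ in length; use an alignment tool instead" >&2
    return 1
  fi
  local i count=0
  for (( i = 0; i < ${#a}; i++ )); do
    if [[ ${a:i:1} != "${b:i:1}" ]]; then
      # Print old residue, 1-based position, new residue.
      echo "${a:i:1}$((i + 1))${b:i:1}"
      count=$((count + 1))
    fi
  done
  echo "total: $count"
}

diff_subs ACKKAKCAKC ACKRAKCAKS
```

For the two short example sequences in the last line, this should print K4R and C10S, followed by the total. As soon as a gap is needed to line the sequences up (as with the sequences used below), every residue after the gap would be reported as a mismatch. That is where an aligner becomes necessary.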
If you need something more accurate, you should use a sequence alignment tool. A tool that works from the command line and is easy to set up is exonerate.
I am not sure how you can get the output you requested with exonerate, but there is probably a way. Put your sequences into two different files:
printf '>seq1\nACKKAKCAKCAIKCAKCKACNGHSCKAAEUIIDHTN\n' > seq1.fasta
printf '>seq2\nACKKAKCAKCAIIKCAKCKACNGHSKAAEUIIDHTN\n' > seq2.fasta
You can also put more than one sequence in the same file, if that helps you organize them. Then run exonerate as follows:
$ exonerate -q seq1.fasta -t seq2.fasta --showsugar --showcigar -n 1 -m affine:global --exhaustive --showvulgar
Command line: [exonerate -q seq1.fasta -t seq2.fasta --showsugar --showcigar -n 1 -m affine:global --exhaustive --showvulgar]
Hostname: [henikoff]
** (process:31120): WARNING **: Exhaustively generating suboptimal alignments will be VERY SLOW
C4 Alignment:
------------
Query: seq1
Target: seq2
Model: affine:global:protein2protein
Raw score: 177
Query range: 0 -> 36
Target range: 0 -> 36
1 : ACKKAKCAKCA-IKCAKCKACNGHSCKAAEUIIDHTN : 36
    ||||||||||| ||||||||||||| |||||||||||
1 : ACKKAKCAKCAIIKCAKCKACNGHS-KAAEUIIDHTN : 36
sugar: seq1 0 36 . seq2 0 36 . 177
cigar: seq1 0 36 . seq2 0 36 . 177 M 11 D 1 M 13 I 1 M 11
vulgar: seq1 0 36 . seq2 0 36 . 177 M 11 11 G 0 1 M 13 13 G 1 0 M 11 11
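The vulgar line is the easiest one to post-process. After the nine fields for the query and target coordinates and the score, it is a list of (label, query length, target length) triplets. Here M is an aligned block and G is a gap. Below is a small awk sketch that totals these triplets. vulgar_summary is a hypothetical helper name, and it assumes a protein-to-protein model like the one above, so only M and G triplets are expected. It does not separate identities from substitutions, because both live inside M blocks.

```shell
#!/usr/bin/env bash
# Sketch: summarise exonerate vulgar lines read from stdin.
# vulgar_summary is a hypothetical helper; only M and G triplets are handled.
vulgar_summary() {
  awk '$1 == "vulgar:" {
    m = 0; gq = 0; gt = 0
    # Triplets start at field 11: label, query length, target length.
    for (i = 11; i + 2 <= NF; i += 3) {
      if ($i == "M") m += $(i + 1)                       # aligned columns (identities and substitutions)
      else if ($i == "G") { gq += $(i + 1); gt += $(i + 2) }
    }
    # Target-only residues are gaps in seq1; query-only residues are gaps in seq2.
    printf "aligned columns: %d, gaps in seq1: %d, gaps in seq2: %d\n", m, gt, gq
  }'
}

echo 'vulgar: seq1 0 36 . seq2 0 36 . 177 M 11 11 G 0 1 M 13 13 G 1 0 M 11 11' | vulgar_summary
```

With the vulgar line from the run above, this reports 35 aligned columns and one gap in each sequence, which matches the two dashes in the alignment.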
Have a look at the --showcigar, --showvulgar, and --showsugar options, and especially at the --ryo option, for more output formats and their explanation.
I would add
tr '[:lower:]' '[:upper:]'
to make sure all amino acids are in the same case.
What a clever answer, I laughed when I saw this :D
However, this approach is not very intuitive, because I have to extract the sequences from the FASTA files with another command and count the mutations myself. A dedicated program would be better. Anyway, thanks for your help :D
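For the extraction step mentioned in the last comment, a short awk function can pull one record out of a FASTA file, even if the file holds several records or the sequence is wrapped over several lines. The sketch below assumes the record ID is the first word of the header line. fasta_seq is a hypothetical helper name. It also upper-cases the result, which has the same effect as the tr suggestion.

```shell
#!/usr/bin/env bash
# Sketch: print the sequence of one record from a (multi-)FASTA file on a single line.
# fasta_seq is a hypothetical helper. Usage: fasta_seq FILE ID
# Assumes the ID is the first word of the header; prints an empty line if the ID is not found.
fasta_seq() {
  awk -v id="$2" '
    /^>/ { name = substr($1, 2); keep = (name == id); next }  # header: keep only the matching record
    keep { seq = seq $0 }                                     # join wrapped sequence lines
    END  { print toupper(seq) }                               # upper-case, like tr [:lower:] [:upper:]
  ' "$1"
}
```

For a properly formatted file such as seq1.fasta, running fasta_seq seq1.fasta seq1 prints that sequence on one line. You can then pass it to another command instead of copying it by hand.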