Hello,
I have many multiple sequence alignments, and I'm trying to visualize/quantify regions that are very similar to one another. I'm using plotcon (https://emboss.sourceforge.net/apps/cvs/emboss/apps/plotcon.html), but I'm getting very strange results.
I'm running plotcon with the following command:
plotcon -goutfile $data_file -sequences $msa_file -graph data -winsize 1
To understand the results, I wanted to see what a very simple alignment of two identical peptides would give:
>1
AAA
>2
AAA
for which plotcon outputs:
1.000000 5.000000
2.000000 5.000000
3.000000 5.000000
This makes sense: the A-A similarity is the same at every position.
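The data output shown above is a two-column list of position and score. Here is a minimal sketch for reading it back in Python. It assumes that any line which is not exactly two numbers (for example a header or comment line) can be ignored, and the function name `parse_plotcon_data` is made up for illustration:

```python
def parse_plotcon_data(text):
    """Return (position, score) pairs from plotcon '-graph data' output text."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank lines and anything that is not a two-column row
        try:
            position, score = float(fields[0]), float(fields[1])
        except ValueError:
            continue  # assumption: non-numeric rows are headers/comments
        pairs.append((int(position), score))
    return pairs


# The three rows reported for the AAA / AAA alignment.
sample = """1.000000 5.000000
2.000000 5.000000
3.000000 5.000000
"""
pairs = parse_plotcon_data(sample)
print(pairs)  # [(1, 5.0), (2, 5.0), (3, 5.0)]
print(len({score for _, score in pairs}) == 1)  # True: same score at every position
```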
However, here are the values I get for every amino acid compared against itself:
A {
A 5.000000
},
C {
C 5.000000
},
D {
D -1.000000
},
E {
E 5.000000
},
F {
F 6.000000
},
G {
G 5.000000
},
H {
H -1.000000
},
I {
I 4.000000
},
K {
K -1.000000
},
L {
L 4.000000
},
M {
M -1.000000
},
N {
N -1.000000
},
P {
P 7.000000
},
Q {
Q 5.000000
},
R {
R -1.000000
},
S {
S -1.000000
},
T {
T 5.000000
},
V {
V -1.000000
},
W {
W -1.000000
},
Y {
Y -1.000000
}
}
These results make no sense at all to me.
The comparison matrix mentioned in the link above seems to be flawed, since identical amino acids get different similarity values.
Why is T-T = A-A = 5, while Y-Y = W-W = -1? This strange result makes me doubt the usefulness of the plotcon command.
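
For anyone who wants to reproduce the self-comparison table, here is a rough sketch of how the per-residue test inputs could be generated. It assumes the same two-sequence, three-residue test as the AAA example is repeated for each of the 20 amino acids. The file names and helper names are hypothetical; only the plotcon options are taken from the command quoted above. The script only writes the FASTA files and prints the commands, it does not run plotcon itself:

```python
import os
import shlex
import tempfile

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 residues listed in the table


def self_alignment_fasta(residue, length=3):
    """Two identical sequences made of a single residue, in FASTA format."""
    peptide = residue * length
    return f">1\n{peptide}\n>2\n{peptide}\n"


def plotcon_command(msa_file, data_file):
    """Argument list mirroring the plotcon call quoted in the post."""
    return ["plotcon", "-goutfile", data_file, "-sequences", msa_file,
            "-graph", "data", "-winsize", "1"]


with tempfile.TemporaryDirectory() as workdir:
    for residue in AMINO_ACIDS:
        msa_file = os.path.join(workdir, f"{residue}.fasta")     # hypothetical name
        data_file = os.path.join(workdir, f"{residue}_plotcon")  # hypothetical name
        with open(msa_file, "w") as handle:
            handle.write(self_alignment_fasta(residue))
        # Print the command; pass the list to subprocess.run() to execute it.
        print(shlex.join(plotcon_command(msa_file, data_file)))
```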