Question

Multiple sequence alignment: amino acid similarity

Posted 11 months ago by dec986

Hello,

I have many multiple sequence alignments, and I'm trying to visualize and quantify regions where the sequences are very similar to one another. I'm using EMBOSS plotcon (https://emboss.sourceforge.net/apps/cvs/emboss/apps/plotcon.html), but the results look very strange.

I'm running plotcon with:

    plotcon -goutfile $data_file -sequences $msa_file -graph data -winsize 1
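The `-winsize` option controls how many alignment columns contribute to each plotted point. A minimal sketch of that idea, assuming a plain moving average over consecutive per-column scores (the helper `window_average` and the example scores are hypothetical, and plotcon's own windowing may differ in detail): with a window of 1, each plotted value is simply that column's score.

```python
from typing import List


def window_average(column_scores: List[float], winsize: int) -> List[float]:
    """Mean of each run of `winsize` consecutive column scores.

    Hypothetical helper for illustration; plotcon's exact windowing may differ.
    """
    if winsize < 1:
        raise ValueError("winsize must be at least 1")
    if winsize > len(column_scores):
        return []  # window longer than the alignment: no full window fits
    return [
        sum(column_scores[i:i + winsize]) / winsize
        for i in range(len(column_scores) - winsize + 1)
    ]


if __name__ == "__main__":
    # Made-up column scores, not real plotcon output.
    scores = [5.0, 5.0, 5.0, 2.0, -1.0]
    print(window_average(scores, 1))  # [5.0, 5.0, 5.0, 2.0, -1.0]
    print(window_average(scores, 3))  # [5.0, 4.0, 2.0]
```

A larger window smooths out single-column spikes, which helps when looking for conserved regions rather than individual conserved positions.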

To understand the output, I first tried a trivial alignment of two identical peptides:

>1
AAA
>2
AAA

for which plotcon reports:

1.000000    5.000000
2.000000    5.000000
3.000000    5.000000

This makes sense: the A-A similarity is the same at every position.
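To quantify similar regions from the `-graph data` output, here is a minimal parsing sketch. It assumes each data line holds a position and a similarity value separated by whitespace and that any header lines start with `#`; `parse_plotcon_data`, `high_similarity_regions` and the threshold value are hypothetical names and numbers chosen for illustration.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def parse_plotcon_data(text: str) -> List[Point]:
    """Parse (position, similarity) pairs from plotcon `-graph data` output.

    Assumption: data lines hold two whitespace-separated numbers and any
    header lines start with '#'.
    """
    points: List[Point] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        try:
            points.append((float(fields[0]), float(fields[1])))
        except (IndexError, ValueError):
            continue  # skip lines that are not a numeric pair
    return points


def high_similarity_regions(points: List[Point], threshold: float) -> List[Point]:
    """Return (start, end) positions of consecutive points scoring >= threshold."""
    regions: List[Point] = []
    start: Optional[float] = None
    end: Optional[float] = None
    for pos, score in points:
        if score >= threshold:
            if start is None:
                start = pos
            end = pos
        elif start is not None:
            regions.append((start, end))
            start = None
    if start is not None:
        regions.append((start, end))
    return regions


if __name__ == "__main__":
    # The three lines plotcon reported for the AAA/AAA test alignment.
    sample = "1.000000    5.000000\n2.000000    5.000000\n3.000000    5.000000\n"
    points = parse_plotcon_data(sample)
    print(points)  # [(1.0, 5.0), (2.0, 5.0), (3.0, 5.0)]
    print(high_similarity_regions(points, threshold=4.0))  # [(1.0, 3.0)]
```

The threshold is arbitrary here; in practice it would depend on the scoring matrix and window size in use.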

However, when I collect the self-comparison score for every amino acid, I get:

{
    A   {
        A   5.000000
    },
    C   {
        C   5.000000
    },
    D   {
        D   -1.000000
    },
    E   {
        E   5.000000
    },
    F   {
        F   6.000000
    },
    G   {
        G   5.000000
    },
    H   {
        H   -1.000000
    },
    I   {
        I   4.000000
    },
    K   {
        K   -1.000000
    },
    L   {
        L   4.000000
    },
    M   {
        M   -1.000000
    },
    N   {
        N   -1.000000
    },
    P   {
        P   7.000000
    },
    Q   {
        Q   5.000000
    },
    R   {
        R   -1.000000
    },
    S   {
        S   -1.000000
    },
    T   {
        T   5.000000
    },
    V   {
        V   -1.000000
    },
    W   {
        W   -1.000000
    },
    Y   {
        Y   -1.000000
    }
}

These results make no sense to me.

The comparison matrix mentioned in the link above seems to be flawed, since identical amino acids can get different similarity values.

Why is T-T = A-A = 5, while Y-Y = W-W = -1? This strange result makes me doubt whether plotcon is useful at all.
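For context when reading such a table: in a substitution matrix like BLOSUM62, the diagonal (a residue scored against itself) is not constant. For example, A-A is 4 while W-W is 11, because an identity between rarer residues gets a higher log-odds score. A column of identical residues therefore need not score the same for every amino acid. The sketch below assumes the column similarity is the mean of all pairwise scores and uses only the standard BLOSUM62 diagonal. plotcon's documented default matrix is EBLOSUM62 (worth confirming for your EMBOSS version), but its exact formula may differ, and this sketch does not account for the -1 values. `identical_column_similarity` is a hypothetical helper.

```python
from itertools import combinations

# Self-substitution scores (the diagonal) of the standard BLOSUM62 matrix.
BLOSUM62_DIAGONAL = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9, "Q": 5, "E": 5,
    "G": 6, "H": 8, "I": 4, "L": 4, "K": 5, "M": 5, "F": 6,
    "P": 7, "S": 4, "T": 5, "W": 11, "Y": 7, "V": 4,
}


def identical_column_similarity(residue: str, n_seqs: int = 2) -> float:
    """Mean pairwise BLOSUM62 score for a column of n_seqs copies of one residue.

    Every pair is residue-vs-itself, so only the diagonal entry is needed.
    """
    if n_seqs < 2:
        raise ValueError("need at least two sequences to form a pair")
    pair_scores = [BLOSUM62_DIAGONAL[residue]
                   for _ in combinations(range(n_seqs), 2)]
    return sum(pair_scores) / len(pair_scores)


if __name__ == "__main__":
    # Print the expected self-score for each amino acid, for comparison.
    for aa in sorted(BLOSUM62_DIAGONAL):
        print(aa, identical_column_similarity(aa))
```

Comparing this list against the self-comparison values above may help separate matrix behaviour (different positive diagonal values) from whatever produces the -1 entries.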
