What Value For The Gap Penalty Should Be Used With A PAM250 Substitution Matrix?
12.2 years ago
coderodde • 0

Hello.

I have read the article "A* with Partial Expansion for large branching factor problems" [1] by T. Yoshizumi, T. Miura and T. Ishida. I implemented their algorithm and applied it to the multiple sequence alignment problem. The paper presents the PAM250 substitution matrix used for scoring the alignment, but it never states which gap penalty the authors used; it only notes that a prior research paper proposed a value of 8. In my program, that value produces an alignment that does not match the result of ClustalX on the same input.

What gap penalty value should I use in the context of [1]?

12.2 years ago
Niek De Klein ★ 2.6k

There is no real consensus on the gap penalty, because the 'best' value depends on which sequences you are aligning. If you want to align proteins from two very distant organisms, you would set the gap penalties lower than when aligning proteins from closely related organisms.

As for ClustalX's defaults: according to the article "Effects of Gap Open and Gap Extension Penalties", ClustalW uses GOP 15.0 and GEP 6.66 by default, where GOP is the gap opening penalty and GEP is the gap extension penalty. Note that these are ClustalW's nucleotide defaults; for protein multiple alignments the defaults are GOP 10.0 and GEP 0.2. I think ClustalX uses the same defaults as ClustalW.
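GOP and GEP describe an affine gap model: opening a gap is expensive and lengthening it is cheaper, unlike the single per-position gap penalty of 8 discussed in the question. Below is a minimal sketch of one common convention, in which a gap of length k costs `gop + gep * (k - 1)`, applied to the pairwise projection of two aligned rows. The function names `gap_runs` and `pairwise_affine_gap_cost` are hypothetical, and ClustalW's real scheme adds position-specific and sequence-dependent adjustments, so this will not reproduce its scores exactly.

```python
def gap_runs(row):
    """Lengths of the maximal runs of '-' in an aligned row."""
    runs, current = [], 0
    for ch in row:
        if ch == "-":
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def pairwise_affine_gap_cost(row_a, row_b, gop, gep):
    """Total affine gap cost between two rows of a multiple alignment.

    Columns where both rows have a gap are dropped first (the pairwise
    projection); each remaining gap of length k then costs gop + gep * (k - 1).
    """
    kept = [(a, b) for a, b in zip(row_a, row_b) if not (a == "-" and b == "-")]
    proj_a = "".join(a for a, _ in kept)
    proj_b = "".join(b for _, b in kept)
    return sum(gop + gep * (k - 1) for row in (proj_a, proj_b) for k in gap_runs(row))


# Using the GOP/GEP values quoted above on rows of the demo alignment.
print(pairwise_affine_gap_cost("-AC-GH", "--CFG-", 15.0, 6.66))  # three 1-residue gaps
print(pairwise_affine_gap_cost("-AC-GH", "EAC---", 15.0, 6.66))  # one 1-residue and one 2-residue gap
```

The first pair contains three single-residue gaps (45.0 in total); the second has one single-residue gap and one two-residue gap (about 36.66). A linear penalty of 8 per gap position would charge both pairs the same 24, which is exactly the distinction an affine model is meant to capture.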

12.2 years ago
Niek De Klein ★ 2.6k

I can't find it on their website right now, but are you sure that ClustalX uses PAM250? It's an outdated matrix, and I would expect ClustalX to use BLOSUM.

12.2 years ago
coderodde • 0

[Clarification]

The article presents a demo alignment of 3 sequences of lengths 4, 3 and 3:

ACGH
CFG
EAC

Both the article and Clustal align them as:

-AC-GH
--CFG-
EAC---

But if I use the implied gap penalty value of 8, I get:

ACGH
-CFG
-EAC

My implementation can reproduce the article's demo alignment (which is also Clustal's), but only if I lower the gap penalty from 8 to 5. So my question becomes: "Regardless of how outdated PAM250 is, what gap penalty do bioinformaticians generally use with this substitution matrix?"
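To see how the gap penalty decides between the two alignments above, one can compute their sum-of-pairs (SP) score directly. The sketch below is a minimal Python version assuming a linear gap penalty (each residue-vs-gap pair costs `gap`, gap-vs-gap pairs cost nothing) and score maximization; A* formulations of alignment are often phrased as cost minimization instead, which only flips the signs. The names `sum_of_pairs_score` and `toy_sub` are hypothetical, and `toy_sub` uses placeholder scores, NOT real PAM250 values; to reproduce the question's setting you would plug in a PAM250 lookup instead.

```python
from itertools import combinations


def sum_of_pairs_score(alignment, sub, gap):
    """Sum-of-pairs score of a multiple alignment with a linear gap penalty.

    alignment -- list of equal-length strings, '-' marks a gap
    sub       -- function (a, b) -> substitution score for two residues
    gap       -- penalty subtracted for every residue-vs-gap pair
    """
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows must have the same length")
    total = 0
    for col in range(width):
        column = [row[col] for row in alignment]
        # Score every pair of rows within this column.
        for a, b in combinations(column, 2):
            if a == "-" and b == "-":
                continue  # gap-vs-gap pairs contribute nothing
            if a == "-" or b == "-":
                total -= gap
            else:
                total += sub(a, b)
    return total


def toy_sub(a, b):
    # Hypothetical placeholder scores (NOT PAM250): +2 for identity, -1 otherwise.
    return 2 if a == b else -1


article = ["-AC-GH", "--CFG-", "EAC---"]  # article/Clustal alignment
compact = ["ACGH", "-CFG", "-EAC"]        # alignment obtained with penalty 8

for g in (1, 2, 3, 8):
    print(g, sum_of_pairs_score(article, toy_sub, g), sum_of_pairs_score(compact, toy_sub, g))
```

With these toy scores the gappier article alignment scores 10 - 10g and the compact one -6 - 2g, so the article alignment wins only for g < 2 and the two tie at g = 2. With real PAM250 entries the crossover presumably lands somewhere else, which would be consistent with the switch between 5 and 8 described above.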
