I'm looking for a substitution matrix for aligning short DNA sequences that use IUPAC nucleotide ambiguity codes. I would guess there are existing solutions, but I haven't found any despite extensive googling.
Just beware that this matrix is derived from the FASTA aligner and is meant for distant homology searches. For intra-species alignment, the mismatch penalty should be larger in magnitude than the match score.
Is it possible to change the penalties to reflect matches among the IUPAC codes? If so, how?
It penalizes B (C,G,T), D (A,G,T), H (A,C,T) and V (A,C,G): each of these codes gets a negative score against every nucleotide and ambiguity code, including itself, so it will be shown as a mismatch in the alignment even against its own constituent nucleotides. For example, B will be a mismatch with B, and B will be a mismatch with C/G/T.
Thanks for the spot-on answer! My task is aligning human normal vs. tumor fragments. What kind of penalty would you suggest for opening and extending a gap?
I just found this answer on the EMBOSS mailing list:
"NUC4.2 (EDNAMAT) simply scores 5 for a match, and -4 for a mismatch. NUC4.4 (EDNAFULL) scores 5 for a match, but provides appropriate scores for ambiguity codes so that, for example, R:A scores +1 (rounded up average of -4, -4, 5, 5)". Both matrices can be used with the program "water" from EMBOSS, which is also available online. For choosing gap penalties, this book should help: Durbin, R., Eddy, S. R., Krogh, A. & Mitchison, G. Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids (Cambridge University Press, 1998). URL http://www.worldcat.org/isbn/0521629713.
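If you want to adjust the penalties yourself, the "rounded up average" rule from the quote is easy to sketch in Python. This is only an illustration of that rule, not the code EMBOSS uses: the names `IUPAC`, `iupac_score` and `build_matrix` are made up for this example, and the shipped EDNAFULL file may differ from this rule for some pairs, so check the entries you care about against the actual matrix.

```python
from fractions import Fraction
from math import ceil

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def iupac_score(x, y, match=5, mismatch=-4):
    """Average the base-vs-base score over all expansions of x and y,
    then round up (the rule described in the mailing-list quote)."""
    pairs = [(a, b) for a in IUPAC[x] for b in IUPAC[y]]
    total = sum(match if a == b else mismatch for a, b in pairs)
    # Fraction keeps the average exact before rounding up.
    return ceil(Fraction(total, len(pairs)))


def build_matrix(match=5, mismatch=-4):
    """Full symmetric matrix as a dict keyed by (code, code)."""
    return {(x, y): iupac_score(x, y, match, mismatch)
            for x in IUPAC for y in IUPAC}


if __name__ == "__main__":
    print(iupac_score("R", "A"))  # 1, as in the quote
    print(iupac_score("B", "B"))  # -1, so B scores negatively against itself
```

Passing other `match`/`mismatch` values to `build_matrix` (for example a harsher mismatch for intra-species work) gives a matrix with the same averaging logic but different penalties.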