6.1 years ago
pentium3-user
Hi,
I have two sequences:
>seq1
AAAAAAACCCCCCCCCCAAAAAAAAAAAAAAAAAAAAAAAAA
>seq2
TTTTTTTTTTTTTTTTTTTCCCCCCCCCCTTTTTTTTTTTTT
I want to align them like this using ClustalW:
AAAAAAA-------------------CCCCCCCCCCAAAAAAAAAAAAAAAAAAAAAAAAA-------------
-------TTTTTTTTTTTTTTTTTTTCCCCCCCCCC-------------------------TTTTTTTTTTTTT
The exact alignment is not important, but I want only the Cs to be aligned and every mismatching base to be placed opposite a gap.
So I set the following scores:
- match: 1
- mismatch: -1
- gap opening: 0
- gap extending: 0
But what I get is this alignment:
------------AAAAAAACCCCCCCCCCAAAAAAAAAAAAAAAAAAAAAAAAA
TTTTTTTTTTTTTTTTTTTCCCCCCCCCCTTTTTTTTTTTTT------------
I don't understand why this is happening. With both gap penalties set to 0, my expected alignment should score 10 (10 matches), while the alignment I get should score -10 (10 matches and 20 mismatches).
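To double-check those two numbers, here is a minimal base-R sketch that scores both alignments column by column. `score_alignment` is a hypothetical helper written only for this check and is not part of msa or ClustalW. It assumes every gap column costs 0, which is what opening and extension penalties of 0 would give under a simple additive model. It does not reproduce whatever ClustalW computes internally.

```r
# Hypothetical helper: sum per-column scores of a pairwise alignment.
# Assumption: any column containing a gap contributes `gap` (0 here).
score_alignment <- function(a, b, match = 1, mismatch = -1, gap = 0) {
  x <- strsplit(a, "")[[1]]
  y <- strsplit(b, "")[[1]]
  stopifnot(length(x) == length(y))  # both rows must have the same width
  is_gap <- x == "-" | y == "-"
  sum(ifelse(is_gap, gap, ifelse(x == y, match, mismatch)))
}

# Expected alignment: only the 10 Cs are paired, everything else faces a gap.
expected_1 <- paste0(strrep("A", 7), strrep("-", 19), strrep("C", 10),
                     strrep("A", 25), strrep("-", 13))
expected_2 <- paste0(strrep("-", 7), strrep("T", 19), strrep("C", 10),
                     strrep("-", 25), strrep("T", 13))

# Alignment actually returned: seq1 shifted right by 12 columns.
observed_1 <- paste0(strrep("-", 12), strrep("A", 7), strrep("C", 10),
                     strrep("A", 25))
observed_2 <- paste0(strrep("T", 19), strrep("C", 10), strrep("T", 13),
                     strrep("-", 12))

score_alignment(expected_1, expected_2)  # 10 matches, no mismatches
score_alignment(observed_1, observed_2)  # 10 matches, 20 mismatches
```

Under this simple model the expected alignment scores higher, so the result returned by the aligner cannot be explained by these four scores alone.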
I use the following R code:
library(msa)  # msa depends on Biostrings, which provides DNAStringSet()

seqs <- DNAStringSet(c("AAAAAAACCCCCCCCCCAAAAAAAAAAAAAAAAAAAAAAAAA", "TTTTTTTTTTTTTTTTTTTCCCCCCCCCCTTTTTTTTTTTTT"))

# Substitution matrix: +1 on the diagonal (match), -1 everywhere else (mismatch)
bases <- c("A", "C", "G", "T")
smatrix <- matrix(-1, nrow = 4, ncol = 4, dimnames = list(bases, bases))
diag(smatrix) <- 1

# Align with both gap penalties set to 0 and print the full alignment
aln <- msa(seqs, type = "dna", gapOpening = 0, gapExtension = 0, substitutionMatrix = smatrix)
print(aln, show = "complete")
Output:
MsaDNAMultipleAlignment with 2 rows and 54 columns
aln
[1] ------------AAAAAAACCCCCCCCCCAAAAAAAAAAAAAAAAAAAAAAAAA
[2] TTTTTTTTTTTTTTTTTTTCCCCCCCCCCTTTTTTTTTTTTT------------
Con ???????????????????CCCCCCCCCC?????????????????????????
Does anyone know why this is happening and what I have to change to get my expected result?