How can Biostrings calculate consensus sequence without counting gaps?
1
0
Entering edit mode
2.1 years ago
MatthewP ★ 1.4k

Hello, I want to get a consensus sequence from multiple sequences alignment with R package Biostrings. But I want Biostrings don't count gaps when some positions have gaps and nucleotides.
Here is an example, the fourth position contains C, G and gaps, I hope Biostrings calculate the consensus nucleotide in this position from C,G,G three nucleotides, so the expected result is ATCS or ATCG, not ATC-. Thanks.

> xx <- DNAMultipleAlignment(DNAStringSet(c("ATCC", "ATCG", "ATCG", rep_len("ATC-", 10))))
> xx
DNAMultipleAlignment with 13 rows and 4 columns
      aln
 [1] ATCC
 [2] ATCG
 [3] ATCG
 [4] ATC-
 [5] ATC-
 [6] ATC-
 [7] ATC-
 [8] ATC-
 [9] ATC-
[10] ATC-
[11] ATC-
[12] ATC-
[13] ATC-
> consensusMatrix(xx, as.prob = TRUE)
  [,1] [,2] [,3]       [,4]
A    1    0    0 0.00000000
C    0    0    1 0.07692308
G    0    0    0 0.15384615
T    0    1    0 0.00000000
M    0    0    0 0.00000000
R    0    0    0 0.00000000
W    0    0    0 0.00000000
S    0    0    0 0.00000000
Y    0    0    0 0.00000000
K    0    0    0 0.00000000
V    0    0    0 0.00000000
H    0    0    0 0.00000000
D    0    0    0 0.00000000
B    0    0    0 0.00000000
N    0    0    0 0.00000000
-    0    0    0 0.76923077
+    0    0    0 0.00000000
.    0    0    0 0.00000000
> consensusString(xx)
[1] "ATC-"
Biostrings alignment • 1.0k views
ADD COMMENT
0
Entering edit mode
2.1 years ago
Mensur Dlakic ★ 28k

so the expected result is ATCS or ATCG, not ATC-

That is not the consensus of that group of sequences, as gaps in 4th position vastly outnumber bases. Try it just with your first 4-5 sequences, and you may get what you want. But with all the sequences listed, I would expect the consensus to be exactly ATC-, because that is the only accurate representation of the group.

ADD COMMENT
0
Entering edit mode

I know it's not a consensus sequence strictly, but in my case, I do hope to keep as much information as I can.

ADD REPLY

Login before adding your answer.

Traffic: 1798 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6