Question

How can Biostrings calculate consensus sequence without counting gaps?

0

Entering edit mode

2.9 years ago

MatthewP ★ 1.4k

Hello, I want to get a consensus sequence from multiple sequences alignment with R package Biostrings. But I want Biostrings don't count gaps when some positions have gaps and nucleotides.
Here is an example, the fourth position contains C, G and gaps, I hope Biostrings calculate the consensus nucleotide in this position from C,G,G three nucleotides, so the expected result is ATCS or ATCG, not ATC-. Thanks.

> xx <- DNAMultipleAlignment(DNAStringSet(c("ATCC", "ATCG", "ATCG", rep_len("ATC-", 10))))
> xx
DNAMultipleAlignment with 13 rows and 4 columns
      aln
 [1] ATCC
 [2] ATCG
 [3] ATCG
 [4] ATC-
 [5] ATC-
 [6] ATC-
 [7] ATC-
 [8] ATC-
 [9] ATC-
[10] ATC-
[11] ATC-
[12] ATC-
[13] ATC-
> consensusMatrix(xx, as.prob = TRUE)
  [,1] [,2] [,3]       [,4]
A    1    0    0 0.00000000
C    0    0    1 0.07692308
G    0    0    0 0.15384615
T    0    1    0 0.00000000
M    0    0    0 0.00000000
R    0    0    0 0.00000000
W    0    0    0 0.00000000
S    0    0    0 0.00000000
Y    0    0    0 0.00000000
K    0    0    0 0.00000000
V    0    0    0 0.00000000
H    0    0    0 0.00000000
D    0    0    0 0.00000000
B    0    0    0 0.00000000
N    0    0    0 0.00000000
-    0    0    0 0.76923077
+    0    0    0 0.00000000
.    0    0    0 0.00000000
> consensusString(xx)
[1] "ATC-"

Biostrings alignment • 1.4k views

ADD COMMENT • link 2.9 years ago by MatthewP ★ 1.4k

score 0 · Answer 1 · 2022-10-12

0

Entering edit mode

2.9 years ago

Mensur Dlakic ★ 29k

so the expected result is ATCS or ATCG, not ATC-

That is not the consensus of that group of sequences, as gaps in 4th position vastly outnumber bases. Try it just with your first 4-5 sequences, and you may get what you want. But with all the sequences listed, I would expect the consensus to be exactly ATC-, because that is the only accurate representation of the group.

ADD COMMENT • link 2.9 years ago by Mensur Dlakic ★ 29k

0

Entering edit mode

I know it's not a consensus sequence strictly, but in my case, I do hope to keep as much information as I can.

ADD REPLY • link 2.9 years ago by MatthewP ★ 1.4k