Question

Calculate distance matrix from nucleotide alignment with multiple IUPAC ambiguity characters

1

6.0 years ago

Denis ▴ 310

I have a nucleotide multiple sequence alignment (MSA) with many IUPAC ambiguity characters such as W, S, and R. I need to calculate a distance matrix as the input for building a phylogenetic tree in the next step, and I would like all nucleotides, including the ambiguity characters, to be taken into account when the distances are calculated. Is there a solution for this case? Thanks!

alignment sequence phylogenetics • 4.0k views

updated 6.0 years ago by Klaus S ▴ 160 • written 6.0 years ago by Denis ▴ 310

score 6 · Answer 1 · 2019-06-03

6

6.0 years ago

Klaus S ▴ 160

The function dist.ml in the R package phangorn handles ambiguity characters in the same way as they are handled in ML optimisation.

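library(phangorn)
# Read the alignment; IUPAC ambiguity codes are kept as ambiguous states
dat <- read.phyDat("msa.fas", format = "fasta")
# Pairwise ML distances under F81, without excluding any sites
dist.ml(dat, model = "F81", exclude = "none")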
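Since the question mentions building a tree as the next step, here is a minimal end-to-end sketch of how the distance matrix could be passed on to neighbor joining. The toy alignment, the temporary file, and the variable names `fasta`, `tmp`, `dm`, and `tree` are assumptions made for illustration; `NJ` is phangorn's neighbor-joining function, and any other distance-based method could be used in its place.

```r
library(phangorn)

# Toy alignment containing IUPAC ambiguity codes (hypothetical data)
fasta <- c(">s1", "ACGTRACGTA",
           ">s2", "ACGTAACGWA",
           ">s3", "ACTTAACGTA",
           ">s4", "ATGTYACGTC")
tmp <- tempfile(fileext = ".fas")
writeLines(fasta, tmp)

# Same calls as in the answer, applied to the toy file
dat <- read.phyDat(tmp, format = "fasta")
dm  <- dist.ml(dat, model = "F81", exclude = "none")

# Build a neighbor-joining tree from the distance matrix
tree <- NJ(dm)
print(tree)
```

With a real alignment, `tmp` would simply be replaced by the path to the FASTA file.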

6.0 years ago by Klaus S ▴ 160
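To picture what "handled as in ML optimisation" usually means, the sketch below builds the tip vectors that likelihood methods commonly use: each IUPAC code gets a 1 for every base it can stand for and a 0 otherwise. This is a conceptual illustration in base R, assuming the standard likelihood treatment of ambiguous tips; it is not phangorn's internal code, and the names `iupac`, `bases`, and `tip_vectors` are hypothetical.

```r
# IUPAC nucleotide codes and the bases each one can stand for
iupac <- list(
  A = "A", C = "C", G = "G", T = "T",
  R = c("A", "G"), Y = c("C", "T"), S = c("C", "G"), W = c("A", "T"),
  K = c("G", "T"), M = c("A", "C"),
  B = c("C", "G", "T"), D = c("A", "G", "T"),
  H = c("A", "C", "T"), V = c("A", "C", "G"),
  N = c("A", "C", "G", "T")
)
bases <- c("A", "C", "G", "T")

# One row per code: 1 where the base is compatible with the code, 0 otherwise
tip_vectors <- t(sapply(iupac, function(b) as.integer(bases %in% b)))
colnames(tip_vectors) <- bases

# Show a few rows: an unambiguous base, two two-base codes, and N
print(tip_vectors[c("A", "R", "W", "N"), ])
```

Under this representation an ambiguous site still contributes to the likelihood, but only as uncertainty about which of the four bases is present.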

0

Thanks! As I understand it, the F81 model typically has four parameters. I'm wondering whether it would have many more parameters when IUPAC ambiguity characters are present. I assume it will estimate the base frequencies from the nucleotide alignment that I'm using as input. Am I right?

5.9 years ago by Denis ▴ 310
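One way to look into this empirically is to inspect the base frequencies phangorn computes from an alignment. The sketch below uses phangorn's `baseFreq` function on a toy alignment; the data and the names `fasta`, `tmp`, and `bf` are assumptions for illustration. Under the standard likelihood treatment, ambiguity codes express uncertainty about which of the four bases is present rather than extra states, so the frequency vector would be expected to keep four entries. Whether `dist.ml` uses exactly these values for F81 is best confirmed in the package documentation.

```r
library(phangorn)

# Toy alignment containing IUPAC ambiguity codes (hypothetical data)
fasta <- c(">s1", "ACGTRACGTA",
           ">s2", "ACGTAACGWA",
           ">s3", "ACTTAACGTA",
           ">s4", "ATGTYACGTC")
tmp <- tempfile(fileext = ".fas")
writeLines(fasta, tmp)
dat <- read.phyDat(tmp, format = "fasta")

# Empirical frequencies of the four bases in the alignment
bf <- baseFreq(dat)
print(bf)

# Number of frequency parameters, regardless of ambiguity codes in the data
length(bf)
```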

0

Hi Denis, I am looking for a solution to a similar problem (my_question). Could you give me some advice? Did you find a solution? Best regards, Marcin

5.0 years ago by mschmidt ▴ 80