Replace ambiguous characters in fasta MSA

15 months ago

Alexandre • 0

Hi everyone,

I have an MSA that I feed into software that cannot handle Ns, and many of the sequences in my MSA (~20%) contain at least a couple of them.

I am looking for a program that can infer the most likely state for each N in my alignment, but I am not sure what to look for. Maybe some phylogenetics software has that ability? One important thing is that I want to keep the gaps in my alignment, as they are important for the rest of the analyses.

I hope I was able to make myself clear.

Thank you,
Alex

maximum-likelihood DNA • 504 views

updated 15 months ago by Joe 21k • written 15 months ago by Alexandre • 0

I would expect you could do this by building an HMM and using hmmemit (it would certainly work for protein; I have never tried it with nucleic acid).

The main question would be how big the MSA is and how many sites carry enough information to yield a prediction (if a column is all N, you can't magic up a guess of what should be there).
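
To make the point about informative columns concrete, here is a minimal sketch of a much cruder stand-in for the HMM idea: a per-column majority vote. It is not what hmmemit does and it ignores phylogeny entirely. The function name `fill_ns_with_column_majority` is made up for illustration, and the sketch assumes the alignment has already been read into a list of equal-length strings, that only `N` needs replacing, and that `-` marks gaps.

```python
from collections import Counter

BASES = set("ACGTU")      # characters allowed to vote in a column
AMBIGUOUS = {"N", "n"}    # assumption: only N/n are replaced


def fill_ns_with_column_majority(seqs):
    """Replace each N with the most common unambiguous base in its column.

    Gaps are never touched, and an N is left as N when its column
    contains no unambiguous base to vote with.
    """
    if len({len(s) for s in seqs}) > 1:
        raise ValueError("all aligned sequences must have the same length")

    filled = [list(s) for s in seqs]
    n_cols = len(seqs[0]) if seqs else 0

    for col in range(n_cols):
        # Count only plain bases; gaps, Ns and other IUPAC codes do not vote.
        votes = Counter(
            s[col].upper() for s in seqs if s[col].upper() in BASES
        )
        if not votes:
            continue  # uninformative column: nothing to infer from
        # On a tie, the base seen first in the column wins.
        majority = votes.most_common(1)[0][0]
        for row in filled:
            if row[col] in AMBIGUOUS:
                row[col] = majority

    return ["".join(row) for row in filled]


example = ["ACGT-A", "ANGT-A", "ACNTNA", "TCGT-N"]
print(fill_ns_with_column_majority(example))
```

In the example, the N in the fifth column stays as N because every other sequence has a gap there, which is exactly the case where no method can recover a state. Anything smarter (an HMM, or ancestral state reconstruction on a tree) would weight the votes rather than count them equally.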

15 months ago by Joe 21k
