Question

"U" in protein sequence

3.2 years ago

Shaurya • 0

I have sampled and aligned a set of 7000 proteomes. I tried to use RAxML to build maximum likelihood trees, but I get an error saying "unknown character "U" is at position xyz". How do I remove the U's from my sampled file, and what should I replace them with? Or is there other software for building maximum likelihood trees that recognizes U?

phylogenetics alignment fasta • 768 views

updated 3.2 years ago by Mensur Dlakic ★ 28k • written 3.2 years ago by Shaurya • 0

score 1 · Answer 1 · 2021-10-20

U is uncommon in protein sequences, but where it appears it is the one-letter code for selenocysteine, a rare amino acid. First make sure that you are not aligning RNA sequences by mistake, since U (uracil) is standard in RNA. If these really are proteins, you can either remove the whole offending sequence (6999 proteomes should still make a pretty good dataset), or replace U with C, because selenocysteine is the selenium-containing analog of cysteine.
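Both options can be scripted in a few lines. The sketch below is only an illustration and assumes the alignment is a plain FASTA file. It uses no Biopython, and every function name and file name in it is hypothetical. It warns about records that look like RNA (a rough heuristic: every non-gap character is A, C, G, U or N). It then either replaces U with C or drops each record that contains U.

```python
import sys


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def looks_like_rna(seq):
    """Heuristic: only A/C/G/U/N (ignoring gaps) suggests RNA, not protein."""
    residues = set(seq.upper()) - set("-.")
    return bool(residues) and residues <= set("ACGUN")


def clean_records(records, mode="replace"):
    """Handle selenocysteine (U) in protein records.

    mode="replace": substitute U with C (cysteine, the sulfur analog).
    mode="drop":    discard every record that contains a U.
    """
    for header, seq in records:
        if "U" not in seq.upper():
            yield header, seq
        elif mode == "replace":
            yield header, seq.replace("U", "C").replace("u", "c")
        elif mode == "drop":
            continue
        else:
            raise ValueError(f"unknown mode: {mode}")


def format_fasta(records, width=60):
    """Return records as FASTA text, wrapping sequences at `width` columns."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


def clean_fasta_file(in_path, out_path, mode="replace"):
    """Read an aligned FASTA file, handle U residues, write the result.

    Returns the number of records that were dropped (0 in "replace" mode).
    """
    with open(in_path) as fin:
        records = list(read_fasta(fin))
    suspicious = [h for h, s in records if looks_like_rna(s)]
    if suspicious:
        print(f"warning: {len(suspicious)} record(s) look like RNA, "
              f"e.g. {suspicious[0]!r}", file=sys.stderr)
    cleaned = list(clean_records(records, mode))
    with open(out_path, "w") as fout:
        fout.write(format_fasta(cleaned))
    return len(records) - len(cleaned)
```

For example, `clean_fasta_file("aligned.fasta", "aligned_noU.fasta")` writes a copy with every U turned into C. Passing `mode="drop"` removes the affected sequences instead. Either way, the cleaned file is the one you would hand to RAxML.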

Out of curiosity, why are you aligning that many proteomes? It is almost impossible to get a meaningful view of a tree that size. Besides, if this is for prokaryotes, a better and larger tree almost certainly already exists here.