Question

Gaps/missing data treatment when making tree

9.9 years ago

teaelleceecee • 0

I have a dataset in which most of my sequences are around 550 bp long, but a couple of sequences are missing about 200 bp. The alignment has no gaps apart from this missing sequence. From what was explained to me, I thought 'pairwise deletion' would produce a tree that shows no variation between two sequences of different lengths if they are identical over the region where they overlap. However, when I build an NJ tree using pairwise deletion, I see variation that is due to the difference in sequence length rather than to differences in the aligned sequence. Other than complete deletion, is there a gaps/missing data treatment or a different statistical method that will not produce variation in the tree arising from sequences being different lengths? Thank you for your help. Apologies if this is a very basic question; I am new to phylogenetics.
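To make the expectation concrete, here is a minimal Python sketch of a p-distance computed under pairwise deletion, next to a variant that counts a gap as a character state. The function names and the toy sequences are hypothetical, and this is not the code of any particular tree-building program; it only illustrates the two treatments under the assumption that `-`, `?` and `N` mark missing data.

```python
# Characters treated as missing data (an assumption for this sketch).
MISSING = set("-?N")


def p_distance_pairwise_deletion(a, b):
    """Proportion of differing sites, skipping every column in which
    either of the two sequences has missing data."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differing = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in MISSING or y in MISSING:
            continue  # column ignored for this pair only
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("the two sequences share no comparable sites")
    return differing / compared


def p_distance_gap_as_state(a, b):
    """Proportion of differing columns when a gap is counted like any
    other character, so a length difference inflates the distance."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    differing = sum(1 for x, y in zip(a.upper(), b.upper()) if x != y)
    return differing / len(a)


# Toy alignment (hypothetical data): 'short' lacks the last six sites.
full = "ACGTACGTAC"
short = "ACGT------"
other = "ACGAACGTAC"

print(p_distance_pairwise_deletion(full, short))   # 0.0
print(p_distance_gap_as_state(full, short))        # 0.6
print(p_distance_pairwise_deletion(full, other))   # 0.1
print(p_distance_pairwise_deletion(short, other))  # 0.25
```

In this toy case the full and short sequences are at distance 0 under pairwise deletion, as expected. Their distances to the third sequence still differ (0.1 versus 0.25), because each pair is compared over a different set of sites. That may be one reason an NJ tree built from such a matrix still places the shorter sequences on their own branches.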

phylogenetics tree • 3.7k views


score 0 · Answer 1 · 2015-02-23

9.8 years ago

kloetzl ★ 1.1k

Since you are using NJ for the tree, alignment-free distance estimation methods may help. Most of them count only SNPs (or estimate substitution rates) and ignore large gaps. Here are three tools that you can try.
