10.9 years ago
Kash
Hi all,
What is the relationship between the length of the sequences used for phylogenetic tree building and the accuracy of the generated tree?
I have generated a set of sequences of around 20,000 bp each from NGS data using the Stacks pipeline, so I am wondering whether there is a connection between sequence length and the accuracy of the inferred tree.
Any help is highly appreciated and thank you in advance.
Regards, kmkdesilva
It depends on what you are studying (phylogeny of species or of genes?). Also, what exactly do you mean by connection? A linear correlation? A correlation that can be expressed by any kind of function? A limit below (or above) which the tree is unreliable?
Let me speculate a little. There is certainly a connection. Imagine your sequences are 10 bp long. Would you trust the phylogenetic tree? At the other extreme, imagine your sequences are as long as an entire chromosome (I assume, given your data, that you are studying phylogeny based on a single gene). Would you trust the tree, or would you worry about mixing the phylogenetic signal at your locus with the signal from several other loci that might have evolved with different histories?
Although I root for the motto "the more the better", if more means adding low-quality sequences, or sequences from a portion of the genome whose evolutionary history differs from that of the locus you are interested in, then more is not always better.
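To make the 10 bp thought experiment concrete, here is a minimal sketch of how the sampling noise in a pairwise distance shrinks as sequences get longer. It is an illustration only, under assumptions that are mine and not from your data: sites are independent, every site differs between the two sequences with the same probability `p_true` (0.1 here, an arbitrary value), and the function names are hypothetical. It says nothing about the second concern above, since loci with different histories are not modelled.

```python
import math
import random


def simulate_p_distance(length, p_true, rng):
    """Observed fraction of differing sites between two sequences of
    `length` sites, assuming each site differs independently with
    probability `p_true` (an illustrative assumption)."""
    diffs = sum(1 for _ in range(length) if rng.random() < p_true)
    return diffs / length


def spread_of_estimates(length, p_true=0.1, replicates=200, seed=0):
    """Mean and sample standard deviation of the estimated distance
    over many simulated sequence pairs of the same length."""
    rng = random.Random(seed)
    estimates = [simulate_p_distance(length, p_true, rng)
                 for _ in range(replicates)]
    mean = sum(estimates) / replicates
    var = sum((e - mean) ** 2 for e in estimates) / (replicates - 1)
    return mean, math.sqrt(var)


if __name__ == "__main__":
    p_true = 0.1  # assumed true proportion of differing sites
    for length in (10, 100, 1000, 20000):
        mean, sd = spread_of_estimates(length, p_true)
        # Binomial expectation for the standard deviation of a proportion
        expected_sd = math.sqrt(p_true * (1 - p_true) / length)
        print(f"length={length:>6}  mean={mean:.4f}  "
              f"sd={sd:.4f}  expected_sd={expected_sd:.4f}")
```

Under these assumptions the spread of the estimate falls roughly as one over the square root of the sequence length, so a 10 bp alignment gives very noisy distances while 20,000 bp gives tight ones. Whether that translates into a more accurate tree still depends on the sites sharing one history.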
It is actually a phylogeny of species. I extracted these SNPs from the whole genome and tried to remove most of the low-quality data at the beginning. I was wondering whether accuracy increases or decreases as sequence length increases.