2.0 years ago
hojoun
Hello, I am building a phylogenetic tree of snakes using Bayesian inference. My ingroup sequences are 1300 bp, but the outgroup sequences are only about 600 bp. Can I code the difference of about 700 bp as missing data and analyze the full alignment?
No sequences of suitable length are available for closely related species, so I tried a more distantly related outgroup, but the resolution was poor. Do I need to trim the non-overlapping region before running the analysis?
This may or may not be a helpful answer, but depending on what tools you use for alignment and tree construction, some of them will have the option to ignore missing data (gaps), so you'll be able to use sequences of different lengths.
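To make the "treat it as missing data" idea concrete, here is a minimal sketch that recodes terminal gaps in an already aligned sequence as `?` and reports how much of each sequence is missing. The function names and the toy alignment are made up for illustration, and the `-`/`?` symbols are an assumption; check which symbols your tree-building program expects, since some programs treat gaps and missing data identically anyway.

```python
def terminal_gaps_to_missing(seq, gap="-", missing="?"):
    """Recode leading and trailing gap characters as missing data.

    Internal gaps are left untouched, since they may be real indels.
    """
    core = seq.strip(gap)
    if not core:  # sequence consists of gaps only
        return missing * len(seq)
    lead = len(seq) - len(seq.lstrip(gap))
    trail = len(seq) - len(seq.rstrip(gap))
    return missing * lead + core + missing * trail


def missing_fraction(seq, symbols="-?"):
    """Fraction of alignment columns that are gap or missing for this sequence."""
    return sum(seq.count(s) for s in symbols) / len(seq)


# Toy alignment (hypothetical data): the outgroup covers only part of the region.
alignment = {
    "ingroup_1": "ACGTACGTACGT",
    "ingroup_2": "ACGTACGAACGT",
    "outgroup":  "ACGT-CGT----",
}

recoded = {name: terminal_gaps_to_missing(s) for name, s in alignment.items()}
for name, s in recoded.items():
    print(name, s, round(missing_fraction(s), 2))
```

A sequence with a large missing fraction is not automatically a problem, but it is worth knowing the number before interpreting where that taxon lands in the tree.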
I'm not enough of a phylogenetics pro to be able to tell you what, if any, downstream consequences of this there might be for the inference.
Most tools that make phylogenetic inferences can handle missing data internally. For example, HyPhy, from what I understand, removes the missing data internally when making dN estimates but includes it when counting frequencies; more on that in this post (note that this applies mostly to coding regions).

That being said, in the specific case you are describing, the number of gaps in the outgroup seems to be larger than the number of potentially alignable sites. I am wondering whether there is a specific reason for this: was a large chunk of sequence inserted in your ingroup or deleted from your outgroup?

As the distance between your ingroup and outgroup increases, you start introducing more noise. Many papers have tried to identify "conserved" elements in distant genomes, and each uses its own definition of "conserved" in terms of length and similarity (I have tried to summarise some here, maybe it helps). It may be worth checking whether any studies have looked at the same pair of species and set a threshold for "conserved" elements.
Thank you for your reply.
The species-specific primers I developed amplify 1300 bp, but the primers used to amplify the outgroup sequence amplify only 696 bp, and no further sequence information is available.
There is significant variation within the ingroup sequences. However, if I trim the alignment to 696 bp, the variable sites outside that region can no longer be used.
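One way to put a number on that trade-off is to count how many variable ingroup columns fall inside versus outside the region the outgroup covers. The sketch below assumes the ingroup sequences are already aligned and that the overlap is known as a half-open column range; the function name, the toy sequences, and the coordinates are all hypothetical.

```python
def variable_columns(seqs, ignore="-?N"):
    """Return indices of alignment columns with more than one observed base.

    Gap, missing, and ambiguous (N) characters are not counted as states.
    """
    cols = []
    for i, column in enumerate(zip(*seqs)):
        states = {c.upper() for c in column} - set(ignore)
        if len(states) > 1:
            cols.append(i)
    return cols


# Toy ingroup alignment (hypothetical data, 10 columns instead of 1300).
ingroup = [
    "ACGTACGTAC",
    "ACGTACGAAC",
    "ACCTACGTAT",
]

# Assumed overlap with the outgroup: columns 0..4 (half-open range [0, 5)).
overlap_start, overlap_end = 0, 5

variable = variable_columns(ingroup)
inside = [i for i in variable if overlap_start <= i < overlap_end]
outside = [i for i in variable if not (overlap_start <= i < overlap_end)]

print("variable columns:", variable)
print("kept after trimming:", len(inside))
print("lost after trimming:", len(outside))
```

If most of the variable columns sit outside the overlap, that is an argument for keeping the full alignment and coding the outgroup's uncovered region as missing rather than trimming.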