2.2 years ago
hojoun
Hello, I am researching the intraspecific population genetics of snakes (using ML and BI trees and a relaxed molecular clock).
My concatenated sequences are 2392 bp.
gene 1: 1113 bp, gene 2: 1279 bp
In the outgroup species, gene 1 is complete (1113 bp), but gene 2 is only 696 bp.
In this situation, can the outgroup be used for phylogenetic analysis if I add missing-data symbols (?) to gene 2 to make up the difference in length?
please help!! I have to graduate!!
While missing data can have an important effect (there are lots of papers about this, e.g., this one), based on your description, I am confused why you would be adding missing data symbols. Have you already generated a multiple sequence alignment from your data? That would be a step you want to take before any phylogenetic inference.
A MUSCLE alignment was already performed on 74 sequences with two outgroups. I added the missing-data symbol because there was no sequence data in the corresponding region of the outgroup, whereas the ingroup sequences show certain mutations across that region in each population. So, without adding missing data, can I include the outgroup with its 696 bp and use it for the analysis?
I guess I'm still confused. If the outgroup sequences were used in your MSA, then there should be no need to add any "missing data" symbols because the alignment should already look something like this toy example:
In the above, the outgroup sequences have gaps at the beginning of the sequence because they are shorter than the ingroup sequences. Leaving aside the question of why the outgroup sequences are shorter (deletion in the outgroup, insertion in the ingroup, technical artifact, etc.), most mainstream phylogenetics tools (at least those that I am aware of) do not explicitly model insertions/deletions in the analysis and will treat gaps like missing data.
Assuming your MSA looks something vaguely like my toy example, you should be able to use it in your phylogenetic analysis. That said, do not be surprised if outgroups that carry dramatically less information than the ingroup introduce some bias into the inference.
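If it helps to see how uneven the information content is before running the analysis, here is a minimal sketch that reports the fraction of gap or missing-data characters per sequence in an alignment. The sequence names and the tiny alignment are made up for illustration, and counting only `-` and `?` as missing is an assumption; adjust it to whatever your tool treats as missing.

```python
# Characters counted as "no information" (assumption: gaps and '?' only).
MISSING = set("-?")


def missing_fraction(seq: str) -> float:
    """Fraction of alignment columns where this sequence has a gap or '?'."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(ch in MISSING for ch in seq) / len(seq)


def check_alignment(aln: dict[str, str]) -> dict[str, float]:
    """Return the missing fraction per sequence.

    Raises ValueError if the sequences differ in length, which would mean
    the input is not actually an alignment.
    """
    lengths = {len(seq) for seq in aln.values()}
    if len(lengths) != 1:
        raise ValueError(f"sequences have different lengths: {sorted(lengths)}")
    return {name: missing_fraction(seq) for name, seq in aln.items()}


# Hypothetical toy alignment: the outgroup lacks the first four columns.
toy = {
    "ingroup_1": "ACGTACGTAC",
    "ingroup_2": "ACGTACGAAC",
    "outgroup_1": "----ACGTAC",
}

report = check_alignment(toy)
for name, frac in report.items():
    print(f"{name}: {frac:.0%} missing")
```

With the numbers from this thread, and assuming no other gaps, an outgroup with 696 of 1279 bp for gene 2 would be missing 583 of 2392 columns, roughly 24% of the concatenated alignment.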
Thank you!!! It was a great help to me.