Dear Biostars community,
I want to estimate the divergence between the two long terminal repeats (LTRs) of a single LTR retrotransposon, which are assumed to be identical at the time a new element is inserted. I have estimated substitutions per site using baseml, with a model of sequence evolution chosen more or less at random, but I want to be able to justify which model to use. My approach has been:
1. Infer a tree from a set of presumably related LTR retrotransposons using protein-coding domains (e.g. reverse transcriptase, integrase).
2. Select a monophyletic group of LTR retrotransposons and make an alignment of their long terminal repeats.
3. Graft the long terminal repeats onto the terminal taxa in the tree inferred in step 1.
4. Run jModeltest2 using the alignment from step 2 and the tree from step 3.
I did this and got a single best-supported model, but when I ran jModeltest2 again using the tree from step 1 and an alignment of the protein-coding domain used to infer that tree, a different model was best supported. My thought is that the model selected from the tree and alignment of LTRs is the one I should use, but I am unsure whether there is something I'm missing. Maybe I'm going about this the wrong way. Any insights or comments are appreciated, thank you!
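To make concrete why the choice of model matters for the two-LTR comparison, here is a minimal sketch of the closed-form pairwise distance under two simple models, JC69 and K2P. This is not what baseml does (baseml estimates the distance by maximum likelihood and supports richer models); the function names and the toy sequences are made up for illustration, and the two LTRs are assumed to be already aligned.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq_a, seq_b):
    """Return (compared sites, transitions, transversions) for two aligned
    sequences, skipping columns with gaps or ambiguous bases."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # gap or ambiguity code: ignore this column
        sites += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if sites == 0:
        raise ValueError("no comparable sites")
    return sites, transitions, transversions


def jc69_distance(seq_a, seq_b):
    """Jukes-Cantor distance: d = -3/4 * ln(1 - 4p/3)."""
    sites, ts, tv = count_differences(seq_a, seq_b)
    p = (ts + tv) / sites
    return -0.75 * math.log(1 - 4 * p / 3)


def k2p_distance(seq_a, seq_b):
    """Kimura 2-parameter distance:
    d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q)),
    with P and Q the proportions of transitions and transversions."""
    sites, ts, tv = count_differences(seq_a, seq_b)
    P = ts / sites
    Q = tv / sites
    return -0.5 * math.log((1 - 2 * P - Q) * math.sqrt(1 - 2 * Q))


if __name__ == "__main__":
    # Toy aligned LTR pair (hypothetical, not real data)
    ltr5 = "ACGTACGTACGTACGTACGT"
    ltr3 = "ACGTACATACGTGCGTAC-T"
    print("JC69:", jc69_distance(ltr5, ltr3))
    print("K2P: ", k2p_distance(ltr5, ltr3))
```

When the observed differences are mostly transitions, the two formulas give different estimates of substitutions per site from the same alignment, which is why I would like a defensible basis for the model rather than an arbitrary pick.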