I have a group of genes (proteins) that aren't necessarily related to each other, and I am interested in their dN/dS ratio, or rather in how conserved they are. They are all from the same species (A. thaliana), and I have data on around 600 individuals (mostly from 1001genomes).
What software should I use?
I was thinking of PAML (Phylogenetic Analysis by Maximum Likelihood), more specifically codeml. For each protein I would give it the aligned sequences (600 sequences of the same protein) and the tree that goes with them, but I have seen people say that it is best suited to more distantly related species.
Is my approach wrong? If so, what should I use instead?
I would recommend reading everything about the codeml program in the PAML manual (pp. 28-38). You will then understand what the parameters represent and what you need to specify to run PAML.
I suggest using runmode = -2. What this mode does is described on p. 34.
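To make the suggestion above a bit more concrete, here is a minimal sketch in Python that writes a codeml control file with runmode = -2, which is the pairwise mode (each pair of sequences is compared, so as far as I understand no tree file is needed). Treat it as an illustration, not a recommended setup: the file names are hypothetical, the other option values are just common starting points, and it assumes the input is a codon (nucleotide CDS) alignment rather than a protein alignment. Check every option against the manual pages mentioned above.

```python
from pathlib import Path


def build_codeml_ctl(seqfile: str, outfile: str, runmode: int = -2) -> str:
    """Return the text of a codeml control file.

    The option values below are illustrative assumptions,
    not settings taken from this thread.
    """
    options = {
        "seqfile": seqfile,   # codon alignment (hypothetical file name)
        "outfile": outfile,   # main result file (hypothetical file name)
        "noisy": 0,           # how much progress output to print
        "verbose": 0,         # concise output
        "runmode": runmode,   # -2 = pairwise comparison
        "seqtype": 1,         # 1 = codon sequences
        "CodonFreq": 2,       # 2 = F3X4 codon frequency model
        "model": 0,           # one omega for all branches
        "NSsites": 0,         # one omega for all sites
        "icode": 0,           # 0 = universal genetic code
        "fix_kappa": 0,       # estimate kappa
        "kappa": 2,           # starting value for kappa
        "fix_omega": 0,       # estimate omega (dN/dS)
        "omega": 0.4,         # starting value for omega
    }
    lines = [f"{key:>10} = {value}" for key, value in options.items()]
    return "\n".join(lines) + "\n"


def n_pairs(n_sequences: int) -> int:
    """Number of pairwise comparisons among n_sequences sequences."""
    return n_sequences * (n_sequences - 1) // 2


if __name__ == "__main__":
    ctl_text = build_codeml_ctl("gene_codon_alignment.phy", "gene_pairwise.out")
    Path("codeml.ctl").write_text(ctl_text)
    print(ctl_text)
    print(n_pairs(600))  # 179700 pairwise comparisons per gene
```

You would then run codeml in the directory that contains codeml.ctl, once per gene. Note that with about 600 sequences the pairwise mode means 179700 comparisons per gene, so run time and output size are worth checking on a small subset first.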
If they are not related (i.e. not homologous), you CANNOT use phylogeny-based methods.