Hello,
I am interested in calculating diversity for a large number of genes in a given phylum.
I took all genes from my organism of interest and used InParanoid to find true orthologues in 3 different taxa. I now have a table that looks like this:
LOC1 Taxa1_Orth_LOC1 Taxa2_Orth_LOC1 Taxa3_Orth_LOC1
LOC2 Taxa1_Orth_LOC2 Taxa2_Orth_LOC2 Taxa3_Orth_LOC2 ...
Column 1 holds the name of the locus in my organism; columns 2-4 hold the names of the true orthologous loci in taxa 1, 2 and 3.
I also have FASTA files with the ORFs of all loci from each of my taxa. So for taxon 1 I have a FASTA file with the sequences Taxa1_Orth_LOC1, Taxa1_Orth_LOC2, and so on.
Now that I have the FASTA files and the table of true orthologues, how can I quickly calculate diversity in R? I know it can be done with codeml, but setting up each alignment by hand would be very tedious.
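To make the setup concrete, here is a minimal sketch of the grouping step that would have to happen before any alignment: pool the sequences from all FASTA files, then collect one set of sequences per row of the orthologue table. It is written in Python purely for illustration, not R. The function names (read_fasta, group_orthologues, to_fasta) are hypothetical, and it assumes that the IDs in the table match the first word of each FASTA header and that the sequences of my own organism are pooled with those of the other taxa.

```python
def read_fasta(lines):
    """Parse FASTA-formatted lines into a dict {sequence_id: sequence}.

    Assumption: the sequence ID is the first whitespace-separated word
    after '>' on each header line.
    """
    chunks = {}
    name = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in chunks.items()}


def group_orthologues(table_lines, sequences):
    """Return {locus: [(seq_id, sequence), ...]}, one entry per table row.

    Rows for which any sequence is missing from `sequences` are skipped.
    """
    groups = {}
    for row in table_lines:
        ids = row.split()
        if not ids:
            continue
        if all(seq_id in sequences for seq_id in ids):
            groups[ids[0]] = [(seq_id, sequences[seq_id]) for seq_id in ids]
    return groups


def to_fasta(records):
    """Format [(seq_id, sequence), ...] as FASTA text, one record per pair."""
    return "".join(f">{seq_id}\n{seq}\n" for seq_id, seq in records)


# Toy data standing in for the real table and FASTA files.
toy_fasta = [
    ">LOC1", "ATG", "AAA",
    ">Taxa1_Orth_LOC1 some description", "ATGAAG",
    ">Taxa2_Orth_LOC1", "ATGAAC",
    ">Taxa3_Orth_LOC1", "ATGAAT",
]
toy_table = [
    "LOC1 Taxa1_Orth_LOC1 Taxa2_Orth_LOC1 Taxa3_Orth_LOC1",
    "LOC2 Taxa1_Orth_LOC2 Taxa2_Orth_LOC2 Taxa3_Orth_LOC2",  # no sequences, skipped
]

toy_sequences = read_fasta(toy_fasta)
toy_groups = group_orthologues(toy_table, toy_sequences)
for locus, records in toy_groups.items():
    print(f"# {locus}")
    print(to_fasta(records), end="")
```

Each group would then still need to be aligned before a diversity measure can be computed on it, and that per-locus step is the part I would like to automate.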
Any thoughts on how this can be done?
Thank you,
Adrian