I have around 25,000 contigs for each of two species. I'm trying to calculate Ka and Ks (also called dN and dS) for potential orthologs between the two species and for potential paralogs within each species. I have a list of the contig pairs I want to calculate Ka and Ks for, which looks like this:
Species1_Contig12,Species1_Contig98,0
Species2_Contig16,Species1_Contig24,1
(The number indicates the type of comparison: 0 = Species 1 vs. Species 1, 1 = Species 1 vs. Species 2, 2 = Species 2 vs. Species 2.)
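A minimal Python sketch of reading this pair file, assuming it is plain comma-separated text with exactly three fields per line (contig, contig, comparison code). The `ContigPair` type and the `sp1-sp1`-style labels are hypothetical names chosen for illustration:

```python
import csv
import io
from typing import Iterator, NamedTuple, TextIO

# Hypothetical labels for the three comparison codes (0, 1, 2).
COMPARISON_TYPES = {0: "sp1-sp1", 1: "sp1-sp2", 2: "sp2-sp2"}


class ContigPair(NamedTuple):
    first: str
    second: str
    comparison: str


def parse_pairs(handle: TextIO) -> Iterator[ContigPair]:
    """Yield one ContigPair per non-empty 'contigA,contigB,code' line."""
    for row in csv.reader(handle):
        if not row:  # skip blank lines
            continue
        # Strip each field so stray spaces after commas don't end up in names.
        first, second, code = (field.strip() for field in row)
        yield ContigPair(first, second, COMPARISON_TYPES[int(code)])


example = io.StringIO(
    "Species1_Contig12, Species1_Contig98,0\n"
    "Species2_Contig16,Species1_Contig24,1\n"
)
pairs = list(parse_pairs(example))
print(pairs[0])
# ContigPair(first='Species1_Contig12', second='Species1_Contig98', comparison='sp1-sp1')
```

Keeping the comparison type with each pair makes it easy to split the results into within-species (paralog) and between-species (ortholog) sets afterwards.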
I also have a FASTA file with the consensus sequence for each contig. The sequence names in the FASTA file match those in the comparison file, and the sequences have been annotated against other sequences with BLASTx, which indicates the reading frame I should use for each one.
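A sketch of pulling each contig's sequence out of the FASTA file and cutting it to the BLASTx reading frame, assuming the frame is recorded in BLASTx's convention (+1 to +3 on the given strand, -1 to -3 on the reverse complement). `read_fasta` and `in_frame` are hypothetical helpers; for real data, Biopython's `Bio.SeqIO.parse` and `Seq.reverse_complement` do the same jobs more robustly. Trimming to the query coordinates of the BLASTx hit, rather than using the frame alone, would also drop untranslated flanking sequence, which this sketch does not do.

```python
import io
from typing import Dict, List, Optional, TextIO

# Complement table for ACGT and N; other IUPAC codes pass through unchanged.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def read_fasta(handle: TextIO) -> Dict[str, str]:
    """Minimal FASTA reader: maps the first word of each header to its sequence."""
    records: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for raw in handle:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def in_frame(seq: str, frame: int) -> str:
    """Return the codon-aligned sequence for a BLASTx-style frame (+1..+3, -1..-3).

    Negative frames are read on the reverse complement. The result is
    trimmed to a whole number of codons and upper-cased.
    """
    if frame not in (1, 2, 3, -1, -2, -3):
        raise ValueError(f"invalid frame: {frame}")
    if frame < 0:
        seq = seq.translate(_COMPLEMENT)[::-1]  # reverse complement
    coding = seq[abs(frame) - 1:]
    return coding[: len(coding) - len(coding) % 3].upper()


records = read_fasta(io.StringIO(">Species1_Contig12 some description\nTTATGGCC\nAAATAA\n"))
cds = in_frame(records["Species1_Contig12"], 3)
print(cds)  # ATGGCCAAATAA
```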
I would like to calculate Ka and Ks for each pair in the file, but I have no idea how to do it. PAML has been suggested, but my data isn't in a format it accepts.
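One way to get pairs into PAML is to write each codon-aligned pair as a small sequential (PHYLIP-style) file and run `yn00` (or `codeml` with `runmode = -2`) on it, with `seqfile` in the control file pointing at that file. The sketch below assumes the two sequences are already aligned codon-by-codon, for example by aligning the translated proteins and back-translating with a tool such as PAL2NAL. `to_paml_sequential` is a hypothetical helper, not part of PAML:

```python
def to_paml_sequential(name_a: str, seq_a: str, name_b: str, seq_b: str) -> str:
    """Format one codon-aligned pair as a PAML sequential (PHYLIP-style) block.

    First line: number of sequences and alignment length. Then one line
    per sequence: name, two spaces, sequence.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if len(seq_a) % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3 (whole codons)")
    lines = [f"  2 {len(seq_a)}"]
    for name, seq in ((name_a, seq_a), (name_b, seq_b)):
        # PAML treats two or more spaces as the end of a sequence name.
        lines.append(f"{name}  {seq}")
    return "\n".join(lines) + "\n"


block = to_paml_sequential("Species1_Contig12", "ATGGCCAAA",
                           "Species1_Contig98", "ATGGCTAAA")
print(block, end="")
#   2 9
# Species1_Contig12  ATGGCCAAA
# Species1_Contig98  ATGGCTAAA
```

Looping over the pair list, writing one such file per pair, and calling `yn00` on each from a script is a simple, if slow, way to cover all the pairs; the requirement that the alignment length be a multiple of 3 is checked up front because PAML reads the data as codons.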
Can anyone offer me any pointers on how to go about this?