Hello,
I have 20 or so microsporidian genomes that are mostly assembled into draft genomes. I have extracted protein sequences for the predicted genes in these genomes and clustered them with OrthoMCL. I now want to construct a phylogeny, for which I need orthologous groups, preferably consisting of single-copy genes.
However, I notice that groups.txt also contains orthologous groups with multiple sequences per genome. How can I find the groups in which each of the 20 genomes has exactly one sequence, indicating that the gene is likely single copy?
Thanks,
Adrian
Hello, I am using v2, which I hear is very different from prior versions. The only output is groups.txt, plus 3 other files that show all pairwise comparisons (http://orthomcl.org/common/downloads/software/v2.0/UserGuide.txt). I figured out one way to do this: grep for each taxon in turn to find the groups that contain all taxa, then keep the clusters whose size equals the total number of taxa (20 here). Those should be the single-copy orthologs. Not sure if there is a cleaner way to do it.
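For what it's worth, the grep approach above can probably be done in one pass with a short script. This is only a rough sketch, and it assumes each line of groups.txt looks like `group_id: taxon|protein taxon|protein ...` (taxon code, a pipe, then the protein ID, separated by spaces). The function names and the script name are made up for illustration; adjust the parsing if your file looks different.

```python
import sys
from collections import Counter


def parse_groups(lines):
    """Yield (group_id, [(taxon, protein_id), ...]) for each non-empty line.

    Assumed line format: 'group_id: taxon|protein taxon|protein ...'
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, _, members = line.partition(":")
        pairs = []
        for member in members.split():
            # Split on the first '|' only, so protein IDs may contain '|'.
            taxon, _, protein = member.partition("|")
            pairs.append((taxon, protein))
        yield group_id.strip(), pairs


def single_copy_groups(lines, n_taxa):
    """Return IDs of groups in which each of n_taxa taxa appears exactly once."""
    result = []
    for group_id, pairs in parse_groups(lines):
        counts = Counter(taxon for taxon, _ in pairs)
        # n_taxa members spread over n_taxa distinct taxa means one per taxon.
        if len(pairs) == n_taxa and len(counts) == n_taxa:
            result.append(group_id)
    return result


if __name__ == "__main__":
    # Hypothetical usage: python single_copy.py groups.txt 20
    with open(sys.argv[1]) as handle:
        for group_id in single_copy_groups(handle, int(sys.argv[2])):
            print(group_id)
```

The check mirrors the two grep steps: the group must have as many members as there are taxa, and those members must come from that many distinct taxa. Here the taxon count is passed in by hand (20 in my case) rather than inferred from the file.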
Hi Apelin20. I am doing the same thing with the OrthoMCL 2.0 output (groups.txt). Have you, by any chance, found any other way of processing the output for downstream analyses?