OrthoMCL: How to parse groups.txt for a phylogeny?
apelin20, 9.3 years ago

Hello,

I have 20 or so microsporidian genomes, most of which are assembled only to draft level. I have extracted the protein sequences of the predicted genes in these genomes and clustered them with OrthoMCL. I now want to construct a phylogeny, for which I need orthologous groups, preferably consisting of single-copy genes.

However, groups.txt also contains orthologous groups with multiple sequences per genome. How can I find the groups in which each of the 20 genomes has exactly one sequence, indicating that the gene is likely single-copy?

Thanks,

Adrian

Tags: phylogeny, Amino Acids, proteins, drafts
h.mon, 9.3 years ago

Which version of OrthoMCL? I cannot remember exactly, but I believe both 1.4 and 2.0 write the cluster files with names of the form "X_genes_Y_taxa", so you just have to search for files with the same number of genes and taxa, something like:

find . -name "*20_genes_20_taxa*"

should work. If I remember the naming convention incorrectly, forgive me for the misleading answer.


Hello, I am using v2, which I hear is very different from prior versions: the only output is groups.txt, plus three other files that list the pairwise relationships (http://orthomcl.org/common/downloads/software/v2.0/UserGuide.txt). One way I found to do this is to grep iteratively for every taxon to find the groups that contain all taxa, and then keep the groups whose size equals the total number of taxa (20 here). A group that contains all 20 taxa and exactly 20 sequences must have exactly one sequence per taxon, so these are the single-copy orthologs. Not sure if there is a cleaner way to do it.
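The grep-and-count idea can also be done in a single pass over groups.txt. Below is a minimal sketch in Python, assuming each line of groups.txt has the form "group_id: taxon|protein taxon|protein ..." (the taxon|protein IDs being the ones from the compliant FASTA headers). The function names and the script name are hypothetical, not part of OrthoMCL:

```python
import sys
from collections import Counter


def parse_groups_line(line):
    """Split one groups.txt line into (group_id, [member IDs]).

    Assumes the layout "group_id: taxon|protein taxon|protein ...".
    """
    group_id, _, members = line.partition(":")
    return group_id.strip(), members.split()


def single_copy_groups(lines, n_taxa):
    """Yield (group_id, members) for groups with exactly one protein from each of n_taxa taxa."""
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        group_id, members = parse_groups_line(line)
        # The taxon code is the part of each member ID before the first "|".
        counts = Counter(member.split("|", 1)[0] for member in members)
        # Every taxon must be present, and each must contribute exactly one protein.
        if len(counts) == n_taxa and all(c == 1 for c in counts.values()):
            yield group_id, members


# Command-line use (hypothetical): python single_copy.py groups.txt 20
# Only runs when both arguments are given.
if __name__ == "__main__" and len(sys.argv) == 3:
    path, n = sys.argv[1], int(sys.argv[2])
    with open(path) as handle:
        for group_id, members in single_copy_groups(handle, n):
            print(f"{group_id}: {' '.join(members)}")
```

Counting proteins per taxon, rather than only checking the group size, also makes it easy to relax the filter later (for example, to accept groups missing one or two genomes).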


Hi apelin20. I am doing the same thing with the OrthoMCL 2.0 output (groups.txt). Have you, by any chance, found any other way of processing the output for downstream analyses?
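For the phylogeny, the selected groups still have to be turned into one sequence file per gene before alignment. Below is a minimal sketch, assuming the compliant FASTA files used for the OrthoMCL run sit in one directory with a ".fasta" extension and headers of the form ">taxon|protein", and that the groups are given as (group_id, members) pairs already filtered to single-copy groups. Each sequence is labelled with its taxon only, which is unambiguous only because each taxon appears once per group. The function names and paths are hypothetical:

```python
from pathlib import Path


def read_fasta(path):
    """Return {sequence_id: sequence}; the ID is the first word after '>'."""
    seqs, current = {}, None
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                current = line[1:].split()[0]
                seqs[current] = []
            elif current is not None:
                seqs[current].append(line)
    return {seq_id: "".join(parts) for seq_id, parts in seqs.items()}


def write_group_fastas(groups, fasta_dir, out_dir):
    """Write out_dir/<group_id>.fasta for each (group_id, members) pair."""
    seqs = {}
    for path in Path(fasta_dir).glob("*.fasta"):
        seqs.update(read_fasta(path))
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for group_id, members in groups:
        with open(out / f"{group_id}.fasta", "w") as fh:
            for member in members:
                taxon = member.split("|", 1)[0]
                # Raises KeyError if a member ID is missing from the FASTA files.
                fh.write(f">{taxon}\n{seqs[member]}\n")
```

Using the bare taxon code as the header gives every per-gene file the same set of labels, which simplifies concatenating the aligned genes into a single matrix later.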
