Hi. I was wondering if anybody could help me figure out how to use OrthoMCL to identify the core genome of a set of E. coli genomes. I ran OrthoMCL on 52 E. coli genomes to produce ortholog groups, and I followed all the steps in the user guide through to the end. Now I'm left with this massive file of ortholog groups, but I'm unsure how to proceed.
Below is a snippet from the middle of my output file. I didn't use head because the first group in the file is my biggest ortholog group, so it prints far too much. The part before the colon is the ortholog group ID; each space-separated entry after it is a genome|protein pair, i.e. the genes from each genome that were clustered together into that group.
ecoli6370: col125|YP_006311412.1 col139|YP_007556103.1 col23|YP_001729413.1 col3|NP_286258.1 col4|NP_308598.1 col53|YP_002998320.1 col55|YP_003043686.1 col56|YP_003053130.1 col7|YP_488800.1 col73|YP_003498239.1 col92|YP_006127895.1
ecoli6371: col125|YP_006312035.1 col127|YP_006770839.1 col131|YP_006779890.1 col134|YP_006785029.1 col3|NP_286985.1 col31|YP_002271784.1 col4|NP_309246.1 col45|YP_002397150.1 col57|YP_003079099.1 col59|YP_003222735.1 col64|YP_003233659.1
ecoli6372: col125|YP_006312040.1 col127|YP_006770834.1 col131|YP_006779885.1 col134|YP_006785024.1 col3|NP_286990.1 col31|YP_002271776.1 col4|NP_309251.1 col45|YP_002397155.1 col57|YP_003079092.1 col59|YP_003222730.1 col64|YP_003233664.1
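One possible way to get at the core genome directly from the groups file, without building a matrix first: a group belongs to the core genome if every one of the 52 genomes contributes at least one protein to it (and to the stricter single-copy core if each genome contributes exactly one). For example, ecoli6370 above contains proteins from only 11 genomes, so it would not count. The Python sketch below assumes the line format described above (group ID, a colon, then space-separated genome|protein entries) and takes the total number of genomes as an input; parse_groups and core_groups are hypothetical helper names, not part of OrthoMCL.

```python
from collections import Counter


def parse_groups(lines):
    """Parse OrthoMCL-style group lines ("group: genome|protein ...")
    into {group_id: [(genome, protein), ...]}."""
    groups = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        group_id, members = line.split(":", 1)
        groups[group_id] = [tuple(m.split("|", 1)) for m in members.split()]
    return groups


def core_groups(groups, n_genomes, single_copy=False):
    """Return the IDs of groups containing at least one protein from every
    genome. With single_copy=True, also require exactly one protein per genome."""
    core = []
    for group_id, members in groups.items():
        counts = Counter(genome for genome, _ in members)  # proteins per genome
        if len(counts) != n_genomes:
            continue  # at least one genome is missing from this group
        if single_copy and any(c != 1 for c in counts.values()):
            continue  # some genome has paralogs in this group
        core.append(group_id)
    return core
```

With the output saved as groups.txt (a placeholder name), something like opening the file, passing the handle to parse_groups, and then calling core_groups(groups, 52) should list the core groups; len() of that list gives the core genome size in ortholog groups.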
I tried converting the groups file to a binary matrix, following the instructions here (http://smokeandumami.com/2010/01/21/gene-accumulation-curves-in-r/), but I'm still not sure how to proceed from there.
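For reference, the conversion to a binary (presence/absence) matrix can also be sketched in Python. This assumes the same group-line format as above and lays the matrix out the way the matrix later in this thread appears to be laid out: one row per genome, one column per ortholog group, with 1 meaning the genome has at least one protein in that group. The function names are hypothetical, and it only records presence, not copy number.

```python
def presence_absence(lines):
    """Return (genomes, group_ids, matrix) where matrix[i][j] is 1 if
    genome i has at least one protein in group j, else 0."""
    membership = {}  # group_id -> set of genomes present in that group
    for line in lines:
        line = line.strip()
        if not line:
            continue
        group_id, members = line.split(":", 1)
        membership[group_id] = {m.split("|", 1)[0] for m in members.split()}
    group_ids = list(membership)  # keep the file's group order
    genomes = sorted(set().union(*membership.values()))
    matrix = [[1 if g in membership[gid] else 0 for gid in group_ids]
              for g in genomes]
    return genomes, group_ids, matrix


def write_matrix(path, genomes, group_ids, matrix):
    """Write the matrix as tab-separated text: a header of group IDs,
    then one row per genome (genome name followed by 0/1 values)."""
    with open(path, "w") as out:
        out.write("\t" + "\t".join(group_ids) + "\n")
        for genome, row in zip(genomes, matrix):
            out.write(genome + "\t" + "\t".join(map(str, row)) + "\n")
```

Genomes that never appear in any group would be missing from this matrix, so it is worth checking that all 52 genome prefixes show up.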
Thanks, I appreciate any help you can give me. Please let me know if I should provide any more information.
Lisa
Sorry for the delay. Here's an example of what my binary matrix looks like; I've only included a few lines because the full matrix is so large.
"ecoli1000" "ecoli1001" "ecoli1002" "ecoli1003" "ecoli1004" "ecoli1005"
"col0" 1 1 0 0 1 0
"col1" 0 1 0 0 0 1
"col2" 0 0 1 0 1 1
"col3" 0 1 0 0 0 0
"col4" 1 0 0 1 1 1
"col5" 1 0 0 1 0 0
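In this matrix the columns are ortholog groups and the rows are genomes, so the core genome is the set of columns that contain a 1 in every row. The Python sketch below assumes the file was written with R's write.table defaults, which is what the snippet looks like (space-separated, quoted names, and a header row that is one field shorter than the data rows because it has no entry for the row-name column). It also splits off "unique" groups (present in exactly one genome) and "accessory" groups (everything else that is present at all); read_matrix and classify_groups are hypothetical names.

```python
import csv


def read_matrix(lines):
    """Read a presence/absence matrix in R write.table default layout:
    the header holds column names only; each data row is a row name
    followed by 0/1 values."""
    reader = csv.reader(lines, delimiter=" ", quotechar='"')
    header = next(reader)
    rows = {}
    for fields in reader:
        if not fields:
            continue  # skip blank lines
        rows[fields[0]] = [int(x) for x in fields[1:]]
    return header, rows


def classify_groups(header, rows):
    """Split groups (columns) into core (present in every genome),
    accessory (present in more than one genome but not all) and
    unique (present in exactly one genome)."""
    n = len(rows)  # number of genomes = number of rows
    core, accessory, unique = [], [], []
    for j, group in enumerate(header):
        present = sum(row[j] for row in rows.values())
        if present == n:
            core.append(group)
        elif present == 1:
            unique.append(group)
        elif present > 1:
            accessory.append(group)
    return core, accessory, unique
```

On the six rows shown above, no group is present in all six genomes and only ecoli1002 is unique; on the full file with 52 rows, len(core) would be the number of core ortholog groups. When reading from disk, the csv documentation recommends opening the file with newline="".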
Could you show us the binary matrix? I believe it'll be easier to explain it from that.