Hi All
I have run OrthoMCL on 4 species and now have the orthologous pairs file and the group file. I am confused about how to extract the orthologous sequences shared by all 4 species from these files: the pairs file contains only pairwise information, and in the group file a single group can contain many sequences.
If anyone has done this, please let me know.
Regards
Thank you for your reply. One more thing I wanted to ask: are all the proteins in the same group orthologous to each other? Say I have 6 proteins in one group: A1, A2, B1, B2, C1, D1. Can I then split this into four orthologous groups, namely (A1, B1, C1, D1), (A2, B1, C1, D1), (A1, B2, C1, D1) and (A2, B2, C1, D1)? If yes, is there a script to do this?
I'm sorry I don't get what you wanted to ask. One line in the group file is one orthologous group.
Sorry that I was not clear. Let me try once again. The group file contains one orthologous group per line, and as you said a group can have more than 4 or fewer than 4 members. Say there are 10,000 groups in total, but the groups containing exactly one member from each of the 4 species are far fewer (only 1,000). So if I just want to make an alignment of 4-species orthologs, I only have a list of 1,000 genes. How can I use the other 9,000 groups? I know that I cannot use the groups that contain only 2 or 3 species.
But what information do you want to obtain exactly? If you want to know the orthologs shared between all 4 species, then you can only use those 1,000 groups (I believe they're called the "core" proteome). If you use all 10,000 groups, I think that is what they call the "pan" proteome of the 4 species. I myself used OrthoMCL to spot proteins that are specific to, or otherwise absent from, one bacterial species compared to other closely related bacteria.
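As a rough sketch of how those groups could be pulled out of the group file, here is a small Python example. It assumes each line looks like `group_id: taxon|protein taxon|protein ...`; that layout, the taxon codes (spA to spD), the group and protein names, and the function names are all made up for illustration, so adjust them to whatever your own file actually contains.

```python
from collections import defaultdict

# Hypothetical taxon codes for the 4 species; replace with your own.
ALL_TAXA = {"spA", "spB", "spC", "spD"}


def parse_group_line(line):
    """Split one group-file line into (group_id, {taxon: [protein_ids]}).

    Assumes the layout 'group_id: taxon|protein taxon|protein ...'.
    """
    group_id, _, members = line.partition(":")
    by_taxon = defaultdict(list)
    for member in members.split():
        taxon, _, protein = member.partition("|")
        by_taxon[taxon].append(protein)
    return group_id.strip(), dict(by_taxon)


def classify_group(by_taxon, all_taxa=ALL_TAXA):
    """Label a group by how it covers the species of interest."""
    if not set(all_taxa) <= set(by_taxon):
        return "partial"  # at least one species is missing
    if all(len(by_taxon[t]) == 1 for t in all_taxa):
        return "single_copy_core"  # exactly one protein per species
    return "core_with_paralogs"  # all species present, some with >1 protein


# Made-up example lines, not real OrthoMCL output.
example_lines = [
    "grp1000: spA|A1 spB|B1 spC|C1 spD|D1",
    "grp1001: spA|A1x spA|A2x spB|B1x spB|B2x spC|C1x spD|D1x",
    "grp1002: spA|A9 spB|B9 spC|C9",
]

labels = {}
for line in example_lines:
    gid, by_taxon = parse_group_line(line)
    labels[gid] = classify_group(by_taxon)

for gid, label in labels.items():
    print(gid, label)
```

Groups labelled `single_copy_core` correspond to the 1,000 groups mentioned above. Groups labelled `core_with_paralogs` do contain all 4 species, but you would still have to decide which copy to use per species before aligning them.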