Hi,
I have been doing evolutionary genomics on a few newly sequenced vertebrate species. They have RefSeq annotations supported by RNA-Seq data, but they are neither model species nor commonly studied, and some are evolutionarily distant (such as the elephant shark and the lamprey). I would like to get the InterPro domains of their proteins and, if possible, find their orthologous genes.
The papers describing these genomes all ran the same kind of analysis, using OrthoMCL, Blast2GO, InterPro and other tools to get this information as part of their annotation process. The problem is that very few of the authors published their complete data, such as which orthologous genes they retained for substitution rate calculations. All I have are the methods and results (which is nice but not enough).
Is there a database that tracks all the proteins from a given genome and gives their gene family, GO terms and orthologous genes in other species?
I could also rerun some InterProScan/OrthoMCL jobs (the lengthy option) or use the reference organisms to which these genomes were aligned as a common denominator (the quicker option, but I would lose some possible orthologous genes). I can also contact the authors, but I would still like to know whether such a resource exists.
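For what it's worth, the quicker "common denominator" option could look roughly like the sketch below. It assumes I already have, for each species, a mapping from its genes to the best-matching gene in one shared reference organism; the function name, species keys and gene IDs are all made up for illustration. Genes with no hit in the reference never enter the table, which is exactly the loss mentioned above.

```python
from collections import defaultdict


def group_by_reference(hits_per_species):
    """Group genes from several species by the reference gene they map to.

    hits_per_species: {species: {gene_id: reference_gene_id}}
    Returns only reference genes that have at least one hit in every species.
    """
    groups = defaultdict(lambda: defaultdict(list))
    for species, hits in hits_per_species.items():
        for gene, ref in hits.items():
            groups[ref][species].append(gene)

    n_species = len(hits_per_species)
    return {
        ref: {sp: sorted(genes) for sp, genes in by_species.items()}
        for ref, by_species in groups.items()
        if len(by_species) == n_species
    }


# Hypothetical gene IDs, for illustration only.
hits = {
    "elephant_shark": {"es_g1": "ref_A", "es_g2": "ref_A", "es_g3": "ref_B"},
    "lamprey": {"lp_g1": "ref_A", "lp_g2": "ref_C"},
    "species_3": {"s3_g1": "ref_A", "s3_g2": "ref_B"},
}

shared = group_by_reference(hits)
for ref, members in shared.items():
    print(ref, members)
```

Several genes from one species can land on the same reference gene (here es_g1 and es_g2), so the groups are candidate orthologue sets with possible extra copies, not confirmed one-to-one orthologues.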
Thanks!
Did you check whether the data you want are already in a public database (OrthoDB springs to mind)?
Contacting the authors and asking (nicely) for the data is also a good idea. Some authors will help you, some won't, but at least you will get something, and maybe some nice exchange of comments / suggestions / ideas.
Very nice link. I had checked out Panther and various others, but missed OrthoDB. It gives me a few more species, like the sea lamprey, which is cool. I can grab a few orthologues (possibly with several copies each) shared by 3-4 species of interest, determine their family and GO terms, and use that on my remaining species. I think the missing species were sequenced only about a year ago, which may explain their absence.
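In case it helps someone else, here is a minimal sketch of the last step, carrying family and GO terms over to the remaining species. It assumes each protein from a remaining species has already been matched to one annotated protein (for example by a best similarity hit) and that the orthologous groups have been exported into plain dictionaries. The function name, IDs, family label and data layout are my own placeholders, not OrthoDB's actual format.

```python
def transfer_annotations(best_hits, member_to_group, group_info):
    """Copy family and GO terms from an orthologous group to new proteins.

    best_hits:       {new_protein: annotated_protein} (e.g. best similarity hit)
    member_to_group: {annotated_protein: group_id}
    group_info:      {group_id: {"family": str, "go": set of GO IDs}}
    Returns (annotated, unassigned).
    """
    annotated = {}
    unassigned = []
    for query, hit in best_hits.items():
        group = member_to_group.get(hit)
        if group is None or group not in group_info:
            # No group, or no annotation for it: leave the protein unannotated.
            unassigned.append(query)
            continue
        info = group_info[group]
        annotated[query] = {
            "group": group,
            "family": info["family"],
            "go": sorted(info["go"]),
        }
    return annotated, sorted(unassigned)


# Placeholder IDs and annotations, for illustration only.
best_hits = {
    "new_p1": "lamprey_p7",
    "new_p2": "shark_p3",
    "new_p3": "unknown_p9",
}
member_to_group = {"lamprey_p7": "group_1", "shark_p3": "group_2"}
group_info = {"group_1": {"family": "family_X", "go": {"GO:0003677"}}}

annotated, unassigned = transfer_annotations(best_hits, member_to_group, group_info)
print(annotated)
print(unassigned)
```

Annotations transferred this way are only as good as the hit they rest on, so I would keep the unassigned list and treat multi-copy groups with some caution.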
Thanks a lot!