[Please accept my apologies if this has already been posted here; I could not locate it.]
I have a set of mouse protein IDs (Ensembl) plus amino acid positions, and I would like to know the corresponding amino acid position in the human ortholog.
I was wondering if there is any standard resource or programmatic approach with which I can retrieve this information.
What I have tried so far:
UCSC Table Browser to retrieve multiz alignments in CDS FASTA format. The results do not seem very helpful, as working out the amino acid numbering from the output does not seem to be trivial.
BLASTP from the NCBI portal. While this likely has all the necessary information, I am not sure how to automate it. I am planning on looking into Biopython to handle this.
I am aware that different protein isoforms need to be considered here as well.
Any pointers/ideas will be appreciated.
You can find pre-computed information about human-mouse homology at the Mouse Genome Informatics (MGI) resource hosted at Jackson Labs.
(Note: At the time of this writing the link that has this information is producing an FTP error. You could contact Jackson tech support if you need the information right away; otherwise wait a day or two and it should resolve itself.)
Thank you for your input.
I had checked out the resource you mentioned. While the link you posted is down, I could access the file here. Unfortunately, I do not think it contains the information I am looking for; i.e. not just the protein id mappings, but the amino acid position mappings.
I doubt you are going to find that level of detail in a pre-computed form. The only other resource I can think of is NCBI Homologene. You should be able to view an MSA of the human/mouse proteins there.
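Once you have an alignment of the mouse and human proteins (from Homologene, BLASTP, or any other aligner), converting a position comes down to walking the aligned columns and counting non-gap residues in each sequence. Here is a minimal sketch, assuming the two aligned sequences are equal-length strings with `-` for gaps. The function name `map_position` and the toy sequences are made up for illustration; they are not from any of the resources mentioned.

```python
def map_position(aligned_mouse, aligned_human, mouse_pos):
    """Map a 1-based mouse residue position to the 1-based human position.

    Both inputs are gapped sequences of equal length ('-' marks a gap).
    Returns None when the mouse residue is aligned to a gap in human.
    """
    if len(aligned_mouse) != len(aligned_human):
        raise ValueError("aligned sequences must have equal length")
    m = h = 0  # residues counted so far in mouse and human
    for a, b in zip(aligned_mouse, aligned_human):
        if a != "-":
            m += 1
        if b != "-":
            h += 1
        if a != "-" and m == mouse_pos:
            return h if b != "-" else None
    raise ValueError("mouse_pos is outside the mouse sequence")


# Toy alignment (hypothetical sequences, not real proteins)
mouse = "MKT-AYIAK"
human = "MKTQAY-AK"
print(map_position(mouse, human, 4))  # 5: human has an extra Q before this A
print(map_position(mouse, human, 6))  # None: mouse I6 has no human counterpart
```

The result is only as good as the alignment, and it is specific to the pair of isoforms that were aligned, so the isoform choice needs to be recorded alongside the mapped position.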
Hi, I'm also trying to do this and am wondering how you solved this problem? Thanks!
As far as I can see, 3 years later there is still no tool (or rather, resource) that would take a mouse amino acid position and give the equivalent amino acid position in the human homologue. Am I missing something? Is this a lot more difficult than I think? Or is there just not enough use for it for people to bother making something like that? Isn't it a matter of running tons of multiple sequence alignments and packaging the results into a database? Or does something like that already exist and my Google powers are failing me?
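For what it's worth, the "align, then package the results" idea can be sketched in a few lines for a single ortholog pair. This is only an illustration under strong assumptions: it uses a plain Needleman-Wunsch global alignment with a flat match/mismatch/gap score, whereas a real pipeline would more likely use a substitution matrix such as BLOSUM62 with affine gap penalties (for example via Biopython's `PairwiseAligner`). The names `global_align` and `position_table` and the toy sequences are hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a flat scoring scheme.

    Returns the two gapped sequences ('-' marks a gap).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if (i > 0 and j > 0 and a[i - 1] == b[j - 1]) else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def position_table(mouse_seq, human_seq):
    """Build {mouse position: human position or None}, both 1-based."""
    aln_m, aln_h = global_align(mouse_seq, human_seq)
    table, m, h = {}, 0, 0
    for x, y in zip(aln_m, aln_h):
        if y != "-":
            h += 1
        if x != "-":
            m += 1
            table[m] = h if y != "-" else None
    return table


# Toy sequences (hypothetical, not real proteins)
table = position_table("MKTAYIAK", "MKTQAYAK")
print(table[4])  # 5
print(table[6])  # None (mouse residue falls in a gap on the human side)
```

Storing one such table per mouse/human isoform pair would give the lookup described above. The harder parts in practice are probably choosing which isoforms to pair and deciding how far to trust positions in poorly conserved regions.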