I'm new to bioinformatics and I am working with a "benchmark" cancer dataset. I need to retrieve information from the GO database, so I'm trying to convert my dataset's probe IDs to UniProt IDs (I'm using the MadGene tool). But I have two problems:
1) Some probes are mapped to multiple UniProt IDs (e.g. U70732rna1at -> Q7Z4T9 and P24298). How should I manage this situation? Which UniProt ID do I have to select for my probe?
2) Sometimes different probes are mapped to the same UniProt ID. What is the best method to resolve this problem? Should I average the gene expression values of such probes?
For the first question, maybe you can take a look at the UniProt IDs themselves. Using your example U70732rna1at -> Q7Z4T9 and P24298: if the first one (Q7Z4T9) is a Swiss-Prot entry and the second one (P24298) is a TrEMBL entry, I would say pick the first one, since Swiss-Prot entries are manually reviewed while TrEMBL entries are automatically annotated.
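As a rough illustration of that rule, here is a minimal Python sketch that prefers a reviewed (Swiss-Prot) accession when one is available. The `REVIEWED` table and the `pick_accession` helper are hypothetical names made up for this example, and the review flags simply mirror the "if" in the answer above; they are not the actual status of these two entries, which you would need to look up in UniProt.

```python
from typing import Dict, List, Optional

# Hypothetical review status per accession (True = Swiss-Prot, False = TrEMBL).
# These flags only mirror the hypothetical case discussed above; look up the
# real status in UniProt before relying on them.
REVIEWED: Dict[str, bool] = {"Q7Z4T9": True, "P24298": False}


def pick_accession(candidates: List[str], reviewed: Dict[str, bool]) -> Optional[str]:
    """Return the first reviewed accession, else the first candidate, else None."""
    for acc in candidates:
        if reviewed.get(acc, False):
            return acc
    # No reviewed entry found: fall back to the first candidate (an arbitrary choice).
    return candidates[0] if candidates else None


# Probe-to-accession mapping as produced by the ID conversion step.
probe_to_accessions = {"U70732rna1at": ["Q7Z4T9", "P24298"]}

chosen = {
    probe: pick_accession(accs, REVIEWED)
    for probe, accs in probe_to_accessions.items()
}
print(chosen)
```

If several candidates are reviewed, this sketch just keeps the first one, so you may still want to inspect those probes by hand.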
What I do is pick any one of the IDs and treat all of them as synonymous. Whenever I download any data, I first format it according to the IDs I have, so it never matters which of the multiple IDs I get. I know this might not be the most appropriate way, but I haven't got any other solution in mind.
So you basically duplicate the data about that probe as many times as the number of "equivalent" IDs you get, don't you?
No, there is no duplication involved here. Just treat all the multiple IDs mapped to the same probe ID as one.
Can you please make an example?
Please check the answer below.
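One possible reading of the "treat all the mapped IDs as one" suggestion, sketched in Python: build a lookup table that sends every UniProt ID back to its probe, then collapse whatever you download onto the probe, so nothing is duplicated. The helper names `build_alias_table` and `annotations_by_probe` are invented for this example, and `term_A`, `term_B`, `term_C` are placeholders, not real GO annotations.

```python
from typing import Dict, Iterable, List, Set, Tuple


def build_alias_table(probe_to_ids: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every UniProt ID to its probe, so all IDs of a probe act as synonyms."""
    alias_to_probe: Dict[str, str] = {}
    for probe, ids in probe_to_ids.items():
        for uid in ids:
            # If an ID is shared by several probes, keep the first probe seen.
            # This is an arbitrary choice made only for this sketch.
            alias_to_probe.setdefault(uid, probe)
    return alias_to_probe


def annotations_by_probe(
    records: Iterable[Tuple[str, str]], alias_to_probe: Dict[str, str]
) -> Dict[str, Set[str]]:
    """Collapse (uniprot_id, term) records onto probes; sets prevent duplicates."""
    by_probe: Dict[str, Set[str]] = {}
    for uid, term in records:
        probe = alias_to_probe.get(uid)
        if probe is None:
            continue  # ID not in our mapping, skip it
        by_probe.setdefault(probe, set()).add(term)
    return by_probe


aliases = build_alias_table({"U70732rna1at": ["Q7Z4T9", "P24298"]})

# Placeholder download: the same probe is reached through either ID.
downloaded = [
    ("Q7Z4T9", "term_A"),
    ("P24298", "term_B"),
    ("P24298", "term_A"),
    ("X00000", "term_C"),
]
result = annotations_by_probe(downloaded, aliases)
print(result)
```

Here the probe ends up with a single set of terms no matter which of its IDs a downloaded record used, which seems to be the point of "no duplication".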