Hello everyone,
I would like to correlate protein expression with mRNA expression in my breast cancer research. I downloaded level 4 (L4) RPPA data from the TCPA portal (https://tcpaportal.org/tcpa/download.html) and got a protein expression matrix, which is great. However, I am baffled by the protein names in this file. For example, some names look like this: "X1433EPSILON", "EGFR", "EGFR_pY1068", "ERALPHA".
My questions are: what do these protein names mean? Is the first one a valid protein name? What is the difference between the two EGFR entries? Which one should I use to correlate with EGFR mRNA expression?
And how should I map them to gene symbols? I believe ERALPHA corresponds to the ESR1 gene, but which R library should I use for this mapping?
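To make the question concrete, here is a minimal sketch of how I am currently reading these labels. It rests on two guesses that I have not confirmed: that a "_p<residue><position>" suffix (as in "EGFR_pY1068") marks a phospho-specific antibody, and that a leading "X" before a digit was added when R sanitised the column names. The helper name parse_antibody_name is my own, and the sketch is in Python only for brevity; the same logic should carry over to R.

```python
import re

# Assumption: a "_p<residue><position>" suffix marks a phospho-specific antibody,
# inferred from labels such as "EGFR_pY1068".
PHOSPHO_SUFFIX = re.compile(r"^(?P<base>.+?)_p(?P<site>[STY]\d+.*)$")

# Assumption: a leading "X" directly before a digit was prepended by R
# when the column names were made syntactically valid.
R_DIGIT_PREFIX = re.compile(r"^X(?=\d)")


def parse_antibody_name(name):
    """Split an RPPA label into a target name and an optional phospho site."""
    cleaned = R_DIGIT_PREFIX.sub("", name)
    match = PHOSPHO_SUFFIX.match(cleaned)
    if match:
        return {"target": match.group("base"), "phospho_site": match.group("site")}
    return {"target": cleaned, "phospho_site": None}


for label in ["X1433EPSILON", "EGFR", "EGFR_pY1068", "ERALPHA"]:
    print(label, parse_antibody_name(label))
```

If those guesses hold, "EGFR" and "EGFR_pY1068" would share the target EGFR and differ only in the phospho site, while "X1433EPSILON" would reduce to "1433EPSILON". The step from target name to gene symbol (for example ERALPHA to ESR1) is not covered here and would still need a lookup table.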
This is my first time working with RPPA data, and I didn't find much helpful information on the TCPA portal. Any suggestions are much appreciated!
Hello GenoMax,
I think you're right. Thank you for mentioning this paper! But I am still confused about how to map these names, because the mapping from protein name to gene name is not one-to-one but many-to-many.
To be more specific, one example is as follows.
Protein Name    Gene Name
Akt_pS473       AKT1 AKT2 AKT3
Akt_pT308       AKT1 AKT2 AKT3
Akt             AKT1 AKT2 AKT3
These three proteins are mapped to the same three gene names. In this scenario, how would you map them?
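One option I am considering is to expand the mapping into long format, so that each antibody-gene pair gets its own row and can be correlated separately. I am not sure this is the accepted approach. The sketch below uses only the example above; the function name expand_pairs is my own, and treating labels without a "_p" suffix as total protein is an assumption on my part. It is in Python for brevity, but the same logic should carry over to R.

```python
# Mapping copied from the example above: each antibody lists several candidate genes.
antibody_to_genes = {
    "Akt_pS473": ["AKT1", "AKT2", "AKT3"],
    "Akt_pT308": ["AKT1", "AKT2", "AKT3"],
    "Akt": ["AKT1", "AKT2", "AKT3"],
}


def expand_pairs(mapping):
    """Return one (antibody, gene) pair for every candidate gene of every antibody."""
    return [(antibody, gene) for antibody, genes in mapping.items() for gene in genes]


pairs = expand_pairs(antibody_to_genes)

# Assumption: labels without a "_p" suffix measure total protein, which is
# probably the more natural partner for an mRNA correlation.
total_protein_pairs = [(antibody, gene) for antibody, gene in pairs if "_p" not in antibody]

print(len(pairs))            # 9 antibody-gene pairs in total
print(total_protein_pairs)   # the three pairs for the total Akt antibody
```

Each pair could then be correlated on its own (for example Akt protein against AKT1, AKT2, and AKT3 mRNA separately), rather than forcing a single gene per antibody.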
Thank you very much!