I am helping prepare data downloaded from ArrayExpress to be used in a machine learning algorithm.
The algorithm expects one sequence of gene expression values for each UNIPROT. However, the data is organized so that there is one sequence of gene expression values for each Probe ID (I assume this is the usual layout, but I am new to this area).
I did a lookup that associates the Probe IDs with UNIPROTs, but it is a many-to-many relation, so each UNIPROT can have multiple associated Probe IDs (and a single Probe ID can map to more than one UNIPROT).
Is there some "standard" or "proper" way to combine the gene expression values for each UNIPROT? Of course, I could take the mean or median gene expression value over all Probe IDs associated with each UNIPROT, but is there a better way?
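
For concreteness, the mean/median option I have in mind looks roughly like the sketch below, using pandas. The toy expression values, the probe and UNIPROT identifiers, the column names, and the helper `collapse_probes` are all made up for illustration; my real data comes from ArrayExpress and is laid out as probes by samples.

```python
import pandas as pd

# Hypothetical toy expression matrix: rows are probes, columns are samples.
expr = pd.DataFrame(
    {"sample_1": [1.0, 3.0, 5.0, 2.0], "sample_2": [2.0, 4.0, 6.0, 8.0]},
    index=["probe_a", "probe_b", "probe_c", "probe_d"],
)

# Hypothetical many-to-many lookup: probe_b maps to both P1 and P2.
mapping = pd.DataFrame(
    {
        "probe_id": ["probe_a", "probe_b", "probe_b", "probe_c", "probe_d"],
        "uniprot": ["P1", "P1", "P2", "P2", "P2"],
    }
)


def collapse_probes(expr, mapping, how="median"):
    """Summarize probe-level rows into one row per UNIPROT.

    `how` is any aggregation name pandas understands, e.g. "mean" or "median".
    """
    # Attach the expression values of each probe to every UNIPROT it maps to.
    merged = mapping.merge(expr, left_on="probe_id", right_index=True, how="inner")
    # Aggregate over all probes associated with the same UNIPROT.
    return merged.drop(columns="probe_id").groupby("uniprot").agg(how)


per_protein_mean = collapse_probes(expr, mapping, how="mean")
per_protein_median = collapse_probes(expr, mapping, how="median")
print(per_protein_median)
```

Note that a probe mapped to several UNIPROTs (like `probe_b` here) contributes to each of them, which is one of the things I am unsure about.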