I'm looking at the expression values of the gene PFKL in GSE20966. The following is the data from one sample reported in the study:

            PFKL    PFKL    PFKL    PFKL
GSM524151   58.27   17.71   402.40  439.61

As we can see, the magnitudes of the expression values are quite different. Also, the probe ID associated with the value reported in each column is different.
In the paper published from this study, a single value is reported for the gene, PFKL.
I'm confused about this. Do we consider the probe set that shows the highest expression? Or should we compute an average?
Could someone explain?
In my opinion, the multiple probes could be targeting exons that contribute to different transcript isoforms. Is the final value they report for the gene the average of all the probe values, or the highest among them? If you don't care about transcript isoforms, you could follow their logic and compute a single expression value per gene for all genes, so that the values are comparable. If you want to know the expression of individual transcripts, you could dig into the probe platform they used; it might have information on which probe refers to which transcript.
Could you please explain a bit more what you mean by transcript isoform? PFK has 3 isoforms: PFKL, PFKM and PFKP. The example that I mentioned above is PFKL. There were four probe IDs just for PFKL, which leads to the confusion. As you pointed out, I agree that different transcripts are linked to different probes. In the data file, I clearly see the distinction, i.e. PFKM and PFKP are reported separately.
Microarray: How To Select One Of Multiple Probes Corresponding To A Gene
Microarray Expression For Genes With Multiple Probes
https://support.bioconductor.org/p/92128/
Please refer to these posts for more information. I think different people might suggest different approaches, all equally correct or incorrect. Just be consistent; that should be the rule.
In response to your previous post,
I could find 4 values from the same trial. For GSM524151,
Is there any discrepancy in what I observe? Could you please let me know?
I'll definitely read through the posts in the link that you just shared.
It depends on what your end goal is. Are you comparing the expression of all genes across conditions, or just the gene you are focusing on? If you want to look at all genes, just follow a procedure that merges the information from the different probes into one value per gene. You could choose the median, the mean or just the highest value. Like I said, apply the same rule throughout. If you want to focus on one transcript, focus on the transcript isoform that you are interested in (as shown below). While different isoforms have different transcripts, they could end up coding for the same protein or for an equally functional protein. So the difference between isoforms could be just the UTR, which is related to regulation, or it could be a difference in a region that does not contain an evolutionarily conserved domain such as a homeobox or a zinc finger. To summarize, it depends on the end goal.
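To make the "merge different probes into one value" step concrete, here is a minimal sketch in Python. It uses the four PFKL values quoted in the question for GSM524151; the probe IDs (`probe_1` to `probe_4`) and the function name `collapse_probes` are placeholders made up for illustration, not the real platform IDs or anything the study's authors used.

```python
from statistics import mean, median

# PFKL values for sample GSM524151, as quoted in the question.
# The probe IDs are hypothetical placeholders, not real platform IDs.
probe_values = {
    "PFKL": {
        "probe_1": 58.27,
        "probe_2": 17.71,
        "probe_3": 402.40,
        "probe_4": 439.61,
    },
}

# The three collapsing rules mentioned in the thread.
RULES = {"mean": mean, "median": median, "max": max}


def collapse_probes(values_by_gene, rule):
    """Collapse each gene's probe values into one number using a single rule."""
    summarise = RULES[rule]
    return {
        gene: summarise(list(probes.values()))
        for gene, probes in values_by_gene.items()
    }


# Apply each rule in turn so the results can be compared side by side.
for rule in RULES:
    print(rule, collapse_probes(probe_values, rule)["PFKL"])
```

For this sample the three rules give quite different answers: the maximum is 439.61, while the mean and the median are both around 230. That is why the same rule should be applied to every gene and every sample before comparing anything.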
Thanks a lot for the advice. My end goal is to compare the expression levels of the different isoforms (not at the transcript level) of all the genes. I will stick to the suggestions received, follow this link and consider the highest value. A few answers in the posts you linked also suggested considering the second highest.
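For reference, picking the highest or the second-highest probe could look roughly like the sketch below. The probe IDs and the helper name `pick_probe` are again hypothetical, and falling back to the only available probe when a gene has fewer probes than the requested rank is an assumption of this sketch, not something taken from the linked posts.

```python
def pick_probe(probes, rank=1):
    """Return (probe_id, value) for the rank-th highest probe of one gene.

    rank=1 is the highest value, rank=2 the second highest. If the gene has
    fewer probes than `rank`, the lowest available probe is returned
    (an assumption made for this sketch).
    """
    ordered = sorted(probes.items(), key=lambda item: item[1], reverse=True)
    index = min(rank, len(ordered)) - 1
    return ordered[index]


# PFKL values for GSM524151 from the question; probe IDs are placeholders.
pfkl = {"probe_1": 58.27, "probe_2": 17.71, "probe_3": 402.40, "probe_4": 439.61}

print(pick_probe(pfkl, rank=1))  # ('probe_4', 439.61)
print(pick_probe(pfkl, rank=2))  # ('probe_3', 402.4)
```

This ranks probes within a single sample. With several samples, the ranking would need to be based on one agreed summary per probe so that the same probe is used for every sample.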