Dear colleagues,
I have several microarray datasets, each covering distinct diseases, and I'd like to generate a score that indicates the activity of a given pathway in a way that is comparable across datasets. Most of them are on the same Affymetrix platform, but not all.
I started by reanalyzing each dataset from scratch, using GCRMA for background correction and quantile normalization.
So far, I have tried:
1. Calculating a molecular distance to health score, using the normal/control/baseline samples present within each dataset (the method described in Pankla et al. 2009). The results look suspicious: irrespective of the gene set, there are huge discrepancies among diseases.
2. Calculating GSVA and ssGSEA scores separately in each dataset, but I'm not sure if I can compare those scores across datasets. If not, should I take the ratios of scores between case and control within each dataset and compare those between datasets?
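To make the last option concrete, here is a minimal sketch in Python of comparing case and control within each dataset and only then comparing across datasets. It assumes the per-sample pathway scores have already been computed (for example with GSVA in R) and exported. The dataset names and score values are made up for illustration. Note that it uses a standardized difference (Cohen's d) instead of a ratio, because enrichment scores of this kind can be negative or close to zero, which would make ratios unstable. That substitution is my assumption, not an established recommendation.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_difference(case_scores, control_scores):
    """Cohen's d: difference in group means divided by the pooled standard deviation."""
    n_case, n_ctrl = len(case_scores), len(control_scores)
    sd_case, sd_ctrl = stdev(case_scores), stdev(control_scores)
    pooled_sd = sqrt(
        ((n_case - 1) * sd_case**2 + (n_ctrl - 1) * sd_ctrl**2)
        / (n_case + n_ctrl - 2)
    )
    return (mean(case_scores) - mean(control_scores)) / pooled_sd


# Hypothetical per-sample scores for one pathway in two datasets.
datasets = {
    "dataset_A": {"case": [0.42, 0.35, 0.51, 0.47], "control": [0.10, 0.05, 0.18, 0.12]},
    "dataset_B": {"case": [-0.05, 0.02, 0.08, 0.01], "control": [-0.20, -0.15, -0.22, -0.18]},
}

# One unitless effect size per dataset, each relative to that dataset's own controls.
effect_sizes = {
    name: standardized_difference(groups["case"], groups["control"])
    for name, groups in datasets.items()
}

for name, d in effect_sizes.items():
    print(f"{name}: d = {d:.2f}")
```

Because each value is expressed relative to the controls of its own dataset, platform and batch offsets that shift all samples of a dataset equally cancel out. Differences in scale or dynamic range between platforms are only partly handled by dividing by the pooled standard deviation.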
Unfortunately, I cannot merge the datasets, adjust for dataset effects, and perform differential expression directly, because they do not contain the same groups of samples.
Thanks.
Have you tried the gene set test functions in limma? (https://rdrr.io/bioc/limma/man/geneSetTest.html) They allow you to define and score arbitrary gene sets. If you have various pathways, this is an easy way to generate a score for each pathway in each dataset. You could then try to make sense of a matrix of scores containing your pathway of interest alongside other pathways that serve as "controls", or even random gene sets to establish some sense of the variance. (I guess this is similar to what you've already tried in 2.) It might not be statistically robust, but you could probably generate a heat map to see whether your pathway indeed shows "activity" distinct from the others.
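The random gene set idea can be sketched roughly as follows. This is a plain Python illustration, not limma's implementation. It assumes you already have one statistic per gene for a dataset (for example a case-vs-control t-statistic). The function name `random_set_z`, the gene names, and the simulated values are all hypothetical.

```python
import random
from statistics import mean, stdev


def random_set_z(stats, gene_set, n_random=1000, seed=0):
    """Compare a gene set's mean statistic with random gene sets of the same size.

    Returns a z-score against the random-set distribution and a one-sided
    empirical p-value for the set having a higher mean than random sets.
    """
    rng = random.Random(seed)
    genes = sorted(stats)  # fixed order keeps the sampling reproducible
    members = [g for g in gene_set if g in stats]
    observed = mean(stats[g] for g in members)
    null = [
        mean(stats[g] for g in rng.sample(genes, len(members)))
        for _ in range(n_random)
    ]
    z = (observed - mean(null)) / stdev(null)
    p = (1 + sum(x >= observed for x in null)) / (n_random + 1)
    return z, p


# Simulated per-gene statistics: 200 genes, the first 10 shifted upwards.
sim = random.Random(1)
stats = {f"g{i}": sim.gauss(0, 1) for i in range(200)}
pathway = [f"g{i}" for i in range(10)]
for g in pathway:
    stats[g] += 2.0

z, p = random_set_z(stats, pathway)
print(f"z = {z:.2f}, empirical p = {p:.4f}")
```

Running this per pathway and per dataset gives a matrix of z-scores that could feed the heat map. Since each z-score is scaled by the spread of random sets within its own dataset, the values are at least on a similar footing across datasets, though that alone does not make them statistically rigorous.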
Thanks for the suggestion. The score you refer to as the output of the geneSetTest function in limma is a P-value, right?