In this post, I learned that there are R implementations of the GSEA algorithm that rely on permuting the class labels. I wanted to try out the method that is described in this paper and is available as the GSA package. If I understand it correctly, it first computes a t-statistic for each gene based on the input to the GSA function.
However, I am unsure what kind of data the GSA function expects as the x argument. The documentation says the following:
Data x: p by n matrix of features (expression values), one observation per column (missing values allowed); y: n-vector of outcome measurements
My intuition tells me that it would be wrong to input the raw data (i.e. the counts) and that I should instead use transformed and normalized counts.
So ultimately, my question is whether it is correct to use the vst-transformed count data as the input to the GSA function.
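To make the question concrete, here is a minimal sketch of the workflow I have in mind. It is not real data: the counts come from DESeq2's makeExampleDESeqDataSet, the gene sets are random and purely hypothetical, and the choices of blind = TRUE, 20 sets of 30 genes, and nperms = 100 are assumptions for illustration only. Whether passing the vst output as x is appropriate is exactly what I am unsure about.

```r
# Sketch only: simulated counts and made-up gene sets stand in for real data.
library(DESeq2)  # also attaches SummarizedExperiment (assay(), colData())
library(GSA)

set.seed(1)

# Simulated data set: 2000 genes, 12 samples, two conditions (A and B)
dds <- makeExampleDESeqDataSet(n = 2000, m = 12)

# Variance-stabilizing transformation of the counts
vsd <- vst(dds, blind = TRUE)
x <- assay(vsd)  # p-by-n matrix, one sample per column

# For a two-class comparison, GSA expects labels coded as 1 and 2
y <- as.numeric(colData(dds)$condition)

# Hypothetical gene sets: 20 random sets of 30 genes each
genesets <- replicate(20, sample(rownames(x), 30), simplify = FALSE)
names(genesets) <- paste0("set", seq_along(genesets))

fit <- GSA(x, y,
           genesets  = genesets,
           genenames = rownames(x),
           resp.type = "Two class unpaired",
           nperms    = 100)

# Per-set scores and permutation p-values
head(data.frame(set   = names(genesets),
                score = fit$GSA.scores,
                p.lo  = fit$pvalues.lo,
                p.hi  = fit$pvalues.hi))
```

With real data, x would be the vst output for my own count matrix and genesets would come from a curated collection instead of random draws.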
Any pointers are much appreciated!
Cross-posted https://support.bioconductor.org/p/9148617/