Hello!
I am trying to run GSEA on my RNA-seq dataset using the tool provided by the Broad Institute, which I have downloaded from their webpage. As input I am using my expression dataset (including all the genes, not only DEGs), with Gene Symbols as identifiers, followed by the normalized counts for the samples. In addition to the expression dataset, I have generated a phenotype label .cls file as required by the tool.
In the "gene set database" I have selected the databases from h to c6 including only the ones with the ".all" definition, in order to avoid duplications. Also, in the "permutation type" I have selected "gene_set".
When I try to run the GSEA analysis, I am uncertain what to select in the "Collapse" option. If I select "No_Collapse", then I get the following error message:
After pruning, none of the gene sets passed size thresholds.
If I instead select "Collapse", it requires me to select a "Chip platform" and I am very confused about what to select. Since I use Gene Symbols as identifiers in my expression dataset, I have tried selecting "Human_Symbol_with_Remapping_MSigDB.v7.1.chip", but I get the following error:
The collapsed dataset was empty when used with chip:ftp.broadinstitute.org://pub...
Any help would be greatly appreciated!
Thanks!
followed by the normalized counts for the samples
GSEA requires a ranked list, e.g. ranked by significance. How did you generate this?
I am pretty new to this, so I am not sure I understand exactly what you mean. I have uploaded my expression dataset, where I have one column with all the gene symbols and other columns with the normalized counts for each sample.
This just isn't true. Providing a ranked list to GSEA Preranked is one way to run GSEA, but providing normalized counts to the standard GSEA UI is actually the default way to run it.
Also, ranking by significance alone is really not a great way to rank genes; you can use something like -log10(pValue)*sign(log2(FC)) instead, which keeps the direction of the change.
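For anyone who does want to go the Preranked route, here is a minimal sketch of that metric in Python. The gene names and values are made up for illustration, and `rank_metric`, `write_rnk`, and the `min_p` floor are hypothetical helpers of mine, not part of GSEA. The only assumption about GSEA itself is that a .rnk file is a two-column, tab-delimited list of gene and score.

```python
import math


def rank_metric(p_value, log2_fc, min_p=1e-300):
    """Signed significance score: -log10(p) * sign(log2FC)."""
    p = max(p_value, min_p)  # assumed floor, avoids log10(0) for p-values reported as 0
    sign = (log2_fc > 0) - (log2_fc < 0)  # +1, -1, or 0
    return -math.log10(p) * sign


# Hypothetical differential-expression results:
# (gene symbol, p-value, log2 fold change)
de_results = [
    ("GENE_A", 1e-8, 2.1),
    ("GENE_B", 0.04, -1.3),
    ("GENE_C", 0.5, 0.2),
    ("GENE_D", 1e-5, -3.0),
]

# Strongly up-regulated genes end up at the top, strongly down-regulated at the bottom.
ranked = sorted(
    ((gene, rank_metric(p, fc)) for gene, p, fc in de_results),
    key=lambda item: item[1],
    reverse=True,
)


def write_rnk(rows, path):
    """Write (gene, score) pairs as a two-column, tab-delimited .rnk file."""
    with open(path, "w") as handle:
        for gene, score in rows:
            handle.write(f"{gene}\t{score:.6g}\n")
```

Calling `write_rnk(ranked, "my_ranking.rnk")` would then give a file you could load into GSEA Preranked; the file name is just an example.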
One of the bigger problems with Preranked mode is that, for datasets where you have a large enough N, you lose the benefit of phenotype permutation testing, since Preranked only allows gene_set permutation testing for false discovery.
-Anthony
Anthony S. Castanza, PhD
Curator, Molecular Signatures Database
GSEA-MSigDB Team