Question

GSEA analysis error

Asked 5.0 years ago by Marco Pannone

Hello!

I am trying to run GSEA on my RNA-seq dataset using the tool provided by the Broad Institute, which I downloaded from their webpage. As input, I am using my expression dataset (all genes, not only DEGs), with Gene Symbols as identifiers followed by the normalized counts for each sample. In addition to the expression dataset, I have generated a phenotype label (.cls) file as required by the tool.
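For reference, a minimal sketch of the two input files described above, written as plain text from Python. It assumes the GCT v1.2 layout (a `#1.2` line, a line with the gene and sample counts, then a `NAME`/`Description`/sample header) and a categorical CLS file (sample count, class count and `1`; then `#` followed by the class names; then one label per sample, in the same order as the GCT columns). The function names `make_gct` and `make_cls` and the toy genes and counts are hypothetical.

```python
def make_gct(names, descriptions, samples, values):
    """Return a GCT v1.2 expression file as text.

    names: identifiers for the NAME column (here, gene symbols)
    descriptions: text for the Description column ("na" is fine)
    samples: sample names, one per expression column
    values: one list of normalized counts per gene, in sample order
    """
    lines = [
        "#1.2",
        f"{len(names)}\t{len(samples)}",  # number of genes, number of samples
        "\t".join(["NAME", "Description", *samples]),
    ]
    for name, desc, row in zip(names, descriptions, values):
        if len(row) != len(samples):
            raise ValueError(f"{name}: expected {len(samples)} values, got {len(row)}")
        lines.append("\t".join([name, desc, *(f"{v:g}" for v in row)]))
    return "\n".join(lines) + "\n"


def make_cls(labels):
    """Return a categorical CLS phenotype file; labels follow the GCT sample order."""
    classes = list(dict.fromkeys(labels))  # unique class names, first-seen order
    return (
        f"{len(labels)} {len(classes)} 1\n"
        f"# {' '.join(classes)}\n"
        f"{' '.join(labels)}\n"
    )


samples = ["ctrl_1", "ctrl_2", "trt_1", "trt_2"]
gct = make_gct(
    ["TP53", "MYC"],
    ["na", "na"],
    samples,
    [[10.5, 11.0, 20.2, 19.8], [5, 6, 7, 8]],
)
cls = make_cls(["CTRL", "CTRL", "TRT", "TRT"])
print(gct)
print(cls)
```

The key constraint is that the order of labels in the CLS file must match the order of the sample columns in the GCT file.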

In the "gene set database" field, I selected the collections from h to c6, including only the ones with the ".all" suffix in order to avoid duplicates. For "permutation type", I selected "gene_set".

When I try to run the GSEA analysis, I am uncertain what to select in the "Collapse" option. If I select "No_Collapse", then I get the following error message:

After pruning, none of the gene sets passed size thresholds.
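For context, this message means that once each gene set was restricted to the genes actually present in the dataset, no set had a size between the minimum and maximum thresholds (15 and 500 are, as far as I know, the default "Min size"/"Max size" values). A common cause is an identifier mismatch: the human MSigDB collections use human gene symbols, so if the NAME column holds something else (for example Ensembl IDs), the overlap drops to zero. The sketch below imitates that size filter; `prune_gene_sets` and the toy gene set are hypothetical and not GSEA's actual code.

```python
def prune_gene_sets(gene_sets, dataset_ids, set_min=15, set_max=500):
    """Keep gene sets whose overlap with the dataset lies within [set_min, set_max].

    gene_sets: dict of set name -> member identifiers
    dataset_ids: identifiers from the NAME column of the expression file
    Returns a dict of surviving set name -> overlap size.
    """
    present = set(dataset_ids)
    kept = {}
    for set_name, members in gene_sets.items():
        overlap = len(present.intersection(members))
        if set_min <= overlap <= set_max:
            kept[set_name] = overlap
    return kept


# Hypothetical 20-gene set and two versions of the same 100-gene dataset.
gene_sets = {"TOY_SET": [f"GENE{i}" for i in range(20)]}
symbol_ids = [f"GENE{i}" for i in range(100)]
ensembl_like_ids = [f"ENSG{i:011d}" for i in range(100)]

print(prune_gene_sets(gene_sets, symbol_ids))        # {'TOY_SET': 20}
print(prune_gene_sets(gene_sets, ensembl_like_ids))  # {}
```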

If I instead select "Collapse", it requires me to select a "ChIP platform", and I am very confused about what to select. Since my expression dataset uses Gene Symbols as identifiers, I tried selecting "Human_Symbol_with_Remapping_MSigDB.v7.1.chip", but I get the following error:

The collapsed dataset was empty when used with chip:ftp.broadinstitute.org://pub...
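A rough illustration of why collapsing can come back empty: collapsing looks up each dataset identifier in the chip file's first column and maps it to a gene symbol, so rows whose identifiers are not found there are dropped, and if none are found the collapsed dataset is empty. The sketch uses a Python dict as a stand-in for the .chip file and, per my understanding of the default Max_probe mode, keeps the per-sample maximum when several identifiers map to the same gene. `collapse_max_probe` and the probe names are hypothetical.

```python
def collapse_max_probe(rows, chip):
    """Collapse dataset rows to gene symbols via a chip mapping.

    rows: dict of dataset identifier -> list of per-sample values
    chip: dict of identifier -> gene symbol (stand-in for a .chip file)
    Identifiers missing from `chip` are dropped; when several identifiers map
    to the same symbol, the per-sample maximum is kept.
    """
    collapsed = {}
    for ident, values in rows.items():
        symbol = chip.get(ident)
        if symbol is None:
            continue  # not in the chip file: this row is lost
        if symbol in collapsed:
            collapsed[symbol] = [max(a, b) for a, b in zip(collapsed[symbol], values)]
        else:
            collapsed[symbol] = list(values)
    return collapsed


chip = {"probe_1": "TP53", "probe_2": "TP53", "probe_3": "MYC"}
matching = {"probe_1": [1.0, 4.0], "probe_2": [3.0, 0.5], "probe_3": [2.0, 2.0]}
mismatched = {"TP53": [1.0, 4.0], "MYC": [2.0, 2.0]}  # symbols, but chip is keyed by probe IDs

print(collapse_max_probe(matching, chip))    # {'TP53': [3.0, 4.0], 'MYC': [2.0, 2.0]}
print(collapse_max_probe(mismatched, chip))  # {}
```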

Any help would be very appreciated!

Thanks!

Tags: gsea, RNA-Seq

Written 5.0 years ago by Marco Pannone; last updated 5.0 years ago by Danielle B.

"followed by the normalized counts for the samples"

GSEA requires a ranked list, e.g. ranked by significance. How did you generate this?

Reply by ATpoint, 5.0 years ago

I am pretty new to this, so I am not sure I understand exactly what you mean. I uploaded my expression dataset, which has one column with all the gene symbols and other columns with the normalized counts for each sample.

Reply by Marco Pannone, 5.0 years ago

This just isn't true. Providing a ranked list to GSEA Preranked is one way to run GSEA, but providing normalized counts to the standard GSEA UI is actually the default way to run it.

Also, ranking by significance alone is not a great way to rank genes; you can use something like -log10(pValue)*sign(log2(FC)) instead.
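A minimal sketch of that metric, assuming you already have a per-gene p-value and log2 fold change from a differential expression tool, and writing the result as a two-column, tab-separated ranked list of the kind GSEA Preranked takes. The function names and example numbers are hypothetical; p-values of exactly 0 must be capped before taking the logarithm.

```python
import math


def rank_metric(p_value, log2_fc):
    """-log10(p) * sign(log2FC): large positive = strongly up, large negative = strongly down."""
    if not 0 < p_value <= 1:
        raise ValueError("p-value must be in (0, 1]; cap p-values of 0 first")
    sign = (log2_fc > 0) - (log2_fc < 0)  # +1, 0 or -1
    return sign * -math.log10(p_value)


def make_rnk(results):
    """results: iterable of (gene, p_value, log2_fc). Returns .rnk text, highest score first."""
    scored = [(gene, rank_metric(p, fc)) for gene, p, fc in results]
    scored.sort(key=lambda item: item[1], reverse=True)
    return "".join(f"{gene}\t{score:.4f}\n" for gene, score in scored)


results = [("TP53", 1e-5, -1.2), ("MYC", 0.01, 2.0), ("GAPDH", 0.9, 0.1)]
print(make_rnk(results), end="")
# MYC	2.0000
# GAPDH	0.0458
# TP53	-5.0000
```

Unlike ranking by p-value alone, the sign keeps strongly down-regulated genes at the bottom of the list instead of mixing them in with the strongly up-regulated ones at the top.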

One of the bigger problems with Preranked mode is that, for datasets with a large enough N, you lose the benefit of phenotype permutation testing, since Preranked only allows gene_set permutation for estimating the false discovery rate.
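To see why sample size matters here: phenotype permutation builds its null distribution by shuffling the sample labels, and with few samples there are only a handful of distinct shufflings to draw from, no matter how many permutations are requested. This small count uses `math.comb` (Python 3.8+) and assumes a two-class comparison with equal group sizes.

```python
from math import comb


def distinct_labelings(n_a, n_b):
    """Distinct ways to assign n_a class-A and n_b class-B labels to n_a + n_b samples."""
    return comb(n_a + n_b, n_a)


for n in (3, 5, 10):
    print(f"{n} vs {n}: {distinct_labelings(n, n)} distinct labelings")
# 3 vs 3: 20 distinct labelings
# 5 vs 5: 252 distinct labelings
# 10 vs 10: 184756 distinct labelings
```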

-Anthony

Anthony S. Castanza, PhD

Curator, Molecular Signatures Database

GSEA-MSigDB Team

Reply by Anthony Castanza, 4.8 years ago

Answer (2020-06-07)

Hi Marco, I ran into the same issues when I was doing this a few weeks ago. In my .gct file, I ended up putting the gene Entrez IDs in the first column ("Name") and the corresponding gene names in the "Description" column. This allowed me to collapse my dataset in GSEA (even though I didn't really need to). I selected the Human_NCBI_Entrez_Gene_ID_MSigDB... option from the drop-down, since that was the best match for the identifiers in the "Name" column. I also selected only one Gene Sets Database at a time, though I'm not sure how relevant that is.
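The underlying point is that the chip file has to match whatever is in the first ("Name") column. Below is a crude heuristic for checking which kind of identifier that column holds, assuming human or mouse data. The function `guess_identifier_type` and its categories are illustrative, not something GSEA provides, and the patterns only cover common formats (all-digit Entrez IDs, `ENSG`/`ENSMUSG` Ensembl gene IDs with an optional version suffix).

```python
import re


def guess_identifier_type(ids):
    """Crude guess at what kind of identifiers a NAME column holds."""
    ids = [i.strip() for i in ids if i.strip()]
    if not ids:
        return "empty"
    if all(i.isdigit() for i in ids):
        return "Entrez Gene ID"
    if all(re.fullmatch(r"ENSG\d{11}(\.\d+)?", i) for i in ids):
        return "human Ensembl gene ID"
    if all(re.fullmatch(r"ENSMUSG\d{11}(\.\d+)?", i) for i in ids):
        return "mouse Ensembl gene ID"
    return "gene symbol or mixed"


print(guess_identifier_type(["7157", "4609"]))        # Entrez Gene ID
print(guess_identifier_type(["ENSG00000141510.17"]))  # human Ensembl gene ID
print(guess_identifier_type(["TP53", "MYC"]))         # gene symbol or mixed
```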

Also, in response to @ATpoint's comment, you don't need to use a ranked gene list as your input. In fact, I think GSEA prefers to do the ranking itself; to me, that's part of its magic! Let me know if this ends up helping!
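For the curious, a simplified sketch of the kind of ranking standard GSEA does internally, assuming its default signal-to-noise metric, (mean_A - mean_B) / (sd_A + sd_B), computed per gene between the two phenotype classes. It uses the sample standard deviation and leaves out any adjustment GSEA may apply to very small standard deviations, so the values are not guaranteed to match GSEA's exactly. `signal_to_noise`, `rank_genes`, and the toy data are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(a_values, b_values):
    """(mean_A - mean_B) / (sd_A + sd_B); needs at least two samples per class."""
    denom = stdev(a_values) + stdev(b_values)
    if denom == 0:
        raise ValueError("zero variance in both classes; a real implementation needs a floor here")
    return (mean(a_values) - mean(b_values)) / denom


def rank_genes(expr, labels, class_a, class_b):
    """Score every gene and sort from most class_a-enriched to most class_b-enriched."""
    scores = {}
    for gene, values in expr.items():
        a = [v for v, lab in zip(values, labels) if lab == class_a]
        b = [v for v, lab in zip(values, labels) if lab == class_b]
        scores[gene] = signal_to_noise(a, b)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


labels = ["CTRL", "CTRL", "TRT", "TRT"]
expr = {
    "MYC": [1.0, 2.0, 5.0, 6.0],
    "TP53": [6.0, 5.0, 2.0, 1.0],
    "GAPDH": [3.0, 4.0, 3.0, 4.0],
}
for gene, score in rank_genes(expr, labels, "TRT", "CTRL"):
    print(f"{gene}\t{score:.3f}")
# MYC	2.828
# GAPDH	0.000
# TP53	-2.828
```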

Best, Danielle