Question

General questions about running GSEAPreranked

8.1 years ago

mmccarthy781 ▴ 10

Hey all,

I'm currently learning about GSEA in the hope of using it to analyze my differentially expressed genes, and I have a few questions about the program, specifically about GSEAPreranked.

1) Should the ranked list used as GSEAPreranked input include all genes, or only those that pass a significance threshold (e.g. fold change above 2, p-value below 0.05)? Ideally I'd like to sort the genes by fold change alone, as I don't trust my p-values as much, so should I only include genes with high fold changes?

2) I am comparing multiple disease conditions with different treatments. Am I correct that GSEA only compares two conditions? If so, should I run GSEA for each control/treatment comparison? Would this be conventional?

Thanks for the help!

Tags: gsea, gseapreranked

Answer 1 · 2017-07-15

8.1 years ago

Samuel Brady ▴ 330

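Regarding question 1: You should include all genes in GSEAPreranked mode, not just the differentially expressed ones. GSEA asks whether the members of a gene set are concentrated at the top or bottom of the whole ranked list, so prefiltering to significant genes discards the rest of the ranking that each set's enrichment score is measured against.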
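A minimal Python sketch of preparing GSEAPreranked input, assuming you already have a log2 fold change for every tested gene in a dict (the gene names and values here are made up). A .rnk file is two tab-separated columns, gene identifier and ranking metric; the point of the sketch is that nothing is filtered out before ranking.

```python
from typing import Dict, List, Tuple


def rank_all_genes(log2_fold_changes: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort every gene by its ranking metric, highest first, with no significance filter."""
    return sorted(log2_fold_changes.items(), key=lambda kv: kv[1], reverse=True)


def to_rnk(ranked: List[Tuple[str, float]]) -> str:
    """Format ranked genes as 'gene<TAB>metric' lines, the layout of a .rnk file."""
    return "".join(f"{gene}\t{metric}\n" for gene, metric in ranked)
```

To write the file, something like `open("all_genes.rnk", "w").write(to_rnk(rank_all_genes(lfc)))` would do, where `lfc` is your (hypothetical) gene-to-fold-change dict. Note that ranking on fold change alone ignores how noisy each estimate is, so a low-count gene with a large but unreliable fold change can end up near the top of the list.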

Regarding question 2: Yes, GSEA and GSEAPreranked only compare two samples or conditions at a time. You have two options:

(1) Using GSEA, divide your samples into two groups and rank the genes by a metric such as a p-value or the average fold change between the two groups. I would not run GSEA for each treatment/control comparison; just run it once, using a ranking that incorporates all replicates in each group.

(2) Run ssGSEA within the GSVA package to get signature scores for all of your signatures of interest in each sample. You will then have a signature score × sample matrix instead of a gene × sample matrix, which is a beautiful thing. ssGSEA would be my preference. Example code to run it is here.

Thanks for the response, it's really informative and is just what I was looking for.

I just have one question I'd like clarified. When you say that you would not run GSEA for each treatment/control comparison, do you mean that you wouldn't run GSEA on individual replicates? If I were looking at different drugs, for instance the effect of drug X on a disease vs. that of drug Y, each compared with an untreated control, I feel I would need to do two comparisons: X vs. control and Y vs. control. Does this make sense?

Reply by mmccarthy781, 8.1 years ago

Yes, I am saying I would not run GSEA for each replicate pair. So if you have control1, control2, control3 (replicates) and treated1, treated2, treated3, I would not do control1 vs. treated1, control2 vs. treated2, etc. Rather, I would rank the genes using a metric that incorporates all replicates and run GSEA a single time.
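One way to get a single ranking that uses every replicate is a per-gene signal-to-noise score, (mean_treated - mean_control) / (sd_treated + sd_control), which has the same form as the Signal2Noise metric standard GSEA uses by default. A minimal Python sketch, assuming normalized (e.g. log-scale) values and at least two replicates per group; the small `min_sd` floor is an assumption here to avoid dividing by zero, not GSEA's exact small-variance adjustment. Gene names and values are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def signal_to_noise(treated: List[float], control: List[float], min_sd: float = 1e-6) -> float:
    """Difference of group means scaled by the summed sample standard deviations."""
    mean_t, mean_c = statistics.mean(treated), statistics.mean(control)
    # stdev() needs at least two replicates; the floor guards against zero variance.
    sd_t = max(statistics.stdev(treated), min_sd)
    sd_c = max(statistics.stdev(control), min_sd)
    return (mean_t - mean_c) / (sd_t + sd_c)


def rank_genes(
    treated: Dict[str, List[float]], control: Dict[str, List[float]]
) -> List[Tuple[str, float]]:
    """One ranked list for the whole comparison, pooling all replicates of each group."""
    scores = {gene: signal_to_noise(treated[gene], control[gene]) for gene in treated}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Unlike a plain fold change, this down-weights genes whose replicates disagree, which is the main reason to prefer a replicate-aware metric over pairing replicates one by one.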

Yes, you would do drug X vs. control, drug Y vs. control, etc. if you are trying to figure out what each drug does.
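For several drugs, that amounts to building one ranked list per drug against the same control group and running GSEAPreranked once per list. A minimal sketch, assuming log-scale expression so that a difference of group means approximates a log fold change; the group names (`control`, `drug_X`, `drug_Y`) and the mean-difference metric are illustrative choices, and a variance-aware metric could be swapped in.

```python
import statistics
from typing import Dict, List

Expr = Dict[str, List[float]]  # gene -> replicate values for one group


def mean_difference_ranking(treated: Expr, control: Expr) -> List[str]:
    """Genes ordered by mean(treated) - mean(control), pooling all replicates."""
    diffs = {g: statistics.mean(treated[g]) - statistics.mean(control[g]) for g in control}
    return [g for g, _ in sorted(diffs.items(), key=lambda kv: kv[1], reverse=True)]


def per_drug_rankings(groups: Dict[str, Expr], control_name: str = "control") -> Dict[str, List[str]]:
    """One ranked gene list per treatment, each compared with the shared control group."""
    control = groups[control_name]
    return {
        name: mean_difference_ranking(expr, control)
        for name, expr in groups.items()
        if name != control_name
    }
```

Each list in the returned dict would then be written out as its own .rnk file and analyzed separately, giving one GSEA result per drug.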

Reply by Samuel Brady, 8.1 years ago

Unfortunately, the example code link is no longer valid. Do you know where it has moved, or do you have a copy of your own?

Reply by Michi, 7.4 years ago

https://github.com/samuelwb/cancer-evolution/blob/master/RnaSeqSignaturesSsGsea/ssGSEA.r

Reply by 1769mkc, 7.1 years ago