Hey all,
I'm currently learning about GSEA in the hopes of using it in my analysis of differentially expressed genes, and I have a few questions about the program, specifically about GSEAPreranked, that I'd like cleared up.
1) For the ranked list needed as GSEA input, should the list include all genes, or only those that pass a certain significance threshold (e.g. fold change higher than 2, p-value less than 0.05)? Ideally I'd like to sort the genes by fold change alone, as I don't trust my p-values as much, so should I only include genes with high fold changes?
2) I am comparing multiple disease conditions with different treatments. Am I correct that GSEA only compares two conditions at a time? If so, should I run GSEA for each control/treatment comparison? Would this be conventional?
Thanks for the help!
Thanks for the response, it's really informative and is just what I was looking for.
I just have one question about it that I would like clarified. When you say that you would not run GSEA for each treatment/control comparison, do you mean that you wouldn't run GSEA for individual replicates? I feel that if I were looking at different drugs, for instance the effect of drug X on a disease vs. drug Y, each compared to an untreated control, I would need to do two comparisons: X vs. the control and Y vs. the control. Does this make sense?
Yes, I am saying I would not run GSEA for each replicate pair. So if you have control1, control2, control3 (replicates), and treated1, treated2, treated3, I would not do control1 vs. treated1, control2 vs. treated2, etc. Rather I would rank the genes in a way that all replicates are incorporated into my gene ranks and run GSEA a single time.
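To make that concrete, here is a minimal sketch in Python of one way to fold all replicates into a single rank per gene. It assumes you rank by log2 fold change of the replicate means (since you said you prefer fold change), and the gene names, expression values, the `rank_genes` helper, and the pseudocount of 1 are all made up for illustration. Other ranking metrics are equally valid; this is not the only way to do it.

```python
import math
from statistics import mean

def rank_genes(control, treated, pseudocount=1.0):
    """Rank genes by log2(mean treated / mean control) across all replicates.

    control, treated: dict mapping gene name -> list of replicate values.
    The pseudocount (an assumption here) avoids division by zero.
    """
    scores = {}
    for gene in control:
        c = mean(control[gene]) + pseudocount
        t = mean(treated[gene]) + pseudocount
        scores[gene] = math.log2(t / c)
    # Most up-regulated first, most down-regulated last.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical data: three control and three treated replicates per gene.
control = {
    "GENE_A": [10.0, 12.0, 11.0],
    "GENE_B": [50.0, 55.0, 45.0],
    "GENE_C": [20.0, 20.0, 20.0],
}
treated = {
    "GENE_A": [40.0, 44.0, 48.0],
    "GENE_B": [10.0, 12.0, 14.0],
    "GENE_C": [20.0, 21.0, 19.0],
}

ranked = rank_genes(control, treated)
for gene, score in ranked:
    print(f"{gene}\t{score:.3f}")
```

The point is that each gene gets exactly one score built from all six samples, so there is one ranked list and one GSEA run for this comparison.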
Yes, you would do drug X vs. control, drug Y vs. control, etc. if you are trying to figure out what each drug does.
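As a rough sketch of what that looks like in practice, the Python below builds one ranked list per drug against the same shared control, formatted as tab-separated gene/score lines like the two-column .rnk input that GSEAPreranked takes. The drug names, genes, values, and the `log2_fc` and `build_rnk` helpers are hypothetical, and the log2 fold change with a pseudocount of 1 is just one possible ranking metric.

```python
import math
from statistics import mean

def log2_fc(treated_reps, control_reps, pseudocount=1.0):
    """log2 ratio of replicate means; the pseudocount is an assumption."""
    return math.log2((mean(treated_reps) + pseudocount) /
                     (mean(control_reps) + pseudocount))

def build_rnk(treated, control):
    """Return tab-separated 'gene<TAB>score' lines, highest score first."""
    rows = [(gene, log2_fc(treated[gene], control[gene])) for gene in control]
    rows.sort(key=lambda row: row[1], reverse=True)
    return "\n".join(f"{gene}\t{score:.4f}" for gene, score in rows) + "\n"

# Hypothetical data: one shared control, two drugs, three replicates each.
control = {"GENE_A": [10.0, 10.0, 10.0], "GENE_B": [30.0, 30.0, 30.0]}
treatments = {
    "drugX": {"GENE_A": [21.0, 21.0, 21.0], "GENE_B": [30.0, 30.0, 30.0]},
    "drugY": {"GENE_A": [10.0, 10.0, 10.0], "GENE_B": [6.0, 7.0, 8.0]},
}

# One ranked list per drug-vs-control contrast, i.e. one GSEA run per drug.
rnk_by_drug = {name: build_rnk(expr, control)
               for name, expr in treatments.items()}

for name, text in rnk_by_drug.items():
    print(f"# {name} vs control")
    print(text, end="")
```

Each string could then be saved to its own file (for example `drugX_vs_control.rnk`, a made-up name) and run through GSEAPreranked separately.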
Unfortunately, the example code link is no longer valid. Do you know where it has moved, or do you have a copy of your own?
https://github.com/samuelwb/cancer-evolution/blob/master/RnaSeqSignaturesSsGsea/ssGSEA.r