Question

DESeq2 followed by GSEA

6.9 years ago

langya ▴ 120

I recently analyzed my RNA-Seq data with a STAR, HTSeq-count and DESeq2 pipeline and want to run the results through GSEA to look for associations with certain pathways. For the input, should I pre-rank the genes by log2FC, or by -log10(p-value) * sign(FC)? If I use the latter, how should I choose the GSEA parameters? And in either ranking, should I use 0.01 as a padj (FDR) cutoff to filter out the other genes?

RNA-Seq gene R GSEA next-gen

updated 6.9 years ago by Ram ▴ 190 • written 6.9 years ago by langya ▴ 120

score 1 · Answer 1 · 2017-12-30

6.9 years ago

Ram ▴ 190

I think there are two common ways to generate the GSEA preranked list: (1) rank by log2FC, or (2) rank by p-value, signed by the direction of the fold change.

You might take a look at this previous thread:

Gene Set Enrichment Analysis after DESeq2
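
A minimal sketch of the two ranking options, written in Python for illustration (the thread itself works in R). It assumes the DESeq2 results have already been exported as (gene, log2FoldChange, pvalue) rows; the gene names and numbers are made up, and None stands in for an NA. GSEAPreranked reads a tab-separated .rnk file of gene and score, and it is generally run on the full ranked list rather than on a padj-filtered subset.

```python
import math
import sys

# Made-up rows standing in for an exported DESeq2 results table:
# (gene, log2FoldChange, pvalue); None stands in for NA.
results = [
    ("GeneA", 2.5, 1e-8),
    ("GeneB", -1.2, 3e-4),
    ("GeneC", 0.3, 0.40),
    ("GeneD", -3.1, 1e-12),
    ("GeneE", 1.0, None),  # e.g. a count outlier whose p-value was set to NA
]


def rank_by_log2fc(rows):
    """Option 1: use log2FoldChange directly as the rank metric."""
    return {gene: lfc for gene, lfc, _ in rows if lfc is not None}


def rank_by_signed_p(rows):
    """Option 2: -log10(p-value) carrying the sign of log2FoldChange."""
    scores = {}
    for gene, lfc, p in rows:
        if lfc is None or p is None:
            continue  # an NA cannot be ranked, so the gene is dropped
        # p-values can underflow to 0; floor them so log10 stays finite
        p = max(p, sys.float_info.min)
        # copysign treats a log2FC of exactly 0 as positive
        scores[gene] = -math.log10(p) * math.copysign(1.0, lfc)
    return scores


def to_rnk(scores):
    """Format scores as .rnk lines (gene<TAB>score), highest score first."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return "\n".join(f"{gene}\t{score:.4f}" for gene, score in ordered)


print(to_rnk(rank_by_log2fc(results)))
print()
print(to_rnk(rank_by_signed_p(results)))
```

With option 2, genes that are both significant and up-regulated sit at the top and significant down-regulated genes at the bottom; option 1 (with unshrunken fold changes) ignores how noisy each fold change is, so a gene with few reads can rank high.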

Thanks! But I didn't understand the padj part. Shouldn't I use a padj cutoff to filter out some genes?

6.9 years ago by langya ▴ 120
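
DESeq2's padj column is, by default, the Benjamini-Hochberg adjusted p-value (pAdjustMethod = "BH" in results()). The sketch below reimplements BH on made-up p-values, with None for NA, to show what a strict cutoff such as 0.01 does: it keeps only the few strongest genes. That is useful for calling DE genes, but it discards most of the ranking that GSEA would otherwise use.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, mirroring R's p.adjust(p, "BH").

    None entries (NA) are passed through as None and do not count towards m.
    """
    idx = [i for i, p in enumerate(pvalues) if p is not None]
    m = len(idx)
    order = sorted(idx, key=lambda i: pvalues[i])  # ascending p-values
    adjusted = list(pvalues)  # NA positions stay None
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up raw p-values for nine genes; the last one is NA.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, None]
padj = bh_adjust(pvals)

for cutoff in (0.01, 0.1):
    kept = sum(1 for a in padj if a is not None and a < cutoff)
    print(f"padj < {cutoff}: {kept} of {len(pvals)} genes kept")
```

Here a 0.01 cutoff keeps 1 of the 9 genes and 0.1 keeps 7, while a preranked GSEA input would keep all 8 genes that have a p-value.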

Below is the relevant description from the DESeq2 documentation (the help page for results()) of how p-values and adjusted p-values are reported:

 By default, independent filtering is performed to select a set of
 genes for multiple test correction which will optimize the number
 of adjusted p-values less than a given critical value ‘alpha’ (by
 default 0.1). The adjusted p-values for the genes which do not
 pass the filter threshold are set to ‘NA’. By default, the mean of
 normalized counts is used to perform this filtering, though other
 statistics can be provided. Several arguments from the
 ‘filtered_p’ function of genefilter are provided here to control
 or turn off the independent filtering behavior.

 By default, ‘results’ assigns a p-value of ‘NA’ to genes
 containing count outliers, as identified using Cook's distance.
 See the ‘cooksCutoff’ argument for control of this behavior.
 Cook's distances for each sample are accessible as a matrix
 "cooks" stored in the ‘assays()’ list. This measure is useful for
 identifying rows where the observed counts might not fit to a
 Negative Binomial distribution.

6.9 years ago by Ram ▴ 190
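
The independent filtering quoted above can be sketched in a simplified form: for each candidate quantile of baseMean, drop the genes below it, BH-adjust the rest, count padj < alpha, and keep the threshold with the most rejections. Genes that are filtered out get an NA padj, which is why padj can be NA even when pvalue is not. This is an illustration with made-up numbers, not DESeq2's code: DESeq2 (through genefilter's filtered_p) scans a finer grid of quantiles and roughly smooths the rejection curve before choosing, so its threshold can differ.

```python
def bh_adjust(p):
    """Benjamini-Hochberg adjustment (no NA handling needed here)."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    out = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p[i] * m / rank)
        out[i] = running_min
    return out


def lower_quantile(values, q):
    """Crude quantile: the sorted element at position floor(q * (n - 1))."""
    s = sorted(values)
    return s[int(q * (len(s) - 1))]


def independent_filter(base_means, pvalues, alpha=0.1,
                       thetas=(0.0, 0.2, 0.4, 0.6, 0.8)):
    """Simplified independent filtering on baseMean (most rejections wins)."""
    best = None
    for theta in thetas:
        cutoff = lower_quantile(base_means, theta)
        keep = [i for i, bm in enumerate(base_means)
                if bm >= cutoff and pvalues[i] is not None]
        padj = bh_adjust([pvalues[i] for i in keep])
        n_rej = sum(1 for a in padj if a < alpha)
        if best is None or n_rej > best[0]:
            best = (n_rej, theta, cutoff, keep, padj)
    n_rej, theta, cutoff, keep, padj = best
    result = [None] * len(pvalues)  # filtered-out genes get NA, as in DESeq2
    for i, a in zip(keep, padj):
        result[i] = a
    return theta, cutoff, result


# Made-up data: four low-count genes with large p-values, six well-expressed genes.
base_means = [1, 2, 3, 4, 500, 600, 700, 800, 900, 1000]
pvalues = [0.9, 0.8, 0.7, 0.6, 0.005, 0.03, 0.04, 0.05, 0.06, 0.5]

theta, cutoff, padj = independent_filter(base_means, pvalues)
print(theta, cutoff, sum(1 for a in padj if a is not None and a < 0.1))
```

Keeping all ten genes gives a single padj < 0.1, whereas dropping the three lowest-count genes gives five, so the sketch picks the 0.4 quantile (baseMean 4). The baseMean cutoff is chosen from the data rather than fixed in advance.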

If I use baseMean to filter out genes, do you know what cutoff I should use? Also, my baseMean values across the two biological replicates are very high, but the log2FoldChange and p-values are not. So should I use baseMean for the GSEA analysis? Thanks!

6.9 years ago by langya ▴ 120
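
For the baseMean question, it may help to see what that column is. In DESeq2, baseMean is the average of the size-factor-normalized counts over all samples, pooled across both conditions, so it measures how highly a gene is expressed rather than whether it changes. The sketch below uses made-up counts for three genes in four samples (two per condition, with the second sample of each condition sequenced about twice as deep) and reproduces median-of-ratios size factors and baseMean in plain Python; it is an illustration, not DESeq2's code.

```python
import math
from statistics import median

# Made-up raw counts: rows are genes, columns are samples
# (ctrl_1, ctrl_2, treat_1, treat_2); the *_2 samples are ~2x deeper.
counts = [
    [1000, 2000, 1000, 2000],  # highly expressed, no change between conditions
    [50, 100, 500, 1000],      # ~10-fold up in treatment
    [10, 20, 10, 20],          # lowly expressed, no change
]


def size_factors(counts):
    """Median-of-ratios size factors, in the style of DESeq2's default."""
    n = len(counts[0])
    # Log geometric mean per gene; genes with any zero count are skipped.
    ref = [(row, sum(math.log(c) for c in row) / n)
           for row in counts if all(c > 0 for c in row)]
    # Each sample's factor: median ratio of its counts to the gene-wise reference.
    return [math.exp(median([math.log(row[j]) - log_gm for row, log_gm in ref]))
            for j in range(n)]


def base_means(counts, factors):
    """baseMean: mean of normalized counts (count / size factor) over all samples."""
    return [sum(c / s for c, s in zip(row, factors)) / len(row) for row in counts]


factors = size_factors(counts)
for row, bm in zip(counts, base_means(counts, factors)):
    print(row, round(bm, 1))
```

The unchanged first gene has by far the largest baseMean, while the gene that actually changes sits below it, which is why a high baseMean is not, by itself, a rank metric for GSEA.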