Question

Gene Set Enrichment Analysis after DESeq2

12

7.5 years ago

Sreeraj Thamban ▴ 310

Hello Biostars, can anyone tell me how to prepare an input data set for GSEA after differential gene expression analysis with DESeq2? How should I rank the genes: by log2FC or by adjusted p-value? Is there any way to generate GSEA-ready data directly from DESeq2? I was using topGO for gene ontology enrichment analysis before and recently came across GSEA. Which one is better, GO enrichment analysis or GSEA? Even after going through the papers, I couldn't find a significant difference between the two.

Thank you

RNA-Seq DESeq2 geneontology GSEA • 31k views

ADD COMMENT • link updated 3 months ago by Ram 45k • written 7.5 years ago by Sreeraj Thamban ▴ 310

3

I like DESeq2. It would be great to have in the future something like ROAST/CAMERA/GSEA in DESeq2 too!

ADD REPLY • link 6.5 years ago by enxxx23 ▴ 300

0

Hi Sreeraj,

I don't know what your model organism is. For human, mouse, Drosophila and similar organisms, I guess it's easy because you can use databases available online and Ensembl annotations. I took an online course on RNA-seq data analysis with HUMAN data, so I can share what I learned if that's helpful. I haven't tried it on my own data yet, but here's what I know.

For GSEA, first install these packages in R:

install.packages("BiocManager")
BiocManager::install(version = "3.16")
BiocManager::install("DESeq2")
BiocManager::install("clusterProfiler")
BiocManager::install("org.Hs.eg.db")
## This is an organism-specific annotation package; this one is for human. You may be able to find others
## here: http://geneontology.org/ or you can build your own dataset if you work with a non-model organism.
## I'm not an expert and am still learning, but it's doable. Here is a similar question of mine that may help:
## https://www.biostars.org/p/9552469/#9552700


# Run DESeq() on your DESeqDataSet (dds); once you have the results, remove rows containing NA:
dds_results_filtered <- dds_results[complete.cases(dds_results), ]

## I think you should filter on adjusted p-values, because those represent the SIGNIFICANT differences.
# Then you can make a set of only the significantly upregulated genes like this:
upreg <- rownames(dds_results_filtered)[dds_results_filtered$padj < 0.05 & dds_results_filtered$log2FoldChange > 0]

#Then you load your libraries: 
library(clusterProfiler)
library(org.Hs.eg.db)

#Then you do GSEA: 
gsea <- enrichGO(upreg, OrgDb = org.Hs.eg.db, keyType = "ENSEMBL",
                 ont = "BP", universe = rownames(dds_results_filtered))

# Then you can make a simplified view (removes redundant GO terms)
gsea <- simplify(gsea)

# Extract the results from gsea as a table; the most significant terms are listed first
gsea_df <- as.data.frame(gsea)

# Additionally, you can export the table to open it in Excel
write.table(gsea_df, file = "gsea.tsv", sep = "\t")

# Finally, draw a dot plot of, for example, the top 13 categories:
dotplot(gsea, showCategory = 13)

Then you can repeat this for the downregulated genes.

Hope this helps.

Lada

ADD REPLY • link updated 3 months ago by Ram 45k • written 2.2 years ago by Lada ▴ 40

0

Just a comment: this is not really a gene set enrichment analysis, but rather an over-representation test.
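
To make the distinction concrete: an over-representation test asks whether a thresholded gene list overlaps a gene set more than expected by chance, which is a hypergeometric (one-sided Fisher) test. A minimal base-R sketch follows; all counts are hypothetical toy numbers, not taken from any dataset in this thread.

```r
# Toy numbers, all hypothetical: not taken from any real dataset.
universe_size <- 1000  # genes tested (the "universe")
set_size      <- 50    # genes annotated to one GO term
n_selected    <- 100   # genes passing the significance cutoff
overlap       <- 12    # selected genes that are also in the GO term

# Over-representation: P(overlap >= 12) under the hypergeometric distribution.
p_ora <- phyper(overlap - 1, set_size, universe_size - set_size,
                n_selected, lower.tail = FALSE)

# The same test written as a one-sided Fisher's exact test on the 2x2 table
# (rows: selected / not selected, columns: in the term / not in the term).
tab <- matrix(c(overlap, set_size - overlap,
                n_selected - overlap,
                universe_size - set_size - (n_selected - overlap)),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
```

GSEA, by contrast, takes a ranking of all genes and applies no significance cutoff, which is why it needs a ranked list as input.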

ADD REPLY • link 14 months ago by CTLong ▴ 140

0

2.3 years ago

Oliver ▴ 10

Another option for ranking genes is to use the "stat" column of the DESeq2 output, since it takes both the log fold change and its standard error into account.

Check this video to see how to directly use the DESeq2 output for GSEA: Video tutorial
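
The ranking by "stat" described above can be sketched in base R as follows. The data frame `res_df` and its values are made up and only stand in for `as.data.frame(results(dds))`. The sketch assumes the default Wald test, where the statistic is the log2 fold change divided by its standard error.

```r
# Toy stand-in for as.data.frame(results(dds)); the values are made up.
res_df <- data.frame(
  log2FoldChange = c(2.1, -1.5, 0.3, NA, -0.2),
  lfcSE          = c(0.5, 0.3, 0.4, NA, 0.5),
  row.names      = c("geneA", "geneB", "geneC", "geneD", "geneE")
)
# Assumption: default Wald test, so stat = log2FoldChange / lfcSE.
res_df$stat <- res_df$log2FoldChange / res_df$lfcSE

# Drop genes without a statistic, then build a named vector sorted from
# most upregulated to most downregulated.
res_df <- res_df[!is.na(res_df$stat), ]
ranks <- setNames(res_df$stat, rownames(res_df))
ranks <- sort(ranks, decreasing = TRUE)
```

A named, sorted numeric vector like `ranks` is the usual shape of input for ranked-list tools; it can also be written out as a two-column file.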

ADD COMMENT • link 2.3 years ago by Oliver ▴ 10

score 14 · Accepted Answer · 2017-10-21

14

7.5 years ago

Prakash ★ 2.2k

Hi Sreeraj

Genes can be ranked by a metric that combines the sign of the fold change with the p-value, and that ranking can be used as input to GSEA.

You can use this R code for that purpose.

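# Read the differential expression table (tab-separated, with a header row)
x <- read.table("DE_genes.txt", sep = "\t", header = TRUE)
head(x)
# Sign of the fold change: +1 for up, -1 for down
x$fcsign <- sign(x$log2.fold_change.)
x$logP <- -log10(x$p_value)
# Signed significance; multiplying (rather than dividing) by the sign avoids Inf when the fold change is exactly 0
x$metric <- x$logP * x$fcsign
y <- x[, c("Gene", "metric")]
head(y)
write.table(y, file = "DE_genes.rnk", quote = FALSE, sep = "\t", row.names = FALSE)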
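
As a small worked example of what this metric does, here is a sketch on three made-up genes (the names and values are hypothetical). Strongly significant upregulated genes end up at the top of the ranking, strongly significant downregulated genes at the bottom, and weak changes in the middle.

```r
# Made-up values for illustration only.
toy <- data.frame(
  Gene    = c("up_strong", "down_strong", "up_weak"),
  log2fc  = c(3, -2, 0.5),
  p_value = c(1e-6, 1e-4, 0.1)
)
# Signed -log10(p): the sign comes from the direction of the fold change.
toy$metric <- sign(toy$log2fc) * -log10(toy$p_value)
# Sort from most upregulated to most downregulated.
toy <- toy[order(toy$metric, decreasing = TRUE), ]
```

Note that a p-value of exactly 0 would give an infinite metric, so such rows may need special handling before the file is written.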

ADD COMMENT • link 7.5 years ago by Prakash ★ 2.2k

3

In this case, what parameters should we input into GSEA?

ADD REPLY • link 7.3 years ago by langya ▴ 130

0

How would you handle NA values?

ADD REPLY • link 6.4 years ago by t-jim ▴ 30

1

.....

filtered <- na.omit(y)

write.table(filtered ,file="DE_genes.rnk",quote=F,sep="\t",row.names=F)

ADD REPLY • link 5.8 years ago by Barry Digby ★ 1.3k

0

I have used this code but am struggling to obtain a table in which the genes appear as names rather than numbers. For some reason it keeps saving a table with the ranks but no gene names.

ADD REPLY • link 5.8 years ago by g.birch15 • 0

1

Try row.names = TRUE.

Also, you want to use col.names = FALSE, as GSEA complains when 'Gene' and 'metric' are in the .rnk file.
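
Putting the NA filtering and the header advice together, a minimal sketch might look like this. The data frame and its values are hypothetical and stand in for `y` from the answer above. It assumes the gene names sit in a regular column, in which case row.names can stay FALSE; if they were stored as row names instead, row.names = TRUE would be needed.

```r
# Hypothetical ranking table; in practice this would be the `y` data frame.
y <- data.frame(
  Gene   = c("geneA", "geneB", "geneC"),
  metric = c(4.2, NA, -1.3)
)
filtered <- na.omit(y)  # drop genes whose metric is NA

rnk_file <- tempfile(fileext = ".rnk")
# Genes are a regular column here, so row.names stays FALSE;
# col.names = FALSE leaves out the "Gene"/"metric" header line.
write.table(filtered, file = rnk_file, quote = FALSE, sep = "\t",
            row.names = FALSE, col.names = FALSE)

# Inspect the file: one "gene<TAB>metric" line per gene, no header.
rnk_lines <- readLines(rnk_file)
```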

ADD REPLY • link 5.8 years ago by Barry Digby ★ 1.3k

score 7 · Accepted Answer · 2017-10-21

7

7.5 years ago

Michael Love ★ 2.6k

Here's a link to an answer I wrote a few years ago for using the gene set testing package goseq following DESeq2:

https://support.bioconductor.org/p/64811/#64815

I'm not sure what kind of input GSEA takes. I also like the methods behind ROAST and CAMERA from the limma package, but I haven't yet worked on integrating with those methods. For those two, you would need to run a limma analysis upstream.

ADD COMMENT • link 7.5 years ago by Michael Love ★ 2.6k

1

Hey, I noticed that in the newest DESeq2 version the default fold change is no longer the shrunken fold change. May I ask why? I thought the shrunken fold change gives you higher confidence.

ADD REPLY • link 7.1 years ago by langya ▴ 130

0

You can generate the shrunken fold changes via lfcShrink().

ADD REPLY • link 4.6 years ago by Kevin Blighe 89k