You do DESeq on your Dseq Data Set (dds) and once you get the results you can do this to remove NA.

Question

Gene Set Enrichment Analysis after DESeq2

12

Entering edit mode

7.1 years ago

Sreeraj Thamban ▴ 300

Hello Biostars, Can anyone tell me how to prepare input data set for GSEA after Differential Gene Expression Analysis by DESeq2? How will I rank the genes? Should I rank based on log2FC or Adjusted P value? Is there any way to generate a GSEA ready data directly from DESeq2?. I was using topGo for gene ontology enrichment analysis before and recently came across GSEA. Which one is better GO enrichment analysis or GSEA? Even after going through the papers I couldn't find a significant difference between above two.

Thank you

RNA-Seq DESeq2 geneontology GSEA • 29k views

ADD COMMENT • link updated 8 months ago by CTLong ▴ 120 • written 7.1 years ago by Sreeraj Thamban ▴ 300

3

Entering edit mode

I like DESeq2. It would be great to have in the future something like ROAST/CAMERA/GSEA in DESeq2 too!

ADD REPLY • link 6.1 years ago by enxxx23 ▴ 280

0

Entering edit mode

HI Sreeraj,

I don't know what is your model organism. For humans, mouse, drosophila and similar stuff, I guess it's easy because you can use online available databases and ensemble annotations. I participated in one online course about RNAseq data analysis on HUMAN data so I can share what I learned if it's helpful for you. It's just that I still didn't try that on my own data but here's what I know.

For GSEA - Initially you install these stuff in R:

install.packages("BiocManager") BiocManager::install(version = "3.16") BiocManager::install("DESeq2") BiocManager::install("clusterProfiler") BiocManager::install("org.Hs.eg.db") --> this is an organism-specific annotation package, this one is for humans but for instance, you can maybe find some others here: http://geneontology.org/ OR you can make your own dataset if you are working with nonmodel. I'm not an expert and I am still learning but its DOABLE so here you can see a similar question from my side, maybe it will help you: GSEA on nonmodel organisms

You do DESeq on your Dseq Data Set (dds) and once you get the results you can do this to remove NA.

dds_results_filtered <-dds_results[complete.cases(dds_results),]

I think you should use p-adjusted values in your filtering because that is representing SIGNIFICANT differences.

Then you can make a data set just for significantly upregulated genes like this:

upreg <- rownames(dds_results_filtered)[dds_results_filtered$pvalue < 0.05 & dds_results_filtered$log2FoldChange > 0]

Then you load your libraries:

library(clusterProfiler) library(org.Hs.eg.db)

Then you do GSEA:

gsea <- enrichGO(upreg, OrgDb = org.Hs.eg.db, keyType = "ENSEMBL", ont = "BP", universe = rownames(dds_results_filtered))

than you can make a simplified view

gsea <- simplify(gsea)

extract the data from gsea in nice table, first terms listed are the most significant

gsea_df <- as.data.frame(gsea)

additionally for excel you can try this

write.table(gsea_df, file = "gsea.tsv", sep = "\t")

and finally to see a nice dot plot for example for the top 13 categories:

dotplot(gsea, showCategory =13)

And then you can repeat for downregulated.

Hope this helps.

Lada

ADD REPLY • link 22 months ago by Lada ▴ 30

0

Entering edit mode

Just a comment, this is not really a gene set enrichment analysis. Rather an over-representation test.

ADD REPLY • link 8 months ago by CTLong ▴ 120

0

Entering edit mode

22 months ago

Oliver ▴ 10

Another option to gene ranking is to use the "stat"-output, that is generated by DESeq2, since that takes the logFold-change, as well as the standard error into account.

Check this video to see how to directly use the DESeq2 output for GSEA: Video tutorial

ADD COMMENT • link 22 months ago by Oliver ▴ 10

score 14 · Accepted Answer · 2017-10-21

14

Entering edit mode

7.1 years ago

Prakash ★ 2.2k

Hi Sreeraj

Genes can be ranked based on fold change and P value and that can be used in GSEA package.

you can use this R code for this purpose.

x <- read.table("DE_genes.txt",sep = "\t",header = T)
head(x)
x$fcsign <- sign(x$log2.fold_change.)
x$logP=-log10(x$p_value)
x$metric= x$logP/x$fcsign
y<-x[,c("Gene", "metric")]
head(y)
write.table(y,file="DE_genes.rnk",quote=F,sep="\t",row.names=F)

ADD COMMENT • link 7.1 years ago by Prakash ★ 2.2k

3

Entering edit mode

in this case, what parameter should we input into GSEA?

ADD REPLY • link 6.9 years ago by langya ▴ 120

0

Entering edit mode

How would you handle NA values?

ADD REPLY • link 6.0 years ago by t-jim ▴ 30

1

Entering edit mode

.....

filtered <- na.omit(y)

write.table(filtered ,file="DE_genes.rnk",quote=F,sep="\t",row.names=F)

ADD REPLY • link 5.4 years ago by Barry Digby ★ 1.3k

0

Entering edit mode

I have used this code but am struggling to obtain a table where the gene names are appearing as names and not numbers, for some reason it keeps saving a table with the ranks but no gene names.

ADD REPLY • link 5.4 years ago by g.birch15 • 0

1

Entering edit mode

Try row.names = TRUE.

Also, you want to use col.names = FALSE, as GSEA complains when 'Gene' and 'metric' are in the rnk. file.

ADD REPLY • link 5.4 years ago by Barry Digby ★ 1.3k

score 7 · Accepted Answer · 2017-10-21

7

Entering edit mode

7.1 years ago

Michael Love ★ 2.6k

Here's a link to an answer I wrote a few years ago for using the gene set testing package goseq following DESeq2:

https://support.bioconductor.org/p/64811/#64815

I'm not sure what kind of input GSEA takes. I also like the methods behind ROAST and CAMERA from the limma package, but I haven't yet worked on integrating with those methods. For those two, you would need to run a limma analysis upstream.

ADD COMMENT • link 7.1 years ago by Michael Love ★ 2.6k

1

Entering edit mode

Hey I noticed in the newest DESeq2 version, the default setting of fold change is not shrunken fold change, may I ask why? i thought the shrunken fold change gives you higher confidence.

ADD REPLY • link 6.7 years ago by langya ▴ 120

0

Entering edit mode

You can generate these via lfcShrink()

ADD REPLY • link 4.2 years ago by Kevin Blighe 88k