Gene set enrichment analysis with edgeR output
2
2
Entering edit mode
6.2 years ago
Biologist ▴ 290

Hi,

I have a Differential expressed genes with edgeR in dataframe "er"

X   logFC   logCPM  F   PValue  FDR
LYVE1   -2.286965927    3.622301526 15.20371163 9.69E-05    0.013364458
LANCL2  -2.211422717    5.037579176 41.62021204 1.17E-10    2.65E-07
MRPL37  -2.100712376    7.077497716 33.64633822 6.82E-08    6.53E-05
ARHGDIB -2.075307337    2.263760868 12.39278702 0.000432777 0.034317014
SNORA57 -1.976169985    2.843788903 14.92295881 0.000112552 0.014418635
C14orf105   -1.859401694    3.004499365 13.02450243 0.000308388 0.028440396
ATP2B2  1.922575077 3.836340714 21.03069806 4.55E-06    0.001679724
ACSM3   1.915862102 3.268232386 14.04761306 0.000178872 0.019977015
UNC93A  1.914557961 2.779896967 12.53269014 0.000401039 0.033134889
TFR2    1.897779698 2.734046964 11.91293895 0.000558906 0.039911789
ABCA2   1.875311985 4.024932749 22.0241006  2.71E-06    0.0011637
NTS 1.859274971 6.165644757 17.75363345 2.53E-05    0.005418355
DLK1    1.849908433 6.965559508 18.95639802 1.35E-05    0.003334802
FAM171A2    1.845821788 2.878346297 14.4304531  0.000145974 0.017022099
SLC26A1 1.817559593 2.926690164 13.39813378 0.000253075 0.024793914

Using the above table based on logFC and PValue I ranked the genes.

er$fcsign <- sign(er$logFC)
er$logP=-log10(er$PValue)
er$metric= er$logP/er$fcsign
final<-er[,c("X", "metric")]
head(final)

X   metric
LYVE1   -4.0136762
LANCL2  -9.9318141
MRPL37  -7.1662156
ARHGDIB -3.3637358
SNORA57 -3.9486468
C14orf105   -3.5109025
ATP2B2  5.341988603
ACSM3   3.747457637
UNC93A  3.396813391
TFR2    3.252661228
ABCA2   5.567030709
NTS 4.596879479
DLK1    4.869666232
FAM171A2    3.835724491
SLC26A1 3.596750754

Now with the available hallmark pathways file h.all.v6.2.symbols.gmt I would like to run GSEA on the list and want the whole pathways output. Not only significant pathways.

can anyone tell how to do that.

RNA-Seq gsea edger genesetenrichment R • 8.1k views
ADD COMMENT
2
Entering edit mode
6.2 years ago
thomaskuilman ▴ 850

Although I prefer a 100% R way of doing this, I am still export using the GSEA Java Desktop Application. You could export your data as follows:

## Remove entries without gene symbol and with NAs
er <- er[er$X != "", ]
er <- er[complete.cases(er), ]

## Write to a .rnk file
write.table("#Differential expression", file = file.path(DIRPREFIX, "GSEA.rnk"),
            sep = "\t", row.names = FALSE, col.names = FALSE, quote = FALSE)
write.table(er, file = file.path(DIRPREFIX, "GSEA.rnk"), sep = "\t",
            row.names = FALSE, col.names = FALSE, quote = FALSE, append = TRUE)

By the way, plotting in R can be done using a package I wrote; see the following post.

ADD COMMENT
0
Entering edit mode

Thank very much for the answer. But I have an Error.

As you said I removed NAs and saved the file as .rnk

er <- er[er$X != "", ]
er <- er[complete.cases(er), ]
write.table(er,file="GSEA.rnk",quote=F,sep="\t",row.names=F)

I downloaded gsea-3.0.jar

On Command line I gave java -jar gsea-3.0.jar

I see the GSEA software and browsed the GSEA.rnk file and selected the "GSEA Preranked" from the "Tools" pull-down menu and gave RUN.

And I see the GSEA reports status as - Error!

Here is the ERROR I saw on command line

MacBook-Pro:home ras$ java -jar gsea-3.0.jar 
3129 [INFO ] Begun importing: RankedList from: /home/GSEA.rnk
3164 [INFO ] Loaded file: /home/GSEA.rnk
3166 [INFO ] Loading ... 1 files

GSEA.rnk

Files loaded successfully: 1 / 1
There were NO errors
7655 [INFO ] File download started.  Retrieving h.all.v6.2.symbols.gmt from remote server...
8129 [INFO ] Download complete
8326 [INFO ] Begun importing: GeneSetMatrix from: h.all.v6.2.symbols.gmt
8353 [INFO ] Got gsets: 50 now preprocessing them ... min: 15 max: 500
8375 [INFO ] Done preproc for smaller than: 15
35248 [ERROR] Tool exec error   at edu.mit.broad.xbench.tui.TaskManager.run(TaskManager.java:447)
xtools.api.param.BadParamException: After pruning, none of the gene sets passed size thresholds.
    at xtools.api.param.ParamFactory.checkAndBarfIfZeroSets(ParamFactory.java:88)
    at xtools.gsea.GseaPreranked.execute(GseaPreranked.java:95)
    at edu.mit.broad.xbench.tui.TaskManager$ToolRunnable.run(TaskManager.java:436)
    at java.lang.Thread.run(Thread.java:745)
8420 [INFO ] Renaming rpt dir on error to: /Users/ras/gsea_home/output/sep26/error_my_analysis.GseaPreranked.1537953606909

It says After pruning, none of the gene sets passed size thresholds

What could be the reason?

It again works with Gene Ontology genesets. It gave status Success

ADD REPLY
0
Entering edit mode

The tool applies filtering to the gene sets to get gene set with sizes within a certain range (you can set these using the 'Max size: exclude larger sets' and 'Min size: exclude smaller sets' in the 'basic fields' settings). I presume none of your gene sets made it through those filters. You may want to change the settings as per the above, or using different gene sets. This link explains things more extensively.

ADD REPLY
0
Entering edit mode

Thank you. Could you please tell me where I can find this 'basic fields' in the GSEA software? I don't see any settings option here.

ADD REPLY
0
Entering edit mode

You can find it in the 'Run Gsea' tab after pressing the 'Run Gsea' icon in the sidebar.

ADD REPLY
0
Entering edit mode

Oh ya. I see the show button. thank you !!

ADD REPLY
2
Entering edit mode
6.2 years ago
h.mon 35k

As you are using edgeR, you might want to consider CAMERA for gene-set enrichment testing, as it is part of edgeR. Check edgeR Users Guide, or the tutorials RNA-seq analysis is easy as 1-2-3 with limma, Glimma and edgeR and / or RNA-seq analysis in R.

ADD COMMENT
0
Entering edit mode

Thank you for the answer. But I followed this post now [https://stephenturner.github.io/deseq-to-fgsea/]

library(tidyverse)
library(fgsea)
ranks <- deframe(y)
head(ranks, 20)

pathways.hallmark <- gmtPathways("h.all.v6.2.symbols.gmt")
pathways.hallmark %>% 
  head() %>% 
  lapply(head)

fgseaRes <- fgsea(pathways=pathways.hallmark, stats=ranks, nperm=1000)

fgseaResTidy <- fgseaRes %>%
  as_tibble() %>%
  arrange(desc(NES))

# Show in a nice table:
fgseaResTidy %>% 
  dplyr::select(-leadingEdge, -ES, -nMoreExtreme) %>% 
  arrange(padj) %>% 
  DT::datatable()
ADD REPLY

Login before adding your answer.

Traffic: 2661 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6