Simplest way to perform GSEA from a weighted list of entrez gene ids
3
1
Entering edit mode
8.4 years ago

As part of a larger project I've implemented GSEA in Matlab. I want to test my code by comparing my output - p-values for KEGG pathways say, with another implementation. In my implementation the GSEA algorithm starts from a ranked list of gene ids with real valued weights. So I want to upload the same list to the tool used for comparison. However, as I look round tools for GSEA most of them seem to start at an earlier point, requiring the original gene data, which would mean that there would be scope for differences in preprocessing to affect the output, whereas I want a pure comparison of the GSEA part.

What would you say is the simplest way to perform GSEA (not another enrichment algorithm) on a weighted list of entrez-gene ids?

I'm fine if it involves some coding, but don't want to get bogged down editing large amounts of other people's code to perform what should be a straightforward test to check that my results are comparable with the expected results.

gsea enrichment transcriptomics • 6.3k views
ADD COMMENT
2
Entering edit mode
ADD COMMENT
0
Entering edit mode

Perfect, just what I needed.

ADD REPLY
2
Entering edit mode
8.4 years ago
alserg ▴ 980

If you want to compare you implementation of GSEA with others, I'd recommend as @vodka suggested to use the Broad's program: it's developed by the original authors and works pretty fast compared to other implentations. Just beware it results in zero p-value when there was no permutation with more extreme statistic, which is biased and is not a recommended method (http://www.statsci.org/smyth/pubs/permp.pdf).

However, I would recommend to check out an R-package https://github.com/ctlab/fgsea (disclaimer: I'm the author of this package) and not to implement the GSEA method yourself: likely it will be slow if you're testing many pathways. You'd probably be interested in looking at this package as it implements a special algorithm to simulthaneously build background distribution for all the gene set sizes and thus it works much faster (up to hundred times) than all the versions I'm aware of, but giving the same p-values.

ADD COMMENT
0
Entering edit mode

I have compared clusterProfiler with broad institute's GSEA-P, https://guangchuangyu.github.io/2015/11/comparison-of-clusterprofiler-and-gsea-p/.

clusterProfiler and GSEA-P output identical pvalues. I also test fgsea with clusterProfiler, and they also output identical pvalues.

fgsea is indeed very fast and will be the back engine of next release of clusterProfiler.

ADD REPLY
0
Entering edit mode

Thanks for the warnings. In this case the speed of the enrichment algorithm isn't the limiting factor, but I'll keep that in mind for future projects.

ADD REPLY
0
Entering edit mode

I am also trying fgsea & looks very easy and fast compare to GSEA, but how can I use my own geneset.gmt file and ranked file, I have tried with example data,

library(fgsea)
library(ggplot2)

data(examplePathways)
data(exampleRanks)
fgseaRes <- fgsea(pathways = examplePathways, stats = exampleRanks, minSize=15, maxSize=500, nperm=10000)
plotEnrichment(examplePathways[["5991130_Programmed_Cell_Death"]], exampleRanks) + labs(title="Programmed Cell Death")

so how to read my .gmt file and rank file in R, I tried following command but got errors:

gset <- GSA.read.gmt("genesets.gmt")
ranked <- read.table("rankedlist.txt",header=F, sep="\t")
fgseaRes <- fgsea(pathways = gset, stats = ranked, minSize=15,maxSize=500, nperm=10000)

Error in [.data.frame(x, order(x, na.last = na.last, decreasing = decreasing)) : undefined columns selected

ADD REPLY
0
Entering edit mode

Mike, there is an example for that in the vignette: http://bioconductor.org/packages/devel/bioc/vignettes/fgsea/inst/doc/fgsea-tutorial.html, see "Starting from files" section. Hope that helps.

ADD REPLY
0
Entering edit mode

Thanks, assaron,

I tried following command, but rnk.file & gmt.file are empty...

> rnk.file <- system.file("ranked.rnk")
> rnk.file
[1] ""
> gmt.file <- system.file("geneset.gmt")
> gmt.file
[1] ""
ADD REPLY
1
Entering edit mode

Assuming you have downloaded files https://raw.githubusercontent.com/ctlab/fgsea/master/inst/extdata/naive.vs.th1.rnk and https://raw.githubusercontent.com/ctlab/fgsea/master/inst/extdata/mouse.reactome.gmt into your working directory, this should work

rnk.file <- "naive.vs.th1.rnk"
gmt.file <- "mouse.reactome.gmt"
ranks <- read.table(rnk.file,
                    header=TRUE, colClasses = c("character", "numeric"))
ranks <- setNames(ranks$t, ranks$ID)    
pathways <- gmtPathways(gmt.file)    
fgseaRes <- fgsea(pathways, ranks, minSize=15, maxSize=500, nperm=1000)

The ranks variable should be just a vector of gene wieights with names being gene ids and pathways is a list of vectors with gene ids.

If it still doesn't work, I suggest switching to e-mail. Mine is alsergbox@gmail.com

ADD REPLY
0
Entering edit mode

yes it works now!!!

Thanks for your quick response.

ADD REPLY
0
Entering edit mode

When you say gene weights here, do you mean fold changes? P-values? Something else?

ADD REPLY
0
Entering edit mode
6.5 years ago
pshubhamoy ▴ 20

how do I subset a formal class gseaResults?

I have the following

Formal class 'gseaResult' [package "DOSE"] with 10 slots ..@ result :'data.frame': 20 obs. of 11 variables: .. ..$ ID : chr [1:20] "rno05200" "rno04010" "rno04151" "rno05166" ... .. ..$ Description : chr [1:20] "Pathways in cancer" "MAPK signaling pathway" "PI3K-Akt signaling pathway" "HTLV-I infection" ... .. ..$ setSize : int [1:20] 370 224 231 203 152 133 118 108 102 136 ... .. ..$ enrichmentScore: num [1:20] -0.384 -0.401 -0.388 -0.425 -0.432 ... .. ..$ NES : num [1:20] -1.54 -1.52 -1.48 -1.59 -1.56 ... .. ..$ pvalue : num [1:20] 0.00164 0.00167 0.00169 0.00173 0.00174 ... .. ..$ p.adjust : num [1:20] 0.0145 0.0145 0.0145 0.0145 0.0145 ... .. ..$ qvalues : num [1:20] 0.00836 0.00836 0.00836 0.00836 0.00836 ... .. ..$ rank : num [1:20] 2544 1610 1928 1804 2600 ... .. ..$ leading_edge : chr [1:20] "tags=34%, list=23%, signal=27%" "tags=23%, list=15%, signal=20%" "tags=29%, list=18%, signal=24%" "tags=23%, list=17%, signal=20%" ... .. ..$ core_enrichment: chr [1:20] "171140/294962/25729/314384/29560/367072/288264/399489/116502/24426/24654/170668/501110/81685/29492/287357/11448"| __truncated__ "25266/25054/360640/59323/79114/24446/25267/29496/24674/114495/117269/25597/78965/116683/24516/292763/24329/2571"| __truncated__ "65248/361696/292406/24514/89805/78975/25513/300253/29302/25155/310553/64033/25266/25054/59323/79114/25634/11666"| __truncated__ "308995/84420/303539/414788/313050/365527/114212/64033/25266/360640/84426/299611/25267/84027/24674/24516/25289/5"| __truncated__ ... ..@ organism : chr "rno" ..@ setType : chr "KEGG" ..@ geneSets :List of 324 .. ..$ rno00010: chr [1:72] "100145871" "100364027" "100364062" "100911515" ... .. ..$ rno00020: chr [1:33] "100125384" "103690168" "103693780" "114096" ... .. ..$ rno00030: chr [1:31] "100360180" "108348261" "114508" "24189" ... .. ..$ rno00040: chr [1:34] "113992" "116463" "154516" "171408" ... .. ..$ rno00051: chr [1:39] "100364027" "100911515" "100911725" "114508" ... .. ..$ rno00052: chr [1:32] "100364027" "103690059" "114860" "116463" ... .. ..$ rno00053: chr [1:27] "113992" "154516" "24861" "24862" ... .. ..$ rno00061: chr [1:14] "113976" "114024" "116719" "117243" ... .. ..$ rno00062: chr [1:31] "100911186" "102549542" "113965" "140547" ... .. ..$ rno00071: chr [1:47] "100145871" "100911186" "113965" "113976" ... .. ..$ rno00072: chr [1:11] "100361036" "117099" "24450" "25014" ... .. ..$ rno00100: chr [1:19] "114100" "114700" "117278" "140910" ... .. ..$ rno00120: chr [1:16] "170588" "192242" "246211" "25284" ... .. ..$ rno00130: chr [1:12] "103693015" "24314" "24813" "25249" ... .. ..$ rno00140: chr [1:84] "100362350" "100910877" "108348086" "108348266" ... .. ..$ rno00190: chr [1:143] "100188937" "100361126" "100361960" "100362331" ... .. ..$ rno00220: chr [1:20] "192268" "24398" "24399" "24401" ... .. ..$ rno00230: chr [1:182] "100360582" "100361574" "100362333" "100363253" ... .. ..$ rno00232: chr [1:6] "114768" "116631" "116632" "24297" ... .. ..$ rno00240: chr [1:104] "100360582" "100361574" "100362333" "100363253" ... .. ..$ rno00250: chr [1:35] "100360621" "117544" "192268" "24240" ... .. ..$ rno00260: chr [1:40] "103691744" "114027" "114123" "171133" ... .. ..$ rno00270: chr [1:47] "100360621" "100912604" "103691744" "171347" ... .. ..$ rno00280: chr [1:56] "100360621" "100361036" "100911186" "113965" ... .. ..$ rno00290: chr [1:4] "25044" "29592" "360816" "64203" .. ..$ rno00310: chr [1:61] "100169747" "100359816" "100361710" "100362634" ... .. ..$ rno00330: chr [1:52] "100912604" "108348083" "114027" "24264" ... .. ..$ rno00340: chr [1:24] "24443" "25375" "25750" "266603" ... .. ..$ rno00350: chr [1:40] "100145871" "100360621" "103694877" "171178" ... .. ..$ rno00360: chr [1:22] "100360621" "103694877" "171179" "24311" ... .. ..$ rno00380: chr [1:47] "100360621" "100911186" "103693780" "113965" ... .. .. [list output truncated] ..@ geneList : Named num [1:10844] 8.15 6.39 5.33 4.82 4.78 ... .. ..- attr(*, "names")= chr [1:10844] "116463" "313352" "500685" "246253" ... ..@ keytype : chr "UNKNOWN" ..@ permScores : num[0 , 0 ] ..@ params :List of 6 .. ..$ pvalueCutoff : num 0.05 .. ..$ nPerm : num 1000 .. ..$ pAdjustMethod: chr "BH" .. ..$ exponent : num 1 .. ..$ minGSSize : num 100 .. ..$ maxGSSize : num 500 ..@ gene2Symbol: chr(0) ..@ readable : logi FALSE

ADD COMMENT

Login before adding your answer.

Traffic: 1674 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6