GSVA can show this. It will look at your differential expression results and then infer from this whether certain pathways / processes are statistically significantly up- or down-regulated in your samples. Getting GSVA working can be cumbersome, though. The vignette helped me as a starting point: GSVA: The Gene Set Variation Analysis package
for microarray and RNA-seq data
With GSVA, the general process is:
- Enrich gene list against a GSVA dataset to obtain matrix of samples
versus enrichment terms
- Perform limma on enrichment matrix, comparing your samples'
conditions of interest
- Plot statistically significant terms from enrichment matrix in
heatmap
Here is a working example that I did using the C2 gene sets:
http://software.broadinstitute.org/gsea/msigdb/collections.jsp#C2
The starting point is just rlog counts and a results object from DESeq2 or whatever else you've used.
With GSVA, you load whatever datasets against which you want to perform enrichment.
#################################
#Perform GSVA analysis
#################################
require(GSEABase)
require(GSVAdata)
require(Biobase)
require(genefilter)
require(limma)
require(RColorBrewer)
require(GSVA)
require(gplots)
data(c2BroadSets) #http://software.broadinstitute.org/gsea/msigdb/collections.jsp#C2
#only include KEGG pathways
#kegg <- c2BroadSets[which(names(c2BroadSets) %in% names(c2BroadSets)[grep("KEGG_", names(c2BroadSets))])]
#Save rlog counts as new object
df <- assay(rld)
#Filter out genes that pass 5% FDR and absolute log2FC > 2
topTable <- as.data.frame(results)
sigGeneList <- subset(topTable, abs(log2FoldChange)>=2 & padj<=0.05)[,1]
topMatrix <- df[which(rownames(df) %in% sigGeneList),]
#Convert the HGNC names to Entrez IDs (eliminate non-matches where necessary)
require(biomaRt)
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
annots <- getBM(mart=mart,
attributes=c("hgnc_symbol", "entrezgene"),
filter="hgnc_symbol",
values=rownames(topMatrix),
uniqueRows=TRUE)
annots <- annots[!duplicated(annots[,1]),]
topMatrix <- topMatrix[which(rownames(topMatrix) %in% annots[,1]),]
annots <- annots[which(annots[,1] %in% rownames(topMatrix)),]
topMatrix <- topMatrix[match(annots[,1], rownames(topMatrix)),]
rownames(topMatrix) <- annots[,2]
#Perform GSVA
topMatrixGSVA <- gsva(topMatrix,
c2BroadSets,
min.sz=10,
max.sz=999999,
abs.ranking=FALSE,
verbose=TRUE)
design <- model.matrix(~ factor(metadata$condition, levels=c("case", "control")))
colnames(design) <- c("case", "control")
fit <- lmFit(topMatrixGSVA, design)
fit <- eBayes(fit)
sigPathways <- topTable(fit, coef="condition.caseVscontrol", number=Inf, p.value=0.05, adjust="BH")
sigPathways <- sigPathways[abs(sigPathways$logFC)>1,]
#Filter the GSVA object to only include significant pathways
topMatrixGSVA <- topMatrixGSVA[rownames(sigPathways),]
#Set colour
myCol <- colorRampPalette(c("dodgerblue", "black", "yellow"))(100)
myBreaks <- seq(-1.5, 1.5, length.out=101)
heat <- t(scale(t(topMatrixGSVA)))
par(mar=c(2,2,2,2), cex=0.8)
heatmap.2(heat,
col=myCol,
breaks=myBreaks,
main="GSVA",
key=TRUE,
keysize=1.0,
key.title="",
key.xlab="Enrichment Z-score",
scale="none",
ColSideColors=dfCol,
density.info="none",
reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean),
trace="none",
cexRow=0.8,
cexCol=1.0,
distfun=function(x) dist(x, method="euclidean"),
hclustfun=function(x) hclust(x, method="ward.D2"),
margin=c(10,25))
Another example here:
Kevin
http://pathwax.sbc.su.se
That tool tells which processes are defenitly NOT associated with a gene set (depleted option), it does not tell if geneset is mainly activating or repressing a process which what I asked. But thank you for the input anyway