Question

Know if a pathway is being functionally activated or repressed by a group of up-regulated genes

6.3 years ago

salamandra ▴ 550

Imagine we identify a process/pathway that is enriched in up-regulated differentially expressed genes. As the genes included in a pathway can be either activators or repressors of that pathway, we don't know whether the up-regulated genes are in fact activating or repressing it. Is there an easy, automatic way of building a gene set with just the activating genes of a pathway, other than searching the literature for each gene individually?

I know of a text-mining tool that finds positive or negative interactions between a gene and a term, but for each gene it reports both positive and negative interactions, so there's no way of telling whether the gene is activating or repressing the process. Or can we assume that a gene with more positive than negative interactions with the term is activating it? What is the common practice here?

Best

RNA-Seq text-mining gene set enrichment • 3.4k views

updated 6.3 years ago by Kevin Blighe 88k • written 6.3 years ago by salamandra ▴ 550

http://pathwax.sbc.su.se

6.3 years ago by Za ▴ 140

That tool tells which processes are definitely NOT associated with a gene set (depleted option); it does not tell whether the gene set is mainly activating or repressing a process, which is what I asked. But thank you for the input anyway.

6.3 years ago by salamandra ▴ 550

score 7 · Accepted Answer · 2018-08-18

GSVA can show this. It takes the expression values of your genes (here, the differentially expressed ones), scores each pathway / process per sample, and from this you can infer whether certain pathways / processes are statistically significantly up- or down-regulated in your samples. Getting GSVA working can be cumbersome, though. The vignette helped me as a starting point: GSVA: The Gene Set Variation Analysis package for microarray and RNA-seq data

With GSVA, the general process is:

1. Enrich gene list against a GSVA dataset to obtain a matrix of samples versus enrichment terms
2. Perform limma on the enrichment matrix, comparing your samples' conditions of interest
3. Plot statistically significant terms from the enrichment matrix in a heatmap
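
As a rough illustration of these three steps, here is a minimal base-R sketch on simulated data. It does not call GSVA or limma: the per-sample score is just the mean gene-wise z-score of each set (a simple stand-in for the GSVA enrichment score), and a per-set t-test stands in for limma. All names (`expr`, `geneSets`, `scoreSet`) and the simulated values are assumptions for illustration only.

```r
set.seed(1)

# Simulated expression matrix: 100 genes x 6 samples (3 control, 3 case)
expr <- matrix(rnorm(600), nrow = 100,
               dimnames = list(paste0("g", 1:100), paste0("s", 1:6)))
condition <- factor(rep(c("control", "case"), each = 3),
                    levels = c("control", "case"))

# Shift genes g1..g10 upwards in the case samples
expr[1:10, condition == "case"] <- expr[1:10, condition == "case"] + 3

# Hypothetical gene sets as a named list of gene identifiers
geneSets <- list(SET_UP = paste0("g", 1:10),
                 SET_NULL = paste0("g", 51:60))

# Step 1: sets-versus-samples score matrix (mean z-score per set and sample)
z <- t(scale(t(expr)))
scoreSet <- function(genes) colMeans(z[genes, , drop = FALSE])
scores <- t(sapply(geneSets, scoreSet))

# Step 2: compare conditions per set; the sign of 'diff' gives the direction
res <- data.frame(
  diff = apply(scores, 1, function(x)
    mean(x[condition == "case"]) - mean(x[condition == "control"])),
  p = apply(scores, 1, function(x) t.test(x ~ condition)$p.value))
res$padj <- p.adjust(res$p, method = "BH")

# Step 3: the sets that would go into the heatmap
res[res$padj <= 0.05, ]
```

A positive `diff` means the set scores higher in case than in control, a negative one lower. Either way this reflects the expression of the member genes, which is not necessarily the same as functional activation of the pathway.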

Here is a working example that I did using the C2 gene sets: http://software.broadinstitute.org/gsea/msigdb/collections.jsp#C2

The starting point is just rlog counts and a results object from DESeq2 or whatever else you've used.

With GSVA, you load whatever datasets against which you want to perform enrichment.
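
If you would rather use your own gene sets than a packaged collection, my understanding is that `gsva()` also accepts a plain named list of gene-identifier vectors. A minimal sketch, assuming hypothetical set names and Entrez IDs; the size filter just mimics in base R what a minimum set size is meant to do and is not GSVA's own code:

```r
# Hypothetical gene sets: named list of Entrez IDs (as character strings)
myGeneSets <- list(
  MY_PATHWAY_ACTIVATORS = c("1017", "1019", "1021", "595", "896"),
  MY_PATHWAY_REPRESSORS = c("1026", "1027", "1029"))

# IDs present in the expression matrix (stand-in for rownames(topMatrix))
exprIDs <- c("1017", "1019", "1021", "595", "1026", "7157")

# Keep only the genes that were measured, then drop sets that end up too small
minSize <- 3
overlap <- lapply(myGeneSets, intersect, exprIDs)
usableSets <- overlap[lengths(overlap) >= minSize]
names(usableSets)
```

Splitting a pathway into separate activator and repressor sets like this still requires a curated source for which gene does what; the list itself does not supply that.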

#################################
#Perform GSVA analysis
#################################

require(GSEABase)
require(GSVAdata)
require(Biobase)
require(genefilter)
require(limma)
require(RColorBrewer)
require(GSVA)
require(gplots)


data(c2BroadSets) #http://software.broadinstitute.org/gsea/msigdb/collections.jsp#C2

#Optional: only include KEGG pathways
#kegg <- c2BroadSets[grep("KEGG_", names(c2BroadSets))]

#Save rlog counts as new object
df <- assay(rld)

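#Retain genes that pass 5% FDR and have absolute log2FC >= 2
#Gene names are assumed to be the row names, as in a DESeq2 results table;
#use the gene-name column instead if your table stores them there
resTable <- as.data.frame(results) #named resTable to avoid masking limma's topTable()
sigGeneList <- rownames(subset(resTable, abs(log2FoldChange)>=2 & padj<=0.05))
topMatrix <- df[which(rownames(df) %in% sigGeneList),]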

#Convert the HGNC names to Entrez IDs (eliminate non-matches where necessary)
require(biomaRt)
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
annots <- getBM(mart=mart,
  attributes=c("hgnc_symbol", "entrezgene_id"), #called "entrezgene" in older Ensembl releases
  filters="hgnc_symbol",
  values=rownames(topMatrix),
  uniqueRows=TRUE)
#Drop symbols without an Entrez ID, then keep one Entrez ID per symbol
annots <- annots[!is.na(annots[,2]),]
annots <- annots[!duplicated(annots[,1]),]
#Keep only genes present in both objects and put them in the same order
topMatrix <- topMatrix[which(rownames(topMatrix) %in% annots[,1]),]
annots <- annots[which(annots[,1] %in% rownames(topMatrix)),]
topMatrix <- topMatrix[match(annots[,1], rownames(topMatrix)),]
rownames(topMatrix) <- annots[,2]

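#Perform GSVA
#(classic interface; as far as I know, recent GSVA releases replace these
#arguments with a parameter object built by gsvaParam())
topMatrixGSVA <- gsva(topMatrix,
  c2BroadSets,
  min.sz=10,
  max.sz=999999,
  abs.ranking=FALSE,
  verbose=TRUE)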

#Compare case versus control with limma (control is the reference level)
condition <- factor(metadata$condition, levels=c("control", "case"))
design <- model.matrix(~ condition)
colnames(design) <- c("Intercept", "caseVscontrol")
fit <- lmFit(topMatrixGSVA, design)
fit <- eBayes(fit)
#Positive logFC = higher enrichment score in case than in control
sigPathways <- topTable(fit, coef="caseVscontrol", number=Inf, p.value=0.05, adjust.method="BH")
sigPathways <- sigPathways[abs(sigPathways$logFC)>1,]

#Filter the GSVA object to only include significant pathways
topMatrixGSVA <- topMatrixGSVA[rownames(sigPathways), , drop=FALSE]

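#Set colour
myCol <- colorRampPalette(c("dodgerblue", "black", "yellow"))(100)
myBreaks <- seq(-1.5, 1.5, length.out=101)
heat <- t(scale(t(topMatrixGSVA)))

#Column side colours, one per sample, passed as ColSideColors below
#(assumed here: red for case, green for control)
dfCol <- ifelse(metadata$condition == "case", "red", "forestgreen")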

par(mar=c(2,2,2,2), cex=0.8)

heatmap.2(heat,
  col=myCol,
  breaks=myBreaks,
  main="GSVA",
  key=TRUE,
  keysize=1.0,
  key.title="",
  key.xlab="Enrichment Z-score",
  scale="none",
  ColSideColors=dfCol,
  density.info="none",
  reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean),
  trace="none",
  cexRow=0.8,
  cexCol=1.0,
  distfun=function(x) dist(x, method="euclidean"),
  hclustfun=function(x) hclust(x, method="ward.D2"),
  margins=c(10,25))
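
To come back to the original question of direction: the sign of `logFC` in the limma table says whether a gene set scores higher (positive) or lower (negative) in case relative to control, provided the coefficient is set up as case versus control. A small sketch on a made-up table with the same `logFC` and `adj.P.Val` column names that `topTable()` returns; the pathway names and numbers are invented for illustration:

```r
# Made-up limma-style results table (values are illustrative only)
sigPathways <- data.frame(
  logFC = c(0.62, -0.48, 0.05),
  adj.P.Val = c(0.001, 0.02, 0.70),
  row.names = c("PATHWAY_A", "PATHWAY_B", "PATHWAY_C"))

# Label each pathway by significance first, then by the sign of logFC
keep <- sigPathways$adj.P.Val <= 0.05
sigPathways$direction <- ifelse(!keep, "not significant",
  ifelse(sigPathways$logFC > 0, "up in case", "down in case"))
sigPathways
```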

Another example here:

GSVA enrichment score and heatmap

Kevin