Know if a pathway is being functionally activated or repressed by a group of up-regulated genes
1
1
Entering edit mode
6.3 years ago
salamandra ▴ 550

Imagine we identify a process/pathway that is enriched in up-regulated differentially expressed genes. As the genes included in a pathway can be both activators or repressors of that pathway, we don't know if the differentially expressed up-regulated genes are in fact activating or repressing that pathway. Is there a easy automatic way of building a geneset with just the activating genes of a pathway, besides searching in the literature for each gene individually?

I know a text mining tool that finds positive or negative interactions between a gene and a term, but for each gene reports both positive and negative interactions, so there's no way of telling if the gene is activating or repressed. Or can we assume that if a gene has more positive than neg interactions with the term is because is activating it. What is the common practice here?

Best

RNA-Seq text-mining gene set enrichment • 3.3k views
ADD COMMENT
0
Entering edit mode
ADD REPLY
0
Entering edit mode

That tool tells which processes are defenitly NOT associated with a gene set (depleted option), it does not tell if geneset is mainly activating or repressing a process which what I asked. But thank you for the input anyway

ADD REPLY
7
Entering edit mode
6.3 years ago

GSVA can show this. It will look at your differential expression results and then infer from this whether certain pathways / processes are statistically significantly up- or down-regulated in your samples. Getting GSVA working can be cumbersome, though. The vignette helped me as a starting point: GSVA: The Gene Set Variation Analysis package for microarray and RNA-seq data

With GSVA, the general process is:

  1. Enrich gene list against a GSVA dataset to obtain matrix of samples versus enrichment terms
  2. Perform limma on enrichment matrix, comparing your samples' conditions of interest
  3. Plot statistically significant terms from enrichment matrix in heatmap

Here is a working example that I did using the C2 gene sets: http://software.broadinstitute.org/gsea/msigdb/collections.jsp#C2

The starting point is just rlog counts and a results object from DESeq2 or whatever else you've used.

With GSVA, you load whatever datasets against which you want to perform enrichment.

#################################
#Perform GSVA analysis
#################################

require(GSEABase)
require(GSVAdata)
require(Biobase)
require(genefilter)
require(limma)
require(RColorBrewer)
require(GSVA)
require(gplots)


data(c2BroadSets) #http://software.broadinstitute.org/gsea/msigdb/collections.jsp#C2

#only include KEGG pathways
#kegg <- c2BroadSets[which(names(c2BroadSets) %in% names(c2BroadSets)[grep("KEGG_", names(c2BroadSets))])]

#Save rlog counts as new object
df <- assay(rld)

#Filter out genes that pass 5% FDR and absolute log2FC > 2
topTable <- as.data.frame(results)
sigGeneList <- subset(topTable, abs(log2FoldChange)>=2 & padj<=0.05)[,1]
topMatrix <- df[which(rownames(df) %in% sigGeneList),]

#Convert the HGNC names to Entrez IDs (eliminate non-matches where necessary)
require(biomaRt)
mart <- useMart("ENSEMBL_MART_ENSEMBL")
mart <- useDataset("hsapiens_gene_ensembl", mart)
annots <- getBM(mart=mart,
  attributes=c("hgnc_symbol", "entrezgene"),
  filter="hgnc_symbol",
  values=rownames(topMatrix),
  uniqueRows=TRUE)
annots <- annots[!duplicated(annots[,1]),]
topMatrix <- topMatrix[which(rownames(topMatrix) %in% annots[,1]),]
annots <- annots[which(annots[,1] %in% rownames(topMatrix)),]
topMatrix <- topMatrix[match(annots[,1], rownames(topMatrix)),]
rownames(topMatrix) <- annots[,2]

#Perform GSVA   
topMatrixGSVA <- gsva(topMatrix,
  c2BroadSets,
  min.sz=10,
  max.sz=999999,
  abs.ranking=FALSE,
  verbose=TRUE)

design <- model.matrix(~ factor(metadata$condition, levels=c("case", "control")))
colnames(design) <- c("case", "control")
fit <- lmFit(topMatrixGSVA, design)
fit <- eBayes(fit)
sigPathways <- topTable(fit, coef="condition.caseVscontrol", number=Inf, p.value=0.05, adjust="BH")
sigPathways <- sigPathways[abs(sigPathways$logFC)>1,]

#Filter the GSVA object to only include significant pathways
topMatrixGSVA <- topMatrixGSVA[rownames(sigPathways),]

#Set colour
myCol <- colorRampPalette(c("dodgerblue", "black", "yellow"))(100)
myBreaks <- seq(-1.5, 1.5, length.out=101)
heat <- t(scale(t(topMatrixGSVA)))

par(mar=c(2,2,2,2), cex=0.8)

heatmap.2(heat,
  col=myCol,
  breaks=myBreaks,
  main="GSVA",
  key=TRUE,
  keysize=1.0,
  key.title="",
  key.xlab="Enrichment Z-score",
  scale="none",
  ColSideColors=dfCol,
  density.info="none",
  reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean),
  trace="none",
  cexRow=0.8,
  cexCol=1.0,
  distfun=function(x) dist(x, method="euclidean"),
  hclustfun=function(x) hclust(x, method="ward.D2"),
  margin=c(10,25))

j

Another example here:

Kevin

ADD COMMENT
0
Entering edit mode

Thank very much. This is really useful

ADD REPLY
0
Entering edit mode

Sorry Kevin,

For this tutorial you have used c2BroadSets , can I use that for any analysis? For instance, if I am interested in response to chemotherapy in a cancer in responder patients to non-responders?

ADD REPLY
1
Entering edit mode

It will depend on the 'gene sets' that are included in C2. Yo can search for key terns, here: http://software.broadinstitute.org/gsea/msigdb/genesets.jsp?collection=C2

After you search, the results appear at the bottom of the page.

Two gene sets that I have immediately found from keyword chemotherapy are:

ADD REPLY
0
Entering edit mode

Thank you

I have done with same c2BroadSets

but seems to me these pathways all are meaningless in terms of cancer

> head(sigPathways)
                                              logFC    AveExpr         t    P.Value adj.P.Val         B
SAMOLS_TARGETS_OF_KHSV_MIRNAS_UP          0.5691057  0.5691057  2.482072 0.03441202 0.6950803 -4.091589
REACTOME_PHOSPHORYLATION_OF_THE_APC       0.4853805  0.4853805  2.429304 0.03754903 0.6950803 -4.110608
NIKOLSKY_BREAST_CANCER_16P13_AMPLICON    -0.4532520 -0.4532520 -2.375375 0.04104812 0.6950803 -4.130214
MOREIRA_RESPONSE_TO_TSA_UP               -0.4247967 -0.4247967 -2.358448 0.04221144 0.6950803 -4.136400
SENGUPTA_NASOPHARYNGEAL_CARCINOMA_DN     -0.5000000 -0.5000000 -2.350639 0.04275902 0.6950803 -4.139260
SHARMA_PILOCYTIC_ASTROCYTOMA_LOCATION_UP -0.5000000 -0.5000000 -2.350639 0.04275902 0.6950803 -4.139260
>
ADD REPLY
0
Entering edit mode

Just a quick question for you, @Kevin, on this line:

sigPathways <- sigPathways[abs(sigPathways$logFC)>1,]

As I understand limma, when it is used for differential gene expression, it assumes that the input expression values are in the Log scale, and so it simply computes the difference between the average expressions to be the LogFC. When we apply limma to the GSVA scores, the same calculations are done and so the LogFC should actually correspond to the difference between the average GSVA scores between the two conditions, correct?

In other words, for the GSVA scores, the "logFC" is really not the log-scale fold change, but just the simple difference between the average GSVA scores. Isn't this so?

Many thanks.

ADD REPLY

Login before adding your answer.

Traffic: 2488 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6