Box-and-whisker plot

Question

Hierarchical Clustering in single-channel agilent microarray experiment

7.2 years ago

Leite ★ 1.3k

Hello everyone,

I'm trying to put a Hierarchical Clustering and boxplot in this code but always from error, what would be the best package to do this?

#Load limma
library(limma)

#Set-up
targetinfo <- readTargets("Targets.txt",row.names="FileName",sep="")

#Read files
project <- read.maimages(targetinfo,source="agilent", green.only=TRUE)

#Background correction
project.bgc <- backgroundCorrect(project, method="normexp", offset=16)

#Normalize the data with the 'quantile' method for 1-color
project.NormData <-normalizeBetweenArrays(project.bgc,method="quantile")

# load colour libraries
library(RColorBrewer)
# set colour palette
cols <- brewer.pal(8, "Set1")

#Histogram of non-normalized
plotDensities(project.bgc, col=cols, legend=FALSE)

#Histogram of normalized
plotDensities(project.NormData, col=cols, legend=FALSE)

#Create the study design and comparison model
design <- paste(targetinfo$Target, sep="")
design <- factor(design)
comparisonmodel <- model.matrix(~0+design)
colnames(comparisonmodel) <- levels(design)
#Checking the experimental design
design
comparisonmodel

#Fit the linear model
project.fit <- lmFit(project.NormData, comparisonmodel)

#Applying the empirical Bayes method
project.fit.eBayes <- eBayes(project.fit)
names(project.fit.eBayes)

#Make individual contrasts and fit the new model
CaseControl <- makeContrasts(CaseControl="D0A-Control", levels=comparisonmodel)
CaseControl.fitmodel <- contrasts.fit(project.fit.eBayes, CaseControl)
CaseControl.fitmodel.eBayes <- eBayes(CaseControl.fitmodel)

#Filtering Results
 nrow(topTable(CaseControl.fitmodel.eBayes, coef="CaseControl", number=99999, lfc=2))
 probeset.list <- topTable(CaseControl.fitmodel.eBayes, "CaseControl", number=99999, adjust.method="BH", sort.by="P", lfc=2)

#To save results
write.table(probeset.list, "results.txt", sep="\t", quote=FALSE)

Best regards,

Leite

R Hierarchical clustering agilent • 5.5k views

updated 7.2 years ago by Kevin Blighe 89k • written 7.2 years ago by Leite ★ 1.3k

Hi Leite,

I assume your question is about microarray data, but you never specified that. I adapted your title to clarify this.

but always from error

It's very hard for us to figure out what's going wrong if you don't show us the error message.

Cheers,
Wouter

7.2 years ago by WouterDeCoster 47k

Hey WouterDeCoster,

Sorry for the lack of information. You're correct, it's about microarray data analysis.

I've tried this code for the boxplot:

boxplot(project.NormData, col=cols)
Error in sort.int(x, na.last = na.last, decreasing = decreasing, ...) : 
'x' must be atomic

And this code for Hierarchical Clustering:

plot(project.NormData)
Error in as.matrix(E$E) : object 'E' not found

Thank you so much,

Leite

7.2 years ago by Leite ★ 1.3k

7.2 years ago

Hussain Ather ▴ 990

Check out clustermap

score 14 · Accepted Answer · 2017-12-03

The expression values are stored in the 'M' slot for a two-colour Agilent array and in the 'E' slot for a single-colour array, i.e., project.NormData$M or project.NormData$E, respectively (see C: Single-color Agilent array analyzing in R). Pass that matrix, not the whole limma object, to the plotting functions.
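As a rough illustration of the difference between the object and its matrix, here is a minimal sketch in base R. It uses a plain list as a stand-in for the limma object; the name `fake` and its simulated values are made up for illustration, and a real EList carries more slots than this.

```r
# Stand-in for a single-channel limma object: a list holding the
# expression matrix in a slot named E (simulated values, not real data)
set.seed(1)
fake <- list(E = matrix(rnorm(40, mean = 8), nrow = 10,
                        dimnames = list(paste0("probe", 1:10),
                                        paste0("array", 1:4))))

# The list itself is not a numeric matrix ...
is.matrix(fake)        # FALSE
# ... but the E slot is, so plotting functions can work with it
is.matrix(fake$E)      # TRUE
dim(fake$E)            # 10 probes by 4 arrays

# boxplot() summarises one box per column (array) of a matrix
bp <- boxplot(fake$E, plot = FALSE)
length(bp$n)           # 4
```

With your data, the same idea applies to project.NormData$E.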

For each of the following functions, I encourage you to devote a full day to understanding what each and every parameter is doing. That is the best way for you to learn.

-------------------------------------------

Box-and-whisker plot

par(mar=c(8,8,5,5), cex=1.0, cex.axis=1.4, cex.lab=1.4)
boxplot(project.NormData$E,
  main="Box-and-whisker plot",
  xlab="", ylab=bquote(~Log[2]~expression),
  names=paste("Sample", c(1:ncol(project.NormData$E))),
  col="skyblue",
  las=2,
  outline=FALSE)

Violin plot (Wouter likes violin plots - maybe he plays a violin)

require(reshape2)
violinMatrix <- reshape2::melt(project.NormData$E)
colnames(violinMatrix) <- c("Gene","Sample","Expression")

library(ggplot2)
ggplot(violinMatrix, aes(x=Sample, y=Expression)) +
  geom_violin() +
  theme(axis.text.x = element_text(angle=45, hjust=1))

Hierarchical clustering (unsupervised on entire dataset - very CPU and memory intensive)

For a simple dendrogram or circular dendrogram, take a look at my threads:

A: how to draw circular dendrogram with distance information
A: how to make bootstrapped tree in PVCLUST package with SNP genotyping data?

require(pvclust)
pv <- pvclust(project.NormData$E, method.dist="euclidean", method.hclust="ward.D2", nboot=100)
plot(pv)
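pvclust clusters the columns (samples) and adds bootstrap support, which is where most of the run time goes. For a quick look without the bootstrap, base R's hclust() can be used with the same distance and linkage. The sketch below runs on a simulated matrix called `mat` (a made-up name with made-up values); with real data you would substitute project.NormData$E.

```r
# Simulated log2 expression matrix: 200 probes x 6 arrays (not real data)
set.seed(42)
mat <- matrix(rnorm(200 * 6, mean = 8), nrow = 200,
              dimnames = list(NULL, paste0("array", 1:6)))
# Shift the last three arrays for half of the probes so two groups exist
mat[1:100, 4:6] <- mat[1:100, 4:6] + 3

# dist() works on rows, so transpose to get sample-to-sample distances
d <- dist(t(mat), method = "euclidean")
hc <- hclust(d, method = "ward.D2")
plot(hc, main = "Sample clustering", xlab = "", sub = "")

# Cut the tree into two groups and inspect the assignment
groups <- cutree(hc, k = 2)
groups
```

Because the distance matrix is computed between samples only, this stays cheap even for a full array; clustering all probes against each other is the expensive part.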

heatmap.2 (hierarchical clustering dendrogram with heatmap)

For the heatmaps, you usually want to filter your expression matrix to genes that are differentially expressed. You appear to have kept only probes with an absolute log (base 2) fold change of at least 2, stored in your probeset.list object.

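#Filter the expression matrix to include only differentially expressed genes
#probeset.list is a data frame from topTable(), so index by its row names
#(if the matrix has no row names, try as.integer(rownames(probeset.list)) instead)
sigmatrix <- project.NormData$E[rownames(probeset.list),]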

#Scale the filtered expression matrix (convert to Z scale)
heat <- t(scale(t(sigmatrix)))
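To make the scaling step concrete, here is a small sketch on a made-up matrix `x` (hypothetical values, not from the experiment). It shows that the double transpose gives per-gene Z-scores, so genes with very different magnitudes but the same profile become identical.

```r
# Small made-up matrix: 3 genes (rows) x 4 samples (columns)
x <- matrix(c(1, 2, 3, 4,
              10, 20, 30, 40,
              5, 5, 6, 8),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))

# scale() centres and scales columns, so transpose, scale, transpose back
z <- t(scale(t(x)))

# Each gene now has mean 0 and standard deviation 1 across samples
round(rowMeans(z), 10)
apply(z, 1, sd)

# gene1 and gene2 differ 10-fold in magnitude but share the same profile
all.equal(z["gene1", ], z["gene2", ])   # TRUE
```

This is also why scale="none" is passed to heatmap.2 later: the matrix has already been scaled by row.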

#Set colour
require(RColorBrewer)
myCol <- colorRampPalette(c("violet", "black", "springgreen"))(100)
myBreaks <- seq(-3, 3, length.out=101)

require("gplots")

#Euclidean distance; Ward's linkage
par(mar=c(1,1,1,1), cex=1.0)
heatmap.2(heat,
  col=myCol,
  breaks=myBreaks,
  main="",
  key=TRUE, key.xlab="Expression\nZ-score", keysize=1.0,
  scale="none",
  #condition is not defined in this post: supply a vector of colours, one per column of heat
  ColSideColors=condition,
  density.info="none",
  reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean), 
  trace="none",
  cexRow=1.0, cexCol=1.0,
  distfun=function(x) dist(x, method="euclidean"),
  hclustfun=function(x) hclust(x, method="ward.D2"),
  margins=c(6, 6))
legend("top",
  bty="n",
  cex=1.0,
  title="Condition",
  c("Wild-type", "Knock-out"), fill=c("yellow", "royalblue"),
  horiz=TRUE)
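ColSideColors expects a character vector of colours, one per column of heat, and the `condition` object is not created anywhere in the code above. A minimal sketch of how such a vector might be built, assuming the group labels are available in a vector like targetinfo$Target and take the values Control and D0A seen in the contrast from the question (the sample labels, their order, and the colour mapping below are all made up):

```r
# Hypothetical group labels, one per array, in the column order of the matrix
target <- c("Control", "Control", "Control", "D0A", "D0A", "D0A")

# Named lookup from group to colour (arbitrary colour choice)
palette_by_group <- c(Control = "yellow", D0A = "royalblue")

# One colour per sample; unname() drops the group names from the result
condition <- unname(palette_by_group[target])
condition

# Sanity checks before passing to heatmap.2(..., ColSideColors = condition)
stopifnot(length(condition) == length(target), !anyNA(condition))
```

The legend() call would then use names(palette_by_group) as labels and palette_by_group as the fill, so the key matches the colour bar.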

#Distance = 1 minus Pearson correlation; Ward's linkage
par(mar=c(1,1,1,1), cex=1.0)
heatmap.2(heat,
  col=myCol,
  breaks=myBreaks,
  main="",
  key=TRUE, key.xlab="Expression\nZ-score", keysize=1.0,
  scale="none",
  #condition is not defined in this post: supply a vector of colours, one per column of heat
  ColSideColors=condition,
  density.info="none",
  reorderfun=function(d,w) reorder(d, w, agglo.FUN=mean),
  trace="none",
  cexRow=1.0, cexCol=1.0,
  distfun=function(x) as.dist(1-cor(t(x))),
  hclustfun=function(x) hclust(x, method="ward.D2"),
  margins=c(6, 6))
legend("top",
  bty="n",
  cex=1.0,
  title="Condition",
  c("Wild-type", "Knock-out"), fill=c("yellow", "royalblue"),
  horiz=TRUE)
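The two heatmaps differ only in the distance function. As a small sketch of what that changes, the made-up matrix below (hypothetical genes and values) compares Euclidean distance with 1 minus Pearson correlation on the same rows.

```r
# Made-up matrix: three genes (rows) across five samples (columns)
x <- rbind(geneA = c(1, 2, 3, 4, 5),
           geneB = c(10, 20, 30, 40, 50),  # same shape as geneA, 10x larger
           geneC = c(5, 4, 3, 2, 1))       # mirror image of geneA

# Euclidean distance is dominated by magnitude
round(as.matrix(dist(x, method = "euclidean")), 2)

# 1 - Pearson correlation compares profile shape only;
# cor() works on columns, hence the transpose to compare rows
cor_dist <- as.dist(1 - cor(t(x)))
round(as.matrix(cor_dist), 2)
```

Under the correlation distance, geneA and geneB sit at distance 0 (perfectly correlated) and geneC is at distance 2 from both (perfectly anti-correlated), whereas the Euclidean distance places geneA much closer to geneC than to geneB. On a matrix that has already been converted to row Z-scores the two measures tend to agree more closely, since magnitude differences between genes have been removed.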

ComplexHeatmap

For ComplexHeatmap, see my recent post here: C: how to cluster genes in heatmap

I have also posted code in a comment following this answer.