RNAseq heatmap.2 log2FC clustering
2
1
Entering edit mode
10.0 years ago
Ming Tommy Tang ★ 4.5k

Hi there,

I have a RNA-seq experiment with four samples.

control, treatment1, treatment2, treatment3. each with duplicates, I used DEseq to determined the differentially expressed genes for: treatment1 vs control, treatment2 vs control and treatment3 vs control.

treatment1 over-expresses a wild type protein while treatment2 and 3 express mutant proteins which have partial function of the WT protein.

treatment1 will cause some genes to be upregulated and downregulated. treatment2 and 3 will have some overlapping genes that are also upregulated and downregulated as treatment1 but will have some genes do not change (those mutant proteins only have partial function.

Now, I want to make a heatmap (using log2 fold change) with genes that are differentially changed with treatment1.

res1 <- nbinomTest( cds, 'control', 'treatment1' )
res2 <- nbinomTest( cds, 'control, 'treatment2' )
res3 <- nbinomTest( cds, 'control, 'treatment3' )

df<- data.frame(id=res1$id, logFC1=res1$log2FoldChange,padj1=res1$padj,
                logFC2=res2$log2FoldChange, padj2=res2$padj,logFC3=res3$log2FoldChange, padj3=res3$padj)

df<- df[abs(df$logFC1)>1 & df$padj1<0.01,]  # all the genes significantly changed with treatment1

df <- na.omit(df)
df<-  df[is.finite(df$logFC1) & is.finite(df$logFC2) & is.finite(df$logFC3),]

df <- na.omit(df)   #remove NA
df<-  df[is.finite(df$logFC1) & is.finite(df$logFC2) & is.finite(df$logFC3),]  #remove Inf

dim(df)
[1] 1669    7

if I plot a scatter plot of logFC1:treatment1 vs control AND logFC2:treatment2 vs control by

library(ggplot2)
ggplot(df) + geom_point(aes(x=df$logFC1, y=df$logFC2)) + geom_hline(y=c(1,-1), color='red',linetype=2)

it shows me:

< image not found >

It makes sense, because I pre-choosed for logFC1 with abs(logFC1)>1.

Now, if I want make a heatmap with heatmap.2:

m<- as.matrix(df[,c(2,4,6)])

> head(m)
      logFC1     logFC2     logFC3
12 -1.799917 -5.0813424 -2.6516590
23 -2.843064 -0.2089547 -0.9493254
33 -1.474999 -0.2902581 -0.1040179
51 -1.447214 -0.5684074 -0.4131914
61  1.999213  1.8802237  1.2279896
65  1.022975  0.6953835  0.8228315

colnames(m)<- c("treatment1_VS_control","treatment2_VS_control","treatment3_VS_control")

hmcols<- colorRampPalette(c("green4","green","white", "red","red4"))(256)

# cluster with the pearson distance according to this post https://www.biostars.org/p/91978/#120843
library(gplots)
heatmap.2(m, col=hmcols, scale="row", hclust=function(x) hclust(x, method='complete'), distfun=function(x) as.dist(1-cor(t(x))), trace="none",margin=c(17,15), density.info="none", labRow=NA)

< image not found >

I can also change the method to "single", scale to "none", but I still did not get what I expected. I should see group of genes are both up-regulated with treatment1 and treatment2, treatment1 and treatment3, treatment1 2 and 3, genes only upregulated in treatment1 but not 2 and 3.

if I use the default euclidean distance which also have problems http://liorpachter.wordpress.com/2014/01/19/why-do-you-look-at-the-speck-in-your-sisters-quilt-plot-and-pay-no-attention-to-the-plank-in-your-own-heat-map/

heatmap.2(m, col=hmcols, scale="none", hclust=function(x) hclust(x, method='complete'), distfun=function(x) dist(x,method='euclidean'), trace="none",margin=c(17,15), density.info="none", labRow=NA)

It looks even worse:

< image not found >

How can I cluster the genes as what I want?

Thank you so much

heatmap RNA-Seq • 11k views
ADD COMMENT
0
Entering edit mode

Why don't you make a heatmap of RPKM/FPKM or normalized count data for your set of genes.

ADD REPLY
0
Entering edit mode

I can also make a heatmap with variance stabilized counts, or log2 transformed library size normalized counts, but the problem will be same for me.

ADD REPLY
0
Entering edit mode

I guess I don't understand, why do you think they look bad? I see everything you want in the first heat map. I mean you could always order them yourself and turn off reordering via ROWV=FALSE, otherwise you're saying that you just don't like the way it looks. OK, but so what? It's an unsupervised method.

ADD REPLY
2
Entering edit mode
10.0 years ago
Chirag Nepal ★ 2.4k

You could also try to make ecdf plot, where you will have 3 lines for each treatment.

ecdf (log2(FChange-T1), do.points=F, verticlas=T)
lines (log2(FChange-T2), do.points=F, vertical=T, col="red")
lines (log2(FChange-T3), do.points=F, vertical=T, col="blue")

This will clearly give you which treatment is more responsive, though won't give the picture about treatment on individual gene.

ADD COMMENT
0
Entering edit mode
7.8 years ago
TriS ★ 4.7k

scale your FPKM data before passing them to the heatmap.

otherwise I like pheatmap

m <- myFPKMdata 
pheatmap::pheatmap(m,scale="row")
ADD COMMENT

Login before adding your answer.

Traffic: 2381 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6