heatmap in R
2
0
Entering edit mode
4.2 years ago
Rob ▴ 170

hi friends I did differential expression analysis for HT-Seq data and I want to plot heatmap for significant genes. But my heatmap is not good. I compared two groups and my workflow is as below:

rdata <- read.table("myData.txt", header = TRUE, row.names = 1, stringsAsFactors = FALSE)

library(pheatmap)
library(DESeq2)
library(ggplot2)

## Differential abundance
alpha <- 0.05 #set the cutoff value
##fold change cut-off
beta <- 2 
## Create metadata 
sample_org <- data.frame(row.names = colnames(rdata), c(rep("0", 22), rep("1", 22)))
colnames(sample_org) <- c("Group")

dds <- DESeqDataSetFromMatrix(countData = rdata,
                              colData = sample_org,
                              design = ~Group)

dd <- DESeq(dds)


res <- results(dd)
#Saving results
write.csv(res,"res.csv")
## remove NAs
res2 <- res[!is.na(res$padj),]
###########################################################

## subset only significant genes
sig <- res2[res2$padj < alpha,]
sig2 <- sig[sig$log2FoldChange > beta,]
sig_genes <- rownames(sig2)
subset <- rdata[sig_genes,]
## log transform data for visualization
tdata <- log2(subset + 0.5)
mat <- as(tdata, "matrix")

## Set colours
my_colour = list(Group = c("0" = "blue", "1" = "yellow"))

#################
pheatmap(symbreaks = FALSE, cluster_cols = FALSE,
         cluster_rows = TRUE,color = colorRampPalette(c("#f71616","#f71616","white",
                                                        "#1919d4", "#1919d4"))(100),annotation_col = sample_org, annotation_colors = my_colour, mat, scale = "row")

I want a heatmap like this

enter image description here

that clearly show difference between two groups, over represented genes together and under represented genes together:

My heatmap is here : enter image description here

with no clear pattern

Any body can help me with this plz?

rna-seq DESeq2 R • 1.4k views
ADD COMMENT
4
Entering edit mode
4.2 years ago

The images you want to show are not linked. Also you seem to be plotting the log of the raw data. Try using the normalized counts instead or the log fold change since this is usually what we're interested in. Check the DESeq2 vignette section on heatmaps.

ADD COMMENT
0
Entering edit mode

Hello Jean-Karim Can you explain how should I have normalized count? I usually use: log2(data + 0.5) to normalize my data. I dont know how to do normalized count?

Also, what do you mean by "use log fold change"? do you mean instead of FDR cut off for selection significant genes I use fold change cut off?

ADD REPLY
0
Entering edit mode

DESeq2 does normalization by dividing the counts by a sample-specific factor (details can be found in the DESeq paper). You can access normalized counts with the counts() function using the argument normalized=TRUE. Check the package documentation and the vignette. My suggestion was to also/alternatively visualize the log fold changes as heatmap.

ADD REPLY
0
Entering edit mode

Thanks Karim Is there any example about visualizing log fold change as heatmap? because I still dont get this part. at the moment I have significant genes as rows and patient IDs as columns. I dont know how to replace these with log fold change?

ADD REPLY
3
Entering edit mode
4.2 years ago
luiz.rocha ▴ 30

You can try to use scale() instead of log or change your clustering strategy for rows.

ADD COMMENT

Login before adding your answer.

Traffic: 2055 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6