hi friends I did differential expression analysis for HT-Seq data and I want to plot heatmap for significant genes. But my heatmap is not good. I compared two groups and my workflow is as below:
rdata <- read.table("myData.txt", header = TRUE, row.names = 1, stringsAsFactors = FALSE)
library(pheatmap)
library(DESeq2)
library(ggplot2)
## Differential abundance
alpha <- 0.05 #set the cutoff value
##fold change cut-off
beta <- 2
## Create metadata
sample_org <- data.frame(row.names = colnames(rdata), c(rep("0", 22), rep("1", 22)))
colnames(sample_org) <- c("Group")
dds <- DESeqDataSetFromMatrix(countData = rdata,
colData = sample_org,
design = ~Group)
dd <- DESeq(dds)
res <- results(dd)
#Saving results
write.csv(res,"res.csv")
## remove NAs
res2 <- res[!is.na(res$padj),]
###########################################################
## subset only significant genes
sig <- res2[res2$padj < alpha,]
sig2 <- sig[sig$log2FoldChange > beta,]
sig_genes <- rownames(sig2)
subset <- rdata[sig_genes,]
## log transform data for visualization
tdata <- log2(subset + 0.5)
mat <- as(tdata, "matrix")
## Set colours
my_colour = list(Group = c("0" = "blue", "1" = "yellow"))
#################
pheatmap(symbreaks = FALSE, cluster_cols = FALSE,
cluster_rows = TRUE,color = colorRampPalette(c("#f71616","#f71616","white",
"#1919d4", "#1919d4"))(100),annotation_col = sample_org, annotation_colors = my_colour, mat, scale = "row")
I want a heatmap like this
that clearly show difference between two groups, over represented genes together and under represented genes together:
My heatmap is here :
with no clear pattern
Any body can help me with this plz?
Hello Jean-Karim Can you explain how should I have normalized count? I usually use: log2(data + 0.5) to normalize my data. I dont know how to do normalized count?
Also, what do you mean by "use log fold change"? do you mean instead of FDR cut off for selection significant genes I use fold change cut off?
DESeq2 does normalization by dividing the counts by a sample-specific factor (details can be found in the DESeq paper). You can access normalized counts with the counts() function using the argument normalized=TRUE. Check the package documentation and the vignette. My suggestion was to also/alternatively visualize the log fold changes as heatmap.
Thanks Karim Is there any example about visualizing log fold change as heatmap? because I still dont get this part. at the moment I have significant genes as rows and patient IDs as columns. I dont know how to replace these with log fold change?