Question

Heatmap.2 R for differentially expressed genes DeSEQ2

3

Entering edit mode

9.5 years ago

User6891 ▴ 340

Hi,

I'm trying to make a heatmap for a set of differentially expressed genes. I'm using deseq2 in R together with the heatmap.2 function

This is the command I've used

dds <- DESeq(dds)
sig <- resOrdered[!is.na(resOrdered$padj) & resOrdered$padj<0.10  abs(resOrdered$log2FoldChange)>=1,]

selected <- rownames(sig)
hmcol <- colorRampPalette(brewer.pal(9, "GnBu"))(100)
heatmap.2(log2(counts(dds,normalized=TRUE)[rownames(dds) %in% selected,]), col = hmcol, scale="row", rowv = TRUE, colv = FALSE, dendrogram="row", trace="none", margin=c(4,6), cexRow=0.5, cexCol=1, keysize=1)

when running this I get the error

Error in hclustfun(distr) : NA/NaN/Inf in foreign function call (arg 11)

Any ideas how to solve this?

R • 26k views

ADD COMMENT • link updated 6.9 years ago by Ram 45k • written 9.5 years ago by User6891 ▴ 340

0

Entering edit mode

@User6891 do you want to receive help ? if yes, you should give an example input so people can reproduce the error you have and can then help you with it.

ADD REPLY • link updated 6.9 years ago by Ram 45k • written 9.5 years ago by Mo ▴ 920

0

Entering edit mode

I'm aware of this. The problem is that 'dds' is a 'DeSeqDataSet', so it's not that easy to just reproduce this here. I suppose that some columns contains N/A values which heatmap.2 doesn't like? But it's quite normal that an expression dataset contains N/A values, so I was wondering if somebody had encountered the same problem

ADD REPLY • link updated 6.9 years ago by Ram 45k • written 9.5 years ago by User6891 ▴ 340

0

Entering edit mode

I checked & the problem is that there are some '-Inf' values in the dds matrix. Any idea how to deal with those?

ADD REPLY • link updated 6.9 years ago by Ram 45k • written 9.5 years ago by User6891 ▴ 340

0

Entering edit mode

@User6891 There are few options, for example you can replace zero instead NA, so you won't face any problem. Do this

YourDataName[is.na(YourdataName)] <- 0

and check if it works , also a similar problem was asked on stack overflow

ADD REPLY • link updated 6.9 years ago by Ram 45k • written 9.5 years ago by Mo ▴ 920

0

Entering edit mode

Inf comes because you have zeroes into your data. Log of 0 is infinite

A general function to avoid this is yomake the zeroes = 1

Log2(counts(dds... contains some zeroes...

ADD REPLY • link updated 6.9 years ago by Ram 45k • written 9.5 years ago by Antonio R. Franco ★ 5.2k

0

Entering edit mode

Hi @mbio.kyle,

when I try to follow you first lot of code:

> matrix<-matrix-rowMeans(matrix)

I get:

Error in rowMeans(matrix) : 
  'x' must be an array of at least two dimensions.

Please help??

ADD REPLY • link updated 6.9 years ago by Ram 45k • written 8.5 years ago by samuel ▴ 260

0

Entering edit mode

On the previous line you're missing a comma to index rld by topVarGenes in two dimension way:

# use the log transform on the data set
matrix <- assay(rld)[ topVarGenes, ]
matrix <- matrix - rowMeans(matrix)

ADD REPLY • link 7.4 years ago by leaodel ▴ 190

Ram · Answer 1 · 2016-02-26

You do not want to make heatmaps from the DeSeqDataSet object. Here is an example workflow. Note it uses the pheatmap R library

# make the DeSeqDataSet
dds <- DESeq(dds)

# use the log transform on the data set
rld <- rlog(dds, blind=F)
topVarianceGenes <- head(order(rowVars(assay(rld)), decreasing=T),100)
matrix <- assay(rld)[ topVarGenes ]
matrix <- matrix - rowMeans(matrix)

# select the 'contrast' you want
annotation_data <- as.data.frame(colData(rld)[c("ConditionA","ConditionB")])
pheatmap(matrix, annotation_col=annotation_data)

The above will plot the most varied genes across all conditions. You can substitute the 100 for whichever number you like to control how many you get. If you have alot of different conditions/contrasts and want to make some more specific heatmaps, here is a function which will plot the top 20 genes for the up and down direction (i.e. largest log2FoldChange between the two conditions up and down)

plotUpDownSigGenes <- function(results, colNums, rld, title) {

    # make the lists
    upgenes <- rownames(head(results[ order( results$log2FoldChange ), ], n=20))
    downgenes <- rownames(head(results[ order( -results$log2FoldChange ), ], n=20))

    # this gives us the rows we want
    rows <- match(upgenes, row.names(rld))
    mat <- assay(rld)[rows,colNums]
    mat <- mat - rowMeans(mat)

    # the labels are hard coded at the moment :(
    df <- as.data.frame(colData(rld)[c("labelA","labelB")])
    pheatmap(mat, fontsize=5, annotation_col=df, main=paste(title,"top 20 up genes"))

    # this gives us the rows we want
    rows <- match(downgenes, row.names(rld))
    mat <- assay(rld)[rows,colNums]
    mat <- mat - rowMeans(mat)

    df <- as.data.frame(colData(rld)[c("labelA","labelB")])
    pheatmap(mat, fontsize=5, annotation_col=df, main=paste(title,"top 20 down genes"))
}

Here is an example call for the function:

contrastDEGenes <- subset(results(dds, contrast=c("A","B")), padj < 0.05)

# this part is kind of funky
# the function needs to know which columns
# correspond to the samples (to pull from rld)
aCols <- c(1,2,3)
bCols <- c(4,5,6)

# get the log transforms again
rld <- rlog(dds, blind=F)

# call
plotUpDownSigGenes(
    contrastDEGenes,
    c(aCols, bCols),
    rld,
    "Title for the plot"
)

Hope it helps!