Question

Heatmap deseq2

3.4 years ago

bart ▴ 50

I'm using DESeq2 for differential expression analysis (DEA), but when I create a heatmap with only the DEGs, it looks very strange. I'm not sure whether it contains only overexpressed genes or whether the dataset is not normalized properly. I probably made a mistake somewhere in my code, but I don't know where to look. Help would be appreciated!

library(DESeq2)
library(dplyr)
library(pheatmap)

# create the DESeq2 object, estimate size factors, and pre-filter genes with <5 counts in >90% of samples
dds <- DESeqDataSetFromMatrix(df, colData = group, design = ~ group)
dds <- estimateSizeFactors(dds)
badgenes <- names(which(apply(counts(dds), 1, function(x) sum(x < 5)) > 0.9 * ncol(dds)))
ddsFiltered <- dds[!rownames(dds) %in% badgenes, ]
# run DESeq2; disable Cook's cutoff and independent filtering so padj is not set to NA, then attach padj and subset all DEGs
ddsFiltered <- DESeq(ddsFiltered)
res <- results(ddsFiltered, cooksCutoff = FALSE, independentFiltering = FALSE)
filtered <- as.data.frame(counts(ddsFiltered))
filtered <- filtered %>% mutate(padj = res$padj)
all_diff_genes <- subset(filtered, padj < 0.05)
# create a heatmap with only the DEGs, using variance-stabilized values
rld <- vst(ddsFiltered, blind = FALSE)
# which() drops any NA padj values instead of returning NA row names
de <- rownames(res)[which(res$padj < 0.05)]
de_mat <- assay(rld)[de, ]
pheatmap(de_mat, show_rownames = FALSE, show_colnames = FALSE, annotation_col = group)

deseq2 heatmap • 6.5k views

score 1 · Answer 1 · 2022-01-08

3.4 years ago

ATpoint 88k

This is pretty much the same as explained here: Scaling RNA-Seq data before clustering?

You did not scale the heatmap, hence it clusters by expression level rather than relative differences.
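
To illustrate the point about scaling, here is a minimal base R sketch on a simulated toy matrix (the values and the names `toy` and `toy_z` are made up for illustration, not taken from the dataset in the question). Each gene has a very different baseline level, so unscaled values are dominated by expression level; row z-scores remove the baseline and leave only the relative differences between samples.

```r
# Toy matrix standing in for assay(vst(dds)): rows are genes, columns are samples.
# All values are simulated; nothing here comes from the original data.
set.seed(1)
baseline <- c(4, 8, 12, 16)   # per-gene average expression level
effect   <- c(1, -1, 1, -1)   # direction of the group difference per gene
toy <- t(sapply(1:4, function(i) {
  c(rnorm(3, baseline[i] + effect[i], 0.2),   # samples 1-3 (group A)
    rnorm(3, baseline[i] - effect[i], 0.2))   # samples 4-6 (group B)
}))
dimnames(toy) <- list(paste0("gene", 1:4), paste0("s", 1:6))

# Row z-scores: scale() works on columns, so transpose, scale, transpose back.
toy_z <- t(scale(t(toy)))

rowMeans(toy)              # unscaled rows differ mainly by baseline level
range(rowMeans(toy_z))     # after scaling, row means are approximately 0
range(apply(toy_z, 1, sd)) # and row standard deviations are approximately 1
```

With pheatmap, passing `scale = "row"` to `pheatmap()` applies the same row-wise centering and scaling, so either route should give the same colors.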

Thanks for the response! I scaled the heatmap like so:

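rld <- vst(ddsFiltered, blind = FALSE)
# which() drops any NA padj values instead of returning NA row names
de <- rownames(res)[which(res$padj < 0.05)]
de_mat <- assay(rld)[de, ]
# t(scale(t(x))) converts each row (gene) to z-scores
pheatmap(t(scale(t(de_mat))), show_rownames = FALSE, show_colnames = FALSE, annotation_col = group)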

However, the clustering is still very poor in both heatmaps and PCA, and I don't really know why. There could be confounding factors such as age and gender; however, when I add these to the design formula:

DESeqDataSetFromMatrix(df[, -175], colData = newgroup, design = ~ group + sex + age)

and use these in the heatmap function:

pheatmap(t(scale(t(de_mat))), show_rownames = FALSE, show_colnames = FALSE, annotation_col = newgroup)

the number of DEGs drops drastically from 2000 to 32, and clustering does not improve. NB: group means cancer (number 1 in the heatmap) or no cancer (number 2 in the heatmap). I am out of ideas about what could cause the clustering problems. Do you have any idea what the problem might be?
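
One thing that may be worth checking here, offered as a possibility rather than a diagnosis: when `results()` is called without `contrast` or `name`, DESeq2 reports the last variable in the design formula. With `~ group + sex + age`, that would be `age`, not `group`. The base R sketch below only shows the coefficient order implied by the two formula orderings; the sample table `newgroup_toy` and its level names are hypothetical.

```r
# Hypothetical sample table; column names mirror the post, values are invented.
newgroup_toy <- data.frame(
  group = factor(c("cancer", "cancer", "cancer", "control", "control", "control")),
  sex   = factor(c("F", "M", "F", "M", "F", "M")),
  age   = c(61, 58, 70, 64, 55, 67)
)

# Coefficients implied by the design used in the post: age comes last.
cols_post <- colnames(model.matrix(~ group + sex + age, data = newgroup_toy))
cols_post

# Same covariates, with the variable of interest moved to the end.
cols_reordered <- colnames(model.matrix(~ sex + age + group, data = newgroup_toy))
cols_reordered

# In DESeq2 the group effect can also be requested explicitly, for example
# (level names "cancer" and "control" are assumptions for this sketch):
#   resultsNames(dds)   # lists the fitted coefficients
#   res <- results(dds, contrast = c("group", "cancer", "control"))
```

If the 32 genes turn out to be the `age` coefficient, re-extracting the results for `group` should give a list that is directly comparable with the original 2000 DEGs.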

ADD REPLY • link 3.4 years ago by bart ▴ 50