3.0 years ago
bart
I'm using DESeq2 for differential expression analysis (DEA), but when I create a heatmap with only the DEGs it looks very strange: I can't tell whether all the genes are overexpressed or whether the dataset is not normalized properly. I probably made a mistake somewhere in my code, but I don't know where to look. Help would be appreciated!
library(DESeq2)
library(dplyr)
library(pheatmap)
# Create the DESeq2 object and estimate size factors; pre-filter genes with raw counts <5 in >90% of samples
dds <- DESeqDataSetFromMatrix(countData = df, colData = group, design = ~ group)
dds <- estimateSizeFactors(dds)
badgenes <- rownames(dds)[rowSums(counts(dds) < 5) > 0.9 * ncol(dds)]
ddsFiltered <- dds[!rownames(dds) %in% badgenes, ]
# Run DESeq2; turn off Cook's-distance and independent filtering so padj is not set to NA; attach padj to the counts and subset the DEGs
ddsFiltered <- DESeq(ddsFiltered)
res <- results(ddsFiltered, cooksCutoff = FALSE, independentFiltering = FALSE)
filtered <- as.data.frame(counts(ddsFiltered))
filtered <- filtered %>% mutate(padj = res$padj)
all_diff_genes <- subset(filtered, padj < 0.05)
# Create a heatmap of the variance-stabilized values for the DEGs only
vsd <- vst(ddsFiltered, blind = FALSE)
de <- rownames(res)[which(res$padj < 0.05)]
de_mat <- assay(vsd)[de, ]
pheatmap(de_mat, show_rownames = FALSE, show_colnames = FALSE, annotation_col = group)
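One thing worth double-checking in the subsetting above is how `NA` adjusted p-values behave. A minimal base-R sketch with made-up numbers (the `toy` data frame is hypothetical, not my data): a bare logical comparison carries `NA` through into the subset, whereas wrapping it in `which()` keeps only the rows that actually pass the threshold.

```r
# Toy results table; gene names and padj values are invented for illustration
toy <- data.frame(padj = c(0.01, NA, 0.20, 0.03),
                  row.names = c("geneA", "geneB", "geneC", "geneD"))

# The bare comparison yields NA for geneB, which then appears as an all-NA row
naive <- toy[toy$padj < 0.05, , drop = FALSE]

# which() drops both NA and FALSE, keeping only genes that pass the cutoff
safe <- rownames(toy)[which(toy$padj < 0.05)]

print(nrow(naive))
print(safe)
```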
Thanks for the response! I scaled the heatmap like so:
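Row scaling means turning each gene's values into z-scores across samples, so the colours show deviation from that gene's own mean rather than its absolute expression level. A minimal sketch on a toy matrix (the numbers and the name `toy_mat` are made up; in my case the input would be `de_mat`):

```r
# Toy matrix standing in for de_mat: 3 genes x 4 samples, invented values
toy_mat <- matrix(c(10, 11, 12, 13,
                     2,  2,  4,  4,
                     7,  5,  7,  5),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(paste0("gene", 1:3), paste0("s", 1:4)))

# scale() operates on columns, so transpose, z-score, and transpose back
scaled <- t(scale(t(toy_mat)))

# After scaling, every gene has mean 0 and standard deviation 1
print(round(rowMeans(scaled), 10))
print(apply(scaled, 1, sd))
```

With pheatmap, the `scale = "row"` argument gives the equivalent per-gene scaling without the manual step.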
However, the clustering is still very poor in both heatmaps and PCA, and I don't really know why. There could be confounding factors such as age and gender; however, when I add these to the formula:
and use these in the heatmap function:
the number of DEGs drops drastically from 2000 to 32 and the clustering does not improve. NB: group is either cancer (number 1 in the heatmap) or no cancer (number 2 in the heatmap). I am out of ideas about what could be causing the clustering problems. Do you have any idea what the problem might be?
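For reference, adding covariates to the design and to the heatmap annotation generally looks like the sketch below. The column names `age` and `gender` and all values are hypothetical. The variable of interest (`group`) goes last in the formula so that the default `results()` comparison refers to it. The DESeq2 and pheatmap calls are shown as comments because they need the real objects.

```r
# Hypothetical sample table; rownames must match the count matrix columns
coldata <- data.frame(
  group  = factor(c(1, 1, 1, 2, 2, 2)),
  age    = c(61, 55, 70, 58, 66, 49),
  gender = factor(c("F", "M", "F", "M", "F", "M")),
  row.names = paste0("sample", 1:6)
)

# Covariates first, variable of interest last
design_formula <- ~ age + gender + group

# The model matrix shows which coefficients would be fitted
mm <- model.matrix(design_formula, data = coldata)
print(colnames(mm))

# With the real objects this would be roughly:
#   design(dds) <- design_formula
#   dds <- DESeq(dds)
#   pheatmap(de_mat, scale = "row", annotation_col = coldata)
```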