I'm using DESeq2 for differential expression analysis (DEA), but when I create a heatmap with only the differentially expressed genes (DEGs), the samples from the two groups (cancer vs. controls) do not separate. I used the following R code:
# Load the packages used below
library(DESeq2); library(dplyr); library(pheatmap)
# Create the DESeq2 object, estimate size factors, and pre-filter genes with <5 raw counts in >90% of samples
dds <- DESeqDataSetFromMatrix(countData = df, colData = group, design = ~ group)
dds <- estimateSizeFactors(dds)
badgenes <- names(which(rowSums(counts(dds) < 5) > 0.9 * ncol(dds)))
ddsFiltered <- dds[!rownames(dds) %in% badgenes, ]
# Run DESeq2; disable Cook's cutoff and independent filtering so that padj is not set to NA; attach padj to the counts and subset the DEGs
ddsFiltered <- DESeq(ddsFiltered)
res <- results(ddsFiltered, cooksCutoff = FALSE, independentFiltering = FALSE)
filtered <- as.data.frame(counts(ddsFiltered))
filtered <- filtered %>% mutate(padj = res$padj)
all_diff_genes <- subset(filtered, padj < 0.05)
# Create a heatmap with only the DEGs (variance-stabilized values, scaled per gene)
vsd <- vst(ddsFiltered, blind = FALSE)
de <- rownames(res)[which(res$padj < 0.05)]  # which() drops any remaining NA padj
de_mat <- assay(vsd)[de, ]
pheatmap(t(scale(t(de_mat))), show_rownames = FALSE, show_colnames = FALSE, annotation_col = group)
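To make the two matrix steps above concrete, here is a minimal base-R sketch of the pre-filtering rule and the per-gene scaling that feeds the heatmap. The toy matrix, its gene and sample names, and the counts are all made up for illustration; nothing here comes from the real data.

```r
# Toy count matrix: 4 genes x 10 samples (made-up numbers, illustration only)
set.seed(1)
toy <- rbind(
  geneA = rpois(10, 50),
  geneB = rep(0, 10),            # all 10 samples below 5 -> removed
  geneC = c(20, 30, rep(0, 8)),  # 8 of 10 samples below 5 -> kept
  geneD = rpois(10, 200)
)
colnames(toy) <- paste0("s", 1:10)

# Same rule as in the post: drop genes with <5 counts in >90% of samples
low  <- rowSums(toy < 5) > 0.9 * ncol(toy)
kept <- toy[!low, , drop = FALSE]

# Per-gene z-scores: scale() works on columns, hence the double transpose
z <- t(scale(t(kept)))

rownames(kept)        # genes that survive the filter
round(rowMeans(z), 3) # each gene is centered at 0 after scaling
```

Because every gene is centered and scaled on its own, the heatmap shows only each gene's relative pattern across samples, not the size of the difference between groups.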
I thought this might be because covariates such as age and gender are not in the design formula. However, adding them does not improve the clustering, and it reduces the number of DEGs from roughly 2,000 (using only 'group', i.e. cancer or healthy) to 32. At this point I'm out of ideas about what could cause this clustering problem. Does anyone have an idea? Thanks in advance!
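For reference, this is roughly how the covariates were added to the design. It is a sketch on simulated data from DESeq2's `makeExampleDESeqDataSet()`, not my real data; the column names `sex` and `age` and their values are hypothetical, and `condition` (levels A and B) stands in for my 'group' column.

```r
library(DESeq2)

# Simulated data set (assumption: 1000 genes, 12 samples, no true DE signal)
set.seed(42)
dds <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Hypothetical covariates; a numeric covariate is centered and scaled
dds$sex <- factor(rep(c("F", "M"), times = 6))
dds$age <- as.numeric(scale(c(34, 51, 47, 62, 29, 58, 44, 39, 66, 53, 48, 36)))

# Covariates first, variable of interest last
design(dds) <- ~ sex + age + condition

dds <- DESeq(dds)

# Effect of condition (B vs. A), adjusted for sex and age
res <- results(dds, contrast = c("condition", "B", "A"))
summary(res)
```

With the variable of interest last in the formula, `results()` would test it by default as well; the explicit `contrast` is only there to make the comparison unambiguous.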
Exactly. Sometimes the results don't support the hypothesis, no matter how much data torturing we apply.