Hi all,
I am working with RNA-Seq data and need to control for a variable which can be measured via HOX gene expression. The study is a simple case/control study with a treated and untreated condition.
I have performed some clustering on the dds object (code below) and generated a correlation heatmap of HOX gene expression. To my mind, there are roughly 4 clusters. The labels aren't clear, but all of my controls seem to fall into a single cluster (they have shorter names than the cases).
When I assign cluster membership to these samples in the metadata, all of the untreated controls fall into cluster 1, but the treated controls seem to be split across clusters 2-3. This doesn't seem to bear any resemblance to what is shown on the heatmap, nor does it really make biological sense.
Would you have any insights? Is there a better way to extract clusters from the dendrogram?
Many thanks,
Rob.
Code Below (All code written in Rstudio):
dds <- DESeqDataSetFromMatrix(counts, meta, design = ~condition)
dds <- estimateSizeFactors(dds)
Performing HOX Clustering (HOXA10-HOXA9 removed as there were 0 counts)
vst_dds <- vst(dds, blind=TRUE)
vst_mat <- assay(vst_dds)
vst_mat_hox <- vst_mat[grepl("^HOX", row.names(vst_mat)), ]
vst_mat_hox <- vst_mat_hox[row.names(vst_mat_hox) != "HOXA10-HOXA9", ]
vst_cor_hox <- cor(vst_mat_hox)
Generate the clustering heatmap
library(pheatmap)
hox_heatmap <- pheatmap(vst_cor_hox)
Assign clusters to the metadata
meta$hox_group <- cutree(as.hclust(hox_heatmap$tree_row), k = 4)
meta$hox_group <- as.factor(meta$hox_group)
Example of HOX gene group membership of treated controls