Hi everyone, I am a beginner in RNA-seq data analysis, so I have many doubts I am trying to solve them all step by step. I can't find a proper answer to this one: Which is the best way to select genes of interest for clustering with heatmap? I am using pheatmap package in R after using Deseq2 for DE analysis. I followed the official vignette.
#1
topVarGenes <- head(order(rowVars(assay(rld)), decreasing = TRUE), 100) #assay(rld): counts transformed using the rlog function
mat <- assay(rld)[ topVarGenes, ]
mat <- mat - rowMeans(mat)
anno <- as.data.frame(colData(rld)[, c("genotype","condition")])
pheatmap(mat, annotation_col = anno, labels_row = annotRld$mgi_symbol, fontsize_row =7)
#2
topVarGenes <- head(order(rowVars(countsdds), decreasing = TRUE), 100) #countsdds: normalized counts using countsdds <- counts(dds, normalized = TRUE)
mat <- assay(rld)[ topVarGenes, ]
mat <- mat - rowMeans(mat)
anno <- as.data.frame(colData(rld)[, c("genotype","condition")])
pheatmap(mat, annotation_col = anno, labels_row = annotRld$mgi_symbol, fontsize_row = 7)
#3
topGene <- head(order(rownames(res)[res$log2FoldChange < 1.5 & res$pvalue > 0.05]), decreasing = TRUE, 50) #select the genes using the results table I got from res <- results(dds...)
mat <- assay(rld)[ topGene, ]
mat <- mat - rowMeans(mat)
anno <- as.data.frame(colData(rld)[, c("genotype","condition")])
pheatmap(mat, annotation_col = anno, labels_row = annotRld$mgi_symbol, fontsize_row =8)
is it correct to use rlog transformation? or I need to use normalized counts?
Does it make sense to set decreasing = TRUE
?
I am not much satisfied with the outputs, so I would really appreciate some explanation or a link where I can take a look at for the best gene clustering ever :)
Thank you in advance!
Stay safe
thanks for the quick reply. my main goal is to understand what is the difference in my ctrl vs my transgenic with a treatment (interaction term). The results
res
, I used in my third option are related to my main question. I just saw a mistake in my script, I wanted to selectres$log2FoldChange > 1.5
. it is for extracting the data with major changes.