gene clustering with heatmap -selection of genes of interest
1
0
Entering edit mode
2.8 years ago
Maka ▴ 20

Hi everyone, I am a beginner in RNA-seq data analysis, so I have many doubts I am trying to solve them all step by step. I can't find a proper answer to this one: Which is the best way to select genes of interest for clustering with heatmap? I am using pheatmap package in R after using Deseq2 for DE analysis. I followed the official vignette.

#1
topVarGenes <- head(order(rowVars(assay(rld)), decreasing = TRUE), 100)  #assay(rld): counts transformed using the rlog function 
mat  <- assay(rld)[ topVarGenes, ]
mat  <- mat - rowMeans(mat)
anno <- as.data.frame(colData(rld)[, c("genotype","condition")])
pheatmap(mat, annotation_col = anno, labels_row = annotRld$mgi_symbol, fontsize_row =7)

#2
topVarGenes <- head(order(rowVars(countsdds), decreasing = TRUE), 100) #countsdds: normalized counts using countsdds <- counts(dds, normalized = TRUE)

mat  <- assay(rld)[ topVarGenes, ]
mat  <- mat - rowMeans(mat)
anno <- as.data.frame(colData(rld)[, c("genotype","condition")])
pheatmap(mat, annotation_col = anno, labels_row = annotRld$mgi_symbol, fontsize_row = 7)

#3
topGene <- head(order(rownames(res)[res$log2FoldChange < 1.5 & res$pvalue > 0.05]), decreasing = TRUE, 50)  #select the genes using the results table I got from res <- results(dds...)
mat  <- assay(rld)[ topGene, ]
mat  <- mat - rowMeans(mat)
anno <- as.data.frame(colData(rld)[, c("genotype","condition")])
pheatmap(mat, annotation_col = anno, labels_row = annotRld$mgi_symbol, fontsize_row =8)

is it correct to use rlog transformation? or I need to use normalized counts? Does it make sense to set decreasing = TRUE ? I am not much satisfied with the outputs, so I would really appreciate some explanation or a link where I can take a look at for the best gene clustering ever :) Thank you in advance! Stay safe

rnaseq heatmap deseq2 • 1.1k views
ADD COMMENT
0
Entering edit mode
2.8 years ago
Trivas ★ 1.8k

The first two don't make much sense to do in my mind. Just because certain genes may have the most counts doesn't necessarily mean they're interesting. The 3rd one where you use the results from DESeq2 to select genes would make the most sense to me, although the way you're subsetting doesn't make too much sense without knowing more what your end goals are.

ADD COMMENT
0
Entering edit mode

thanks for the quick reply. my main goal is to understand what is the difference in my ctrl vs my transgenic with a treatment (interaction term). The results res, I used in my third option are related to my main question. I just saw a mistake in my script, I wanted to select res$log2FoldChange > 1.5. it is for extracting the data with major changes.

ADD REPLY

Login before adding your answer.

Traffic: 1035 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6