Question

heatmaps for limma-voom DE genes

1

Entering edit mode

3.4 years ago

AlexStar ▴ 180

Hi, I concducted a differential expression analysis with limma - voom for RNA-seq data. Starting with a raw RNA-seq expression data, this is my code:

de <- DGEList(raw)
de <- calcNormFactors(de)

# Filter low-expressed genes
cutoff = 1
drop <- which(apply(cpm(de), 1, max) < cutoff)
de <- de[-drop,] 

design <- model.matrix(~Response, metadata)
contrast <- matrix(c(1 ,0), ncol = 1)
dimnames(contrast) <- list(c('Response', 'No Response'), 'Diff')

Voom <- voom(de, design, plot = TRUE) 
vfit <- lmFit(Voom, design) 
vfit  <- contrasts.fit(vfit, contrasts = contrast) 
efit<- eBayes(vfit) moderation

summary(decideTests(efit15))
deg <- topTable(efit, coef = 'Diff', p.value = 0.05, adjust.method = 'fdr',
                  number = Inf)

Now, I want to do a heatmap for the top 15 up genes and the 15 down genes, this is the code I'm using:

rownames(deg) = na.omit(rownames(deg))
deg <- deg[!is.na(rownames(deg)), ]
deg$symbol <- rownames(deg)


selectUp <- deg$symbol[deg$logFC > 0][1:15]
selectDown <- deg$symbol[deg$logFC < 0][1:15]
select = c(selectUp,selectDown)
heatmap_dataframe <- data.frame(row.names = colnames(raw15),
                   outcome = metadata$Response)

normcounts = assay(vst(raw,blind=T))  ## What data should I use for the VST ?

pheatmap(normcounts[select,], cluster_rows=TRUE,
         show_colnames = FALSE,cluster_cols=TRUE, 
         annotation_col= heatmap_dataframe, scale = 'row',cutree_cols = 2,cutree_rows = 2)

I just dont know what data should I use? is it de ? is it Voom ? or the raw ? I tried doing DESeq2::vst for the raw data but that doesn't work.. so which data should I use ? is my code alright ?

heatmap r limma • 3.1k views

ADD COMMENT • link updated 3.4 years ago by ATpoint 88k • written 3.4 years ago by AlexStar ▴ 180

score 1 · Answer 1 · 2022-02-18

1

Entering edit mode

3.4 years ago

ATpoint 88k

For heatmaps showing differences in expression one usually uses the relative expression data, and that means the scaled normalized counts on log2 scale, e.g.:

logcpm <- edgeR::cpm(y, log=TRUE)
scaled <- t(scale(t(logcpm)))

...and then subset that object for your DE genes.

ADD COMMENT • link 3.4 years ago by ATpoint 88k

0

Entering edit mode

Can you please use the object names I used ? because I don't understand which is which.. what is y in your example ?

ADD REPLY • link 3.4 years ago by AlexStar ▴ 180

0

Entering edit mode

It's the de after running the normalization and filtering. y it that DGEList if you look into the edgeR documentation, sorry about that.

ADD REPLY • link 3.4 years ago by ATpoint 88k

score 1 · Answer 2 · 2022-02-18

1

Entering edit mode

3.4 years ago

Gordon Smyth ★ 8.2k

You could simply use

coolmap(Voom[select,])

Or alternatively see the example in the RNAseq123 workflow, which uses cpm() as in ATpoints' answer.

ADD COMMENT • link 3.4 years ago by Gordon Smyth ★ 8.2k