DESeq2 heatmap with low baseMean values
1
0
Entering edit mode
11 months ago
Ahmed ▴ 10

Hey,

So have two conditions four samples, and after running DESeq2 I get a nice heatmap, but the aveExperssion is somewhat troubling.

Heatmap

plotDispEsts

plotMA for resLFC

I have used the following code but i am not sure if there is something that I messed or if the data is somewhat not ok..

txi <- tximport(files, type = "kallisto", tx2gene = tx2gene, ignoreTxVersion = T)
library(tidyverse)
library(kableExtra)
library(DESeq2)
library(pheatmap)
library(RColorBrewer)
library(data.table)
library(DT)
library(Biostrings)
library(dplyr)

sampleTable <- data.frame(condition = factor(rep(c("Primary", "Cultured"), each = 2)))
rownames(sampleTable) <- colnames(txi$counts)
rownames(sampleTable) -> sampleTable$sample
sampleTable$sample <- c("D1.1", "D1.2", "D5.1", "D5.2")

dds <- DESeqDataSetFromTximport(txi, sampleTable, ~condition)
dds <- dds[rowSums(counts(dds)) > 100,] 
dds$condition <- relevel(dds$condition, ref = "Primary")

dds <- DESeq(dds)
resLFC <- lfcShrink(dds, coef = "condition_Cultured_vs_Primary", type = "apeglm")
sigs <- na.omit(resLFC)
sigs <- sigs[sigs$padj < 0.005,]

df <- as.data.frame(sigs)
library(biomaRt)
ensembl <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
gene_mapping <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
                      filters = "ensembl_gene_id",
                      values = rownames(sigs),
                      mart = ensembl)
sigsdf <- merge(df, gene_mapping, by.x = "row.names", by.y = "ensembl_gene_id")
df.top <- sigsdf[ (sigsdf$baseMean > 25) & (abs(sigsdf$log2FoldChange) > 2),]
df.top <- df.top[order(df.top$log2FoldChange, decreasing = TRUE),]
rownames(df.top) <- df.top$Row.names


rlog_out <- rlog(dds, blind=FALSE)
mat<-assay(rlog_out)[rownames(df.top), rownames(sampleTable)] #sig genes x samples
colnames(mat) <- sampleTable$sample
base_mean <- rowMeans(mat)
mat.scaled <- t(apply(mat, 1, scale)) #center and scale each column (Z-score) then transpose
colnames(mat.scaled)<-colnames(mat)
num_keep <- 25
rows_keep <- c(seq(1:num_keep), seq((nrow(mat.scaled)-num_keep), nrow(mat.scaled)) )
l2_val <- as.matrix(df.top[rows_keep,]$log2FoldChange) #getting log2 value for each gene we are keeping
colnames(l2_val)<-"logFC"

mean <- as.matrix(df.top[rows_keep,]$baseMean) #getting mean value for each gene we are keeping
colnames(mean)<-"AveExpr"


col_logFC <- colorRamp2(c(min(l2_val),0, max(l2_val)), c("blue", "white", "red")) 
col_AveExpr <- colorRamp2(c(quantile(mean)[1], quantile(mean)[4]), c("white", "red"))

ha <- HeatmapAnnotation(summary = anno_summary(gp = gpar(fill = 2), 
                                               height = unit(2, "cm")))

h1 <- Heatmap(mat.scaled[rows_keep,], cluster_rows = F, 
            column_labels = colnames(mat.scaled), name="Z-score",
            cluster_columns = T)
h2 <- Heatmap(l2_val, row_labels = df.top$external_gene_name[rows_keep], 
            cluster_rows = F, name="logFC", top_annotation = ha, col = col_logFC,
            cell_fun = function(j, i, x, y, w, h, col) { # add text to each grid
              grid.text(round(l2_val[i, j],2), x, y)
            })
h3 <- Heatmap(mean, row_labels = df.top$external_gene_name[rows_keep], 
            cluster_rows = F, name = "AveExpr", col=col_AveExpr,
            cell_fun = function(j, i, x, y, w, h, col) { # add text to each grid
              grid.text(round(mean[i, j],2), x, y)
            })

h<-h1+h2+h3
h
DESeq2 • 1.2k views
ADD COMMENT
1
Entering edit mode

What exactly is your question here? With no real idea of your experimental design, it's difficult to note if the large expression changes are due to any issue or just a reflection of biology.

ADD REPLY
0
Entering edit mode

D1 are primary cells isolated from the colon and D5 are the same cells but expanded in vitro for 5 weeks. How can be sure that these DEGs represent the changes between the two time points and not something with the dataset itself?

ADD REPLY
0
Entering edit mode

Off-hand, I'd say this is what you'd expect if you are looking at two totally different tissues. Two of them are clearly pancreas, and I guess it makes sense that they'd be making tons and tons of the mRNAs for secreted enzymes.

ADD REPLY
0
Entering edit mode

I think these are stromal cells from the colon - I am not sure thu about the stromal part. D1 is freshly isolated cells and D5 is cells expanded in culture

ADD REPLY
0
Entering edit mode

They are your samples; all I can do is guess...but people will sometimes give you wrong information. You have the ground truth, you should determine for yourself. Look at the genes, do they look like colon genes? Every one of those genes on the bottom half that I google turn up as pancreas genes. (Not the HBA's those are hemoglobin)

ADD REPLY
1
Entering edit mode
11 months ago
ATpoint 86k

I would suggest doing some proper prefiltering. Many fold changes in the range of 10-20 often indicate spurious DEG calls due to outliers. That is these "wings" in the MA-plot on the top and bottom. Lowly-expressed genes with few outliers are prone to show inflated logFCs.

Your prefilter dds[rowSums(counts(dds)) > 100,] is essentially not doing much other than removing genes with no counts because it filters for genes where across all samples there is at least a sum of 100 counts. Imagine you have 20 samples and one has an outlier count of 1000, but all others are zero. The filter retains this but it should be excluded. I always recommend edgeR::filterByExpr. I suggest for the heatmap to also use log2 baseMean so the color code is not dominated by large values.

ADD COMMENT
1
Entering edit mode

Just to add, the DESeq2 recommended prefiltering adds an extra term: rowSums(counts(dds) >= 10) >= smallestGroupSize

If OP wanted to keep a cutoff of 100, it would look like dds[rowSums(counts(dds) >= 100) >= 2,]

ADD REPLY
1
Entering edit mode

Thanks alot, did it and the heat map is way better now enter image description here

ADD REPLY

Login before adding your answer.

Traffic: 1416 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6