Hey,
So have two conditions four samples, and after running DESeq2 I get a nice heatmap, but the aveExperssion is somewhat troubling.
I have used the following code but i am not sure if there is something that I messed or if the data is somewhat not ok..
txi <- tximport(files, type = "kallisto", tx2gene = tx2gene, ignoreTxVersion = T)
library(tidyverse)
library(kableExtra)
library(DESeq2)
library(pheatmap)
library(RColorBrewer)
library(data.table)
library(DT)
library(Biostrings)
library(dplyr)
sampleTable <- data.frame(condition = factor(rep(c("Primary", "Cultured"), each = 2)))
rownames(sampleTable) <- colnames(txi$counts)
rownames(sampleTable) -> sampleTable$sample
sampleTable$sample <- c("D1.1", "D1.2", "D5.1", "D5.2")
dds <- DESeqDataSetFromTximport(txi, sampleTable, ~condition)
dds <- dds[rowSums(counts(dds)) > 100,]
dds$condition <- relevel(dds$condition, ref = "Primary")
dds <- DESeq(dds)
resLFC <- lfcShrink(dds, coef = "condition_Cultured_vs_Primary", type = "apeglm")
sigs <- na.omit(resLFC)
sigs <- sigs[sigs$padj < 0.005,]
df <- as.data.frame(sigs)
library(biomaRt)
ensembl <- useMart("ensembl", dataset = "mmusculus_gene_ensembl")
gene_mapping <- getBM(attributes = c("ensembl_gene_id", "external_gene_name"),
filters = "ensembl_gene_id",
values = rownames(sigs),
mart = ensembl)
sigsdf <- merge(df, gene_mapping, by.x = "row.names", by.y = "ensembl_gene_id")
df.top <- sigsdf[ (sigsdf$baseMean > 25) & (abs(sigsdf$log2FoldChange) > 2),]
df.top <- df.top[order(df.top$log2FoldChange, decreasing = TRUE),]
rownames(df.top) <- df.top$Row.names
rlog_out <- rlog(dds, blind=FALSE)
mat<-assay(rlog_out)[rownames(df.top), rownames(sampleTable)] #sig genes x samples
colnames(mat) <- sampleTable$sample
base_mean <- rowMeans(mat)
mat.scaled <- t(apply(mat, 1, scale)) #center and scale each column (Z-score) then transpose
colnames(mat.scaled)<-colnames(mat)
num_keep <- 25
rows_keep <- c(seq(1:num_keep), seq((nrow(mat.scaled)-num_keep), nrow(mat.scaled)) )
l2_val <- as.matrix(df.top[rows_keep,]$log2FoldChange) #getting log2 value for each gene we are keeping
colnames(l2_val)<-"logFC"
mean <- as.matrix(df.top[rows_keep,]$baseMean) #getting mean value for each gene we are keeping
colnames(mean)<-"AveExpr"
col_logFC <- colorRamp2(c(min(l2_val),0, max(l2_val)), c("blue", "white", "red"))
col_AveExpr <- colorRamp2(c(quantile(mean)[1], quantile(mean)[4]), c("white", "red"))
ha <- HeatmapAnnotation(summary = anno_summary(gp = gpar(fill = 2),
height = unit(2, "cm")))
h1 <- Heatmap(mat.scaled[rows_keep,], cluster_rows = F,
column_labels = colnames(mat.scaled), name="Z-score",
cluster_columns = T)
h2 <- Heatmap(l2_val, row_labels = df.top$external_gene_name[rows_keep],
cluster_rows = F, name="logFC", top_annotation = ha, col = col_logFC,
cell_fun = function(j, i, x, y, w, h, col) { # add text to each grid
grid.text(round(l2_val[i, j],2), x, y)
})
h3 <- Heatmap(mean, row_labels = df.top$external_gene_name[rows_keep],
cluster_rows = F, name = "AveExpr", col=col_AveExpr,
cell_fun = function(j, i, x, y, w, h, col) { # add text to each grid
grid.text(round(mean[i, j],2), x, y)
})
h<-h1+h2+h3
h
What exactly is your question here? With no real idea of your experimental design, it's difficult to note if the large expression changes are due to any issue or just a reflection of biology.
D1 are primary cells isolated from the colon and D5 are the same cells but expanded in vitro for 5 weeks. How can be sure that these DEGs represent the changes between the two time points and not something with the dataset itself?
Off-hand, I'd say this is what you'd expect if you are looking at two totally different tissues. Two of them are clearly pancreas, and I guess it makes sense that they'd be making tons and tons of the mRNAs for secreted enzymes.
I think these are stromal cells from the colon - I am not sure thu about the stromal part. D1 is freshly isolated cells and D5 is cells expanded in culture
They are your samples; all I can do is guess...but people will sometimes give you wrong information. You have the ground truth, you should determine for yourself. Look at the genes, do they look like colon genes? Every one of those genes on the bottom half that I google turn up as pancreas genes. (Not the HBA's those are hemoglobin)