Question

Differential gene expression (aiming to create heatmap) for raw RNAseq count data from different publications - how to?

3 months ago

Leah • 0

I'm new to bioinformatics and RNAseq analysis, so I'll try my best to explain the question!

I have raw RNAseq count data from 3 different publications (so 3 different datasets) that all used similar methods to produce their data. I will be comparing the gene expression of Cell_type_A vs Cell_type_B in each of the datasets, so I will be doing differential gene expression on these 2 cell types. The 2 cell types will be my "conditions", if that makes sense. To do this, I used R to run DESeq2 on each dataset (independently) to get the log2 fold change of each gene for Cell_type_A vs Cell_type_B. I filtered for genes with adjusted p-value < 0.05. Now that I have significant log2 fold changes for each dataset, how can I compare these results in a heatmap? The type of heatmap I'm going for has 3 columns: one for each dataset. The heatmap will just display the log2 fold change values, using the same colors a volcano plot would: positive values (upregulated) = red, negative values (downregulated) = blue.

I was thinking of making a new dataset with 4 columns: gene, log2FC from dataset 1, log2FC from dataset 2, log2FC from dataset 3. From there I can make a heatmap out of the new dataset, so my heatmap will have 3 columns of data and the genes as the row labels. But that seems like a cheap solution to me. Is it a valid way of going about this? I really appreciate all kinds of help, I've been stuck on this problem for quite a while!

R DESeq2 Differential-expression • 599 views

ADD COMMENT • link updated 3 months ago by BioinfGuru ★ 2.1k • written 3 months ago by Leah • 0

You can try this online tool to create a heatmap for any data : https://cparsania.shinyapps.io/FungiExpresZ/

ADD REPLY • link 3 months ago by Chirag Parsania ★ 2.0k

But that seems like a cheap solution to me

How does it feel cheap? Seems reasonable to me. Depending on the tool, you may need to set the gene IDs as the row names of the data frame and you may need to convert it to a matrix (e.g. with as.matrix()).

here's a possible example

library(pheatmap)

# create df of log2FCs by subsetting columns from main deg table.
degs_heatmap <- degs[,c("geneId", "log2FC_1", "log2FC_2", "log2FC_3")]

#set geneIDs as rownames and remove geneId column
rownames(degs_heatmap) <- degs_heatmap$geneId
degs_heatmap$geneId <- NULL

#create pheatmap, keep columns in order but cluster rows for better readability/analysis. No scaling since log2FCs are used
pheatmap(degs_heatmap, cluster_rows = TRUE, cluster_cols = FALSE, scale = "none")
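
The example above assumes a `degs` table with one log2FC column per dataset already exists. Below is a rough sketch of one way to build it and to get the red/blue coloring centered on zero. The three `res*` data frames are made-up stand-ins for DESeq2 results tables that have been converted to data frames with a `geneId` column, and `pick_lfc` is a hypothetical helper, not part of any package. Keeping every gene that is significant in at least one dataset, while taking its log2FC from the unfiltered tables, is just one possible choice. It avoids missing values, which can make row clustering fail.

```r
# Toy stand-ins for three DESeq2 results tables (made-up values, illustration only).
res1 <- data.frame(geneId = c("g1", "g2", "g3", "g4"),
                   log2FoldChange = c(2.1, -1.3, 0.8, 0.1),
                   padj = c(0.01, 0.20, 0.03, 0.90))
res2 <- data.frame(geneId = c("g1", "g2", "g3", "g4"),
                   log2FoldChange = c(1.7, -0.9, 0.2, -0.2),
                   padj = c(0.02, 0.04, 0.60, 0.80))
res3 <- data.frame(geneId = c("g1", "g2", "g3", "g5"),
                   log2FoldChange = c(2.4, -1.1, 1.0, 0.5),
                   padj = c(0.001, 0.30, 0.04, 0.01))
res_list <- list(res1, res2, res3)

# Hypothetical helper: keep gene ID and log2FC, renaming the log2FC column per dataset.
pick_lfc <- function(res, label) {
  out <- res[, c("geneId", "log2FoldChange")]
  names(out)[2] <- label
  out
}
lfc_tables <- Map(pick_lfc, res_list, c("log2FC_1", "log2FC_2", "log2FC_3"))

# Inner join on geneId: only genes reported in all three datasets are kept.
degs <- Reduce(function(x, y) merge(x, y, by = "geneId"), lfc_tables)

# Keep genes that are significant (padj < 0.05) in at least one dataset.
sig_genes <- unique(unlist(lapply(res_list, function(r) {
  r$geneId[!is.na(r$padj) & r$padj < 0.05]
})))
degs <- degs[degs$geneId %in% sig_genes, ]

# Numeric matrix with gene IDs as row names.
lfc_mat <- as.matrix(degs[, -1])
rownames(lfc_mat) <- degs$geneId

# Symmetric breaks so that a log2FC of 0 sits in the middle (white) of the palette.
lim <- max(abs(lfc_mat))
heat_colors <- colorRampPalette(c("blue", "white", "red"))(100)
breaks <- seq(-lim, lim, length.out = 101)

# Draw the heatmap only if pheatmap is installed.
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(lfc_mat, color = heat_colors, breaks = breaks,
                     cluster_rows = TRUE, cluster_cols = FALSE, scale = "none")
}
```

With real data, `res1` to `res3` would come from something like `as.data.frame(results(dds))` with the row names copied into a `geneId` column.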

ADD REPLY • link 3 months ago by rfran010 ★ 1.3k

score 0 · Answer 1 · 2024-08-15

how can I compare these results in a heatmap?

You can't. I think I understand what you are trying to get: a nice heatmap with nice blocks of color that clearly shows the differences between datasets. The problem is this: you can't compare across datasets if you have not normalised across datasets, which you haven't done because you analysed the 3 independently. The comparison is meaningless because any difference between the columns of the heatmap will mix the cell-type effect with each dataset's own biological and technical variation. You haven't removed that variation by normalising across datasets. Comparing within datasets is OK, and you've done that, but you can't compare the datasets to each other.

I will be comparing the gene exp of Cell_type_A vs Cell_type_B in each of the datasets

You've done this.

If you want to compare differential gene expression across datasets a better design and workflow would be:

sample dataset cell_type
1      ds1     A
2      ds1     A
3      ds1     B
4      ds1     B
5      ds2     A
6      ds2     A
7      ds2     B
8      ds2     B
9      ds3     A
10     ds3     A
11     ds3     B
12     ds3     B

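Then get a single quantification matrix for all samples, and tell DESeq2 to treat "dataset" as a batch effect, with an interaction term so that the cell-type effect is allowed to differ between datasets. The design would be: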

~ dataset + cell_type + dataset:cell_type
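
To see what this formula encodes, here is a small base R sketch that builds the sample table above and expands the design into its model matrix. The object names `coldata` and `mm` are only illustrative, and the first level of each factor (ds1 and A) is assumed to be the reference.

```r
# Sample table matching the layout above: 3 datasets x 2 cell types x 2 replicates.
coldata <- data.frame(
  sample    = 1:12,
  dataset   = factor(rep(c("ds1", "ds2", "ds3"), each = 4)),
  cell_type = factor(rep(c("A", "A", "B", "B"), times = 3))
)

# Expand the design formula into the coefficients that would be fitted.
mm <- model.matrix(~ dataset + cell_type + dataset:cell_type, data = coldata)
colnames(mm)
# "(Intercept)" "datasetds2" "datasetds3" "cell_typeB"
# "datasetds2:cell_typeB" "datasetds3:cell_typeB"
```

The `cell_typeB` column is the B vs A effect in the reference dataset (ds1), and the two interaction columns describe how that effect differs in ds2 and ds3.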

Then you can use results() to:

compare cell types
compare how dataset affects differential expression
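
A minimal sketch of those two kinds of comparison, assuming DESeq2 is installed. The counts here are simulated with `makeExampleDESeqDataSet()` purely so the code runs, so the numbers mean nothing biologically. With real data, `countData` would be the combined raw count matrix from the three publications, restricted to shared genes.

```r
library(DESeq2)

set.seed(1)
# Simulated counts for 1000 genes x 12 samples (placeholder for the real combined matrix).
sim <- makeExampleDESeqDataSet(n = 1000, m = 12)

# Sample table in the same layout as above.
coldata <- data.frame(
  dataset   = factor(rep(c("ds1", "ds2", "ds3"), each = 4)),
  cell_type = factor(rep(c("A", "A", "B", "B"), times = 3)),
  row.names = colnames(sim)
)

dds <- DESeqDataSetFromMatrix(countData = counts(sim),
                              colData   = coldata,
                              design    = ~ dataset + cell_type + dataset:cell_type)
dds <- DESeq(dds)

# List the fitted coefficients to see which comparisons are available.
resultsNames(dds)

# Compare cell types: B vs A in the reference dataset (ds1).
res_ds1 <- results(dds, name = "cell_type_B_vs_A")

# Compare how dataset affects differential expression:
# is the B vs A effect different in ds2 than in ds1?
res_int_ds2 <- results(dds, name = "datasetds2.cell_typeB")

# B vs A within ds2 = main effect + ds2 interaction term.
res_ds2 <- results(dds, contrast = list(c("cell_type_B_vs_A", "datasetds2.cell_typeB")))
```

The same pattern with `datasetds3.cell_typeB` gives the corresponding results for ds3.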

Even better would be to run SVA/RUV on the count matrix before DESeq2 to model hidden technical sources of variation, and then add the estimated factors to the design.

(wow that took a lot of editing lol)