Differential gene expression (aiming to create heatmap) for raw RNAseq count data from different publications - how to?
1
0
Entering edit mode
3 months ago
Leah • 0

I'm new to bioinformatics and RNAseq analysis, I'll try my best to explain the question!

I have raw counts RNAseq data from 3 different publications (so 3 different datasets) that all had similar methods to produce their data. I will be comparing the gene exp of Cell_type_A vs Cell_type_B in each of the datasets, so I will be doing differential gene expression on these 2 cell types. The 2 cell types will be my "conditions", if that makes sense. To do this, I used R to apply DESeq2 on each dataset (independently) to get the log2fold change of each gene for the differential expression of Cell_type_A vs Cell_type_B. I filtered for genes with p-adj < 0.05. Now that I have significant log2fold changes for each dataset, how can I compare these results in a heatmap? The type of heatmap I'm going for is 3 columns: one for each dataset. The heatmap will just display the log2foldchange values, like how a volcano plot would. Positive values (upregulated) = red, negative values (downregulated) = blue.

I was thinking of making a new dataset with 4 columns: gene, log2fs of dataset 1, log2fs of dataset 2, log2fs of dataset 3. And from there I can make a heatmap out of the new dataset: so my heatmap will have 3 columns of data and the genes as the row labels. But that seems like a cheap solution to me. Is it a valid way of going through with this? I really appreciate all kinds of help, I've been stuck on this problem for quite a while!

R DESeq2 Differential-expression • 599 views
ADD COMMENT
0
Entering edit mode

You can try this online tool to create a heatmap for any data : https://cparsania.shinyapps.io/FungiExpresZ/

ADD REPLY
0
Entering edit mode

But that seems like a cheap solution to me

How does it feel cheap? Seems reasonable to me. You may need to set geneIds as the row names of the dataframe/matrix depending on the tool and you may need to call it as a matrix.

here's a possible example

library(pheatmap)

# create df of log2FCs by subsetting columns from main deg table.
degs_heatmap <- degs[,c("geneId", "log2FC_1", "log2FC_2", "log2FC_3")]

#set geneIDs as rownames and remove geneId column
rownames(degs_heatmap) <- degs_heatmap$geneId
degs_heatmap$geneId <- NULL

#create pheatmap, keep columns in order but cluster rows for bettr readability/analysis. No scaling since log2FCs are used    
pheatmap(degs_heatmap, cluster_row = TRUE, cluster_col = FALSE, scale = "none")
ADD REPLY
0
Entering edit mode
3 months ago
BioinfGuru ★ 2.1k

how can I compare these results in a heatmap?

You can't: I think I understand what you are trying to get.... a nice heatmap with nice blocks of color to clearly show the difference between datasets. The problem is this: You can't compare across datasets if you have not normalised across datasets... which you haven't done because you analysed the 3 independently. The comparison is meaningless because the heatmaps will actually only tell you the comparison of dataset+dataset biological variation + dataset technical variation. You haven't removed the biological or technical variation across datasets by normalising across datasets. Comparing within datasets is ok...you've done that... but you cant compare the datasets to each other.

I will be comparing the gene exp of Cell_type_A vs Cell_type_B in each of the datasets

You've done this.

If you want to compare differential gene expression across datasets a better design and workflow would be:

sample dataset cell_type
1      ds1     A
2      ds1     A
3      ds1     B
4      ds1     B
5      ds2     A
6      ds2     A
7      ds2     B
8      ds2     B
9      ds3     A
10     ds3     A
11     ds3     B
12     ds3     B

Then get a single quantification matrix, and tell deseq2 to treat "dataset" as a batch effect so the design would be:

~ dataset + cell_type + dataset:cell_type

Then you can use results() to:

  • compare cell types
  • compare how dataset affects differential expression

Even better would be to run SVA/RUV on the count matrix before deseq2 to model hidden technical sources of variation.

(wow that took alot of editing lol)

ADD COMMENT
0
Entering edit mode

You can't: I think I understand what you are trying to get.... a nice heatmap with nice blocks of color to clearly show the difference between datasets. The problem is this...

I'm not sure I agree. If the cell types A and B are consistent within each dataset, why not compare log2FCs between datasets? For example, if I compare the same conditions, but a polyA-selected vs ribo-depleted library, I get results that make more sense by analyzing the datasets independently and comparing DEGs rather than in a blocked design where batch effects are too great to overcome. I could be wrong.

ADD REPLY
0
Entering edit mode

polyA-selected vs ribo-depleted library,

Would that be a confounder though? I don't know, maybe that is an example of where this might work but I would think there may be many examples where it won't, and the only way to protect against that would be to normalise across samples.

What if unknown to OP, datasets A + B were all male and dataset C was all female? That would be a hidden biological variable that could vastly affect DEGs in dataset C. Then when comparing A, B, C, there would be no explanation of why dataset see was different. More importantly, it could also lead a person to take a median with intervals for any specific gene or group of genes in a color block and conclude, "this is the average l2fc for this gene between time point x and y" and not realise there is a confounder. My first question to anyone presenting results of the comparison would be "how do you justify not normalising across the datasets?"

I only weigh in because, I am in a similar situation having to compare the DEGs of independently analysed tissue samples and I've been advised not to analyse them together in the same deseq2 object.

ADD REPLY

Login before adding your answer.

Traffic: 1873 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6