Question

Same experiment, different values in my heat map. Any help?

0

6.0 years ago

Mozart ▴ 330

Hi there, I am struggling to wrap my head around something that I may have forgotten. Essentially, the results (i.e. the rld values) that I use in my heat map change according to the number of samples I consider, and I am wondering why this happens. Since I am sure I haven't explained myself clearly, let me paraphrase:

I want to generate 2 heat maps: the first from the main comparison I am interested in (6 samples), and the second containing results from all samples in my dataset (the same 6 samples + 2).

By doing this, I obtain different values for the same genes in the two heat maps. Is this because the regularised logarithm (rlog) transformation gives different results depending on the number of samples in the dataset?

Thanks

RNA-Seq heatmap • 1.7k views

ADD COMMENT • link 4.9 years ago by Mozart ▴ 330

score 3 · Accepted Answer · 2019-05-25

3

6.0 years ago

ATpoint 88k

This is normal and expected: the normalization factors and the model fit change when you add or remove samples. If you want values that do not depend on which samples are included, you could use something like log2(FPKM+1), which is computed per sample. For visualization alone this is probably accurate enough. What do you want to show with the heatmaps?
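
To make the dependence on the sample set concrete, here is a minimal Python sketch. It is not DESeq2 code: the counts and gene lengths are made up, the function names are hypothetical, and `size_factors` is only a simplified median-of-ratios estimate in the spirit of what happens before rlog/vst. The size factor of the first sample changes once a fourth sample is added, whereas log2(FPKM+1) uses only each sample's own library size and stays the same.

```python
import math
from statistics import median

# Toy raw counts (rows = genes, columns = samples s1..s4); values are made up.
COUNTS = [
    [100, 200, 150, 400],
    [50, 80, 60, 30],
    [300, 500, 450, 900],
    [10, 30, 20, 80],
]
# Hypothetical gene lengths in base pairs, one per row of COUNTS.
GENE_LENGTHS = [2000, 1000, 3000, 500]


def subset_columns(matrix, columns):
    """Keep only the given sample columns."""
    return [[row[j] for j in columns] for row in matrix]


def size_factors(matrix):
    """Simplified median-of-ratios size factors (one per sample).

    Each gene's reference is its geometric mean over the samples in
    `matrix`, so the result depends on which samples are included.
    """
    # Genes with a zero count are skipped (their geometric mean is zero).
    rows = [row for row in matrix if all(c > 0 for c in row)]
    geo_means = [
        math.exp(sum(math.log(c) for c in row) / len(row)) for row in rows
    ]
    n_samples = len(matrix[0])
    return [
        median(row[j] / g for row, g in zip(rows, geo_means))
        for j in range(n_samples)
    ]


def log2_fpkm(matrix, lengths):
    """log2(FPKM + 1), using only each sample's own library size.

    The library size here is just the column sum of the toy matrix.
    """
    n_samples = len(matrix[0])
    lib_sizes = [sum(row[j] for row in matrix) for j in range(n_samples)]
    return [
        [
            math.log2(row[j] / (length / 1e3) / (lib_sizes[j] / 1e6) + 1)
            for j in range(n_samples)
        ]
        for row, length in zip(matrix, lengths)
    ]


three = subset_columns(COUNTS, [0, 1, 2])
four = subset_columns(COUNTS, [0, 1, 2, 3])

sf_three = size_factors(three)
sf_four = size_factors(four)
print("size factor of s1 with 3 samples:", round(sf_three[0], 3))
print("size factor of s1 with 4 samples:", round(sf_four[0], 3))

fpkm_three = log2_fpkm(three, GENE_LENGTHS)
fpkm_four = log2_fpkm(four, GENE_LENGTHS)
print(
    "log2(FPKM+1) of s1 unchanged:",
    [r[0] for r in fpkm_three] == [r[0] for r in fpkm_four],
)
```

Any transformation built on such sample-set-dependent factors will give different values for the same gene and sample when the set of samples changes.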

0

Thanks for the quick reply. I've always had this feeling! I just want to show the top variable genes in my dataset...that's it.

Another question: should I stick with the same kind of log transformation (either vst or rlog) for all of the plots in my experiment, or can I change the normalisation method each time (e.g. rlog for PCA and vst for the heat map)? Thanks!

ADD REPLY • link 6.0 years ago by Mozart ▴ 330
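
For the stated goal (top variable genes), the selection usually amounts to ranking genes by their variance across samples on the transformed values. A minimal Python sketch of that ranking follows; the gene names and numbers are made up, and `top_variable_genes` is a hypothetical helper, not a function from any package.

```python
from statistics import variance

# Made-up transformed expression values (rlog- or vst-like), one list per gene.
# Gene names and numbers are hypothetical.
transformed = {
    "gene1": [8.1, 8.0, 8.2, 8.1, 7.9, 8.0],
    "gene2": [5.0, 5.2, 4.9, 9.1, 9.3, 8.8],
    "gene3": [2.1, 3.5, 1.0, 2.8, 0.9, 3.9],
    "gene4": [10.0, 10.4, 9.8, 7.2, 7.0, 7.5],
}


def top_variable_genes(values_by_gene, n):
    """Return the n gene names with the highest sample variance."""
    ranked = sorted(
        values_by_gene,
        key=lambda gene: variance(values_by_gene[gene]),
        reverse=True,
    )
    return ranked[:n]


top = top_variable_genes(transformed, 2)
print(top)
```

Because the ranking is computed on transformed values, it inherits their dependence on which samples went into the transformation.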

2

I would not switch around, as the plots should be consistent with each other. Use whichever you prefer (or vst if you have many samples and rlog is too slow), but do not mix them at will, as they behave quite differently, especially for variable genes with low counts.

Alternatively, what I personally find more meaningful is to show only those genes that are significantly different, as high variability often comes from the mean-variance dependency of low-count genes. You could show the z-scored log2FCs for those with padj < 0.05. Still, if you prefer counts, do not mix methods; be consistent.
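
The filtering and scaling suggested above can be sketched roughly as follows, in Python with made-up numbers. The answer does not spell out exactly which values are scaled, so this sketch assumes per-sample transformed values for each gene; the names are hypothetical, and dropping genes with a missing padj is also an assumption.

```python
from statistics import mean, stdev

# Made-up per-gene results: adjusted p-value plus per-sample values.
# Names and numbers are hypothetical; None stands for a missing padj.
results = {
    "gene1": {"padj": 0.001, "values": [5.0, 5.2, 4.9, 9.1, 9.3, 8.8]},
    "gene2": {"padj": 0.40, "values": [2.1, 3.5, 1.0, 2.8, 0.9, 3.9]},
    "gene3": {"padj": 0.03, "values": [10.0, 10.4, 9.8, 7.2, 7.0, 7.5]},
    "gene4": {"padj": None, "values": [0.0, 0.1, 0.0, 0.2, 0.0, 0.1]},
}


def zscore(row):
    """Centre a row on its mean and scale by its sample standard deviation.

    Assumes the row is not constant (a zero standard deviation would fail).
    """
    m, s = mean(row), stdev(row)
    return [(x - m) / s for x in row]


def significant_zscores(res, alpha=0.05):
    """Z-scored rows for genes with padj below alpha (missing padj dropped)."""
    return {
        gene: zscore(r["values"])
        for gene, r in res.items()
        if r["padj"] is not None and r["padj"] < alpha
    }


heatmap_rows = significant_zscores(results)
for gene, row in heatmap_rows.items():
    print(gene, [round(z, 2) for z in row])
```

Each row ends up with mean 0 and standard deviation 1, so the heat map colours show relative differences between samples rather than absolute expression levels.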