Question

What is the most suitable normalisation technique for RNASeq data?

5.6 years ago

J. Smith ▴ 90

Hi friends,

I am working with RNASeq data and am completely new to it. I have already performed differential expression analysis with DESeq2 and identified differentially expressed genes.

Now, with these differentially expressed genes, I want to perform clustering to identify which genes work together in groups in their biological activities.

Please advise me on the following:

1) Which normalization technique is currently the most suitable for RNASeq data? Links or code for performing it would be appreciated.

2) Should a log2 transformation accompany the normalization, as was the case in microarray analysis?

3) The DESeq2 tutorials describe the vst and rlog transformations. Is it necessary to apply one of them in addition to other normalization/preprocessing steps?

4) Please provide links to tutorials or code for the most suitable normalization technique and any other necessary steps before I apply clustering techniques, such as hierarchical clustering, to the preprocessed expression values of the differentially expressed genes identified with DESeq2.

5) Is there any difference in differential expression analysis methods and normalization procedures between single-cell RNASeq (scRNASeq) data and bulk RNASeq data?

6) How can I tell whether an NCBI GEO series contains bulk RNASeq or scRNASeq data? Is there any distinct text on the GEO series home page stating which it is?

Thanks in advance.

rna-seq Deseq2 Normalization • 3.0k views

score 2 · Answer 1 · 2019-06-05

5.6 years ago

ATpoint 86k

Both vst and rlog are recommended for downstream applications such as clustering. As far as I understand, rlog returns values on the log2 scale and vst on an approximately log2 scale. All the code you need is in the DESeq2 manual at Bioconductor. It is quite extensive, so please go through it. If you do not want to use either of them, log2-transforming the normalized counts with log2(counts(dds, normalized=T)+1) is another option (the pseudocount of +1 avoids taking the log of zero, which is -Inf), but the first two are the ones recommended by the DESeq2 authors. rlog is computationally intensive and therefore slow, so it is best suited to datasets with a modest number of samples; check the manual, which covers this (it suggests rlog for small datasets, roughly n < 30, and vst for larger ones).
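A rough Python analogue of the log2(counts(dds, normalized=T)+1) fallback, followed by the hierarchical clustering the question asks about. This is a sketch, not DESeq2 code: it assumes you have already exported the normalized counts of your DE genes into a genes x samples NumPy array (if you export vst or rlog values instead, they are already on a log scale, so skip the log step). The function names `log2_transform` and `cluster_genes`, the per-gene z-scoring, and the choice of average linkage with Euclidean distance are illustrative assumptions, not something prescribed above.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def log2_transform(norm_counts, pseudocount=1.0):
    """log2(x + pseudocount); the pseudocount avoids log2(0) = -inf."""
    return np.log2(np.asarray(norm_counts, dtype=float) + pseudocount)


def cluster_genes(log_expr, n_clusters):
    """Hierarchically cluster genes (rows) by their pattern across samples.

    Assumes every gene varies across samples (true for DE genes);
    a constant row would give a zero standard deviation.
    """
    # Z-score each gene so clusters reflect expression pattern, not absolute level.
    mean = log_expr.mean(axis=1, keepdims=True)
    sd = log_expr.std(axis=1, ddof=1, keepdims=True)
    z = (log_expr - mean) / sd
    # Average-linkage clustering on Euclidean distances between z-scored genes.
    tree = linkage(z, method="average", metric="euclidean")
    # Cut the tree into at most n_clusters groups; returns one label per gene.
    return fcluster(tree, t=n_clusters, criterion="maxclust")


if __name__ == "__main__":
    # Hypothetical normalized counts: 4 DE genes x 4 samples (2 control, 2 treated).
    norm = np.array([
        [10, 12, 100, 110],   # up in treated
        [20, 25, 220, 200],   # up in treated
        [300, 280, 30, 25],   # down in treated
        [150, 160, 15, 12],   # down in treated
    ])
    print(cluster_genes(log2_transform(norm), n_clusters=2))
```

With this toy matrix the two "up" genes land in one cluster and the two "down" genes in the other. Z-scoring first is a common choice for gene clustering because otherwise highly expressed genes would dominate the distances.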

Edit: Please note that in the default DESeq2 workflow (i.e., running DESeq()), the function internally estimates size factors with the geometric-mean (median-of-ratios) approach (see here). This is why you can access normalized counts from the dds object with the code mentioned above.
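To make the geometric-mean idea concrete, here is a minimal Python sketch of median-of-ratios size factors. It re-implements the idea for illustration and is not DESeq2's own code (in R you would simply run estimateSizeFactors() or DESeq() and then counts(dds, normalized=TRUE)); the function names and toy counts are hypothetical.

```python
import numpy as np


def size_factors_median_of_ratios(counts):
    """Per-sample size factors from a genes x samples count matrix.

    Each gene's geometric mean across samples acts as a pseudo-reference;
    a sample's size factor is the median of its counts divided by that
    reference. Genes with a zero in any sample are left out of the
    estimate because their geometric mean is zero.
    """
    counts = np.asarray(counts, dtype=float)
    # Keep only genes observed (count > 0) in every sample.
    expressed = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[expressed])
    # Log of the per-gene geometric mean = mean of the logs.
    log_geo_means = log_counts.mean(axis=1)
    # Median over genes of log(count / geometric mean), per sample.
    log_ratios = log_counts - log_geo_means[:, None]
    return np.exp(np.median(log_ratios, axis=0))


def normalized_counts(counts):
    """Divide each sample (column) by its size factor; all genes are normalized."""
    counts = np.asarray(counts, dtype=float)
    return counts / size_factors_median_of_ratios(counts)


if __name__ == "__main__":
    # Toy data: sample B is sequenced at twice the depth of sample A.
    raw = np.array([[10, 20], [50, 100], [200, 400], [0, 3]])
    print(size_factors_median_of_ratios(raw))
    print(normalized_counts(raw))
```

In the toy matrix the size factor of B comes out twice that of A, so after division the shared genes have identical normalized counts. That depth correction is what normalized=T adds on top of the raw counts.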

+1 from me. An essential step is the inter-library normalization done via normalized=T. You can read more about why it is needed here.

5.6 years ago by Kristoffer Vitting-Seerup ★ 4.1k

Thanks a lot friend.

5.6 years ago by J. Smith ▴ 90

Thanks for your response. I have read about vst and rlog. Actually, I want to know whether any normalization techniques better than vst or rlog are available. vst and rlog were proposed quite a few years ago, so I wonder whether a more recent, better technique exists.

5.6 years ago by J. Smith ▴ 90

They will do just fine and are well established. They normalize for sequencing depth and library composition and stabilize the variance, and rlog even applies shrinkage to limit the variability of lowly expressed genes. Do not overthink it ;-) (really, this is not meant to be offensive). Use these established techniques, which are recommended by the DESeq2 authors, who are among the top experts in NGS data analysis, and proceed with your analysis.
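Why "stabilize the variance" matters: with a plain log2(x+1), low counts keep a much larger variance than high counts simply because of counting noise. The small Python sketch below computes the exact variance of log2(X+1) when X is Poisson-distributed. Poisson is a simplifying assumption made here for illustration (real RNA-seq counts are overdispersed), and the function name is hypothetical; it is not how vst or rlog are computed.

```python
import numpy as np
from scipy.stats import poisson


def var_log2_plus1(mu):
    """Exact variance of log2(X + 1) for X ~ Poisson(mu), summed over the pmf."""
    # Truncate the support far into the upper tail; the omitted mass is negligible.
    k = np.arange(0, int(mu + 20 * np.sqrt(mu) + 50))
    p = poisson.pmf(k, mu)
    y = np.log2(k + 1)
    m = np.sum(p * y)
    return float(np.sum(p * (y - m) ** 2))


if __name__ == "__main__":
    for mu in (5, 50, 500, 5000):
        print(f"mean count {mu:>5}: var of log2(x+1) = {var_log2_plus1(mu):.4f}")
```

The variance shrinks by roughly an order of magnitude for each tenfold increase in mean, so on the plain log scale lowly expressed genes look far noisier than highly expressed ones. Reducing that dependence of variance on mean is what vst and rlog are designed to do.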

5.6 years ago by ATpoint 86k

Thanks a lot... One more question: is there any difference in differential expression analysis methods and normalization procedures between single-cell RNASeq (scRNASeq) data and bulk RNASeq data? And how can I tell whether an NCBI GEO series contains bulk RNASeq or scRNASeq data? Is there any distinct text on the GEO series home page stating which it is?

I am also editing the list of questions above so they are easier to see.

5.6 years ago by J. Smith ▴ 90

There is definitely a difference, because scRNA-seq data generally have lower coverage and are zero-inflated. There is extensive literature comparing different techniques for scRNA-seq; the ZINB-WaVE paper may be a good start. Whether a dataset is scRNA-seq or not, you can see from the methods description.
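A small sketch of what "lower coverage" and "zero-inflated" mean for the count model. Under a negative binomial with mean mu and dispersion alpha (variance mu + alpha·mu², the parameterization DESeq2 uses), the probability of a zero count is (1 / (1 + alpha·mu))^(1/alpha); a zero-inflated NB, the model family behind ZINB-WaVE, adds structural zeros with probability pi on top of that. The function names and the parameter values (alpha = 0.1, means 100 and 0.5, pi = 0.3) are hypothetical illustrations, not estimates from any dataset.

```python
def nb_zero_prob(mu, alpha):
    """P(Y = 0) for a negative binomial with mean mu and dispersion alpha.

    Parameterized so that Var(Y) = mu + alpha * mu**2;
    P(Y = 0) = (1 / (1 + alpha * mu)) ** (1 / alpha).
    """
    return (1.0 / (1.0 + alpha * mu)) ** (1.0 / alpha)


def zinb_zero_prob(mu, alpha, pi):
    """P(Y = 0) for a zero-inflated NB: a structural zero with probability pi,
    otherwise a draw from the NB above."""
    return pi + (1.0 - pi) * nb_zero_prob(mu, alpha)


if __name__ == "__main__":
    # Illustrative (hypothetical) values, not estimates from any dataset.
    alpha = 0.1
    print("bulk-like gene, mean 100:", nb_zero_prob(100, alpha))
    print("single-cell-like gene, mean 0.5:", nb_zero_prob(0.5, alpha))
    print("same, with 30% zero inflation:", zinb_zero_prob(0.5, alpha, pi=0.3))
```

At a bulk-like mean a zero is practically impossible, while at a single-cell-like mean most observations are zero even before any zero inflation, which is one reason bulk normalization and DE methods do not carry over unchanged to scRNA-seq.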

5.6 years ago by ATpoint 86k

Thanks a lot again for your extensive help. Sorry for asking a silly question again, as I am new to this area: is there any distinct text on every NCBI GEO series home page that states whether it is RNASeq or scRNASeq?

5.6 years ago by J. Smith ▴ 90

Don't worry. I ask the folks on our Biostars Slack (biostar.slack.com) questions all the time. This is the way to improve yourself :)

5.6 years ago by ATpoint 86k