Question

How to normalize my RNA-seq data?

5.6 years ago

John ▴ 270

Hi

I have an RNA-seq dataset with three conditions (control, treatment X and treatment Y), each in triplicate. The RNA was sampled from brain tissue by ribosomal pulldown. I got expected counts from RSEM (with STAR for alignment) and performed quantile normalization using the normalizeBetweenArrays() function from limma. I am not sure it's the best way to normalize my data. As you can see in image 1, the boxed area for treatment Y-3 shows higher gene expression than any other sample, which looks weird. I don't know what else I can do. Please help!

Thanks in advance!
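For reference, quantile normalization (what limma's normalizeBetweenArrays(x, method = "quantile") is asked to do here) forces every sample onto one shared distribution. It takes the mean of the k-th smallest value across samples as a reference and replaces each value with the reference value at that rank. The Python sketch below illustrates the idea only. It is not limma's code, and it simplifies tie handling by breaking ties by position. Because every column ends up with identical quantiles, any genuine global difference between samples is erased. That is one reason to check the method's assumptions before using it.

```python
def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix (list of rows).

    Simplified sketch: ties are broken by position rather than averaged.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[row[j] for row in matrix] for j in range(n_samples)]
    # Gene indices ordered from smallest to largest value within each sample.
    orders = [sorted(range(n_genes), key=col.__getitem__) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [
        sum(columns[j][orders[j][k]] for j in range(n_samples)) / n_samples
        for k in range(n_genes)
    ]
    # Each value is replaced by the reference value at its within-sample rank.
    result = [[0.0] * n_samples for _ in range(n_genes)]
    for j in range(n_samples):
        for k, gene in enumerate(orders[j]):
            result[gene][j] = reference[k]
    return result


# Toy matrix: 4 genes x 3 samples.
counts = [[5, 4, 3], [2, 1, 4], [3, 4, 6], [4, 2, 8]]
normalized = quantile_normalize(counts)
```

After this step, sorting any column of `normalized` gives the same list of values.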

RNA-Seq Normalization • 15k views

Posted 5.6 years ago by John ▴ 270

Hi, Google Drive is not a recommended host for images, as it doesn't support embedding on Biostars. Could I trouble you to follow this guide and upload the image to imgbb?

Posted 5.6 years ago by Ram 45k (updated by ATpoint 88k)

I made the changes for you already. You have to use the image button and paste in the full link to the image including the suffix (.png or similar). In this case the link would be https://i.ibb.co/d7QHxTW/Screen-Shot-2019-09-24-at-7-52-23-PM.png

Posted 5.6 years ago by ATpoint 88k

Thank you! I am trying to add another box plot image.

Posted 5.6 years ago by John ▴ 270

So you have RNA-seq data, and you used normalizeBetweenArrays()? RNA-seq requires a different analysis than microarray data. Please follow a well-tested tutorial, like this one from Bioconductor.

Posted 5.6 years ago by WouterDeCoster 48k

You could use voom from limma and include quantile normalization there via the argument normalize.method = "quantile". However, start with real (integer) read counts, e.g. from featureCounts, instead of RSEM expected counts.
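As I understand the limma documentation, voom(counts, design, normalize.method = "quantile") first converts raw counts to log2 counts-per-million. It adds 0.5 to each count and 1 to each library size to avoid log(0). It then applies the requested between-sample normalization to those log-CPM values before estimating the mean-variance trend. The Python sketch below reproduces only the log-CPM step, under the assumption that library sizes are plain column sums (no TMM factors). It is an illustration, not limma's implementation.

```python
import math


def voom_style_log_cpm(counts):
    """log2 counts-per-million with voom-style offsets.

    counts: genes x samples matrix (list of rows) of raw integer counts.
    Assumption: library sizes are plain column sums, no normalization factors.
    """
    n_samples = len(counts[0])
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    return [
        [math.log2((row[j] + 0.5) / (lib_sizes[j] + 1) * 1e6) for j in range(n_samples)]
        for row in counts
    ]


# Two samples with library sizes 499 and 999.
counts = [[0, 0], [499, 999]]
log_cpm = voom_style_log_cpm(counts)
```

Starting from raw integer counts matters here because the offsets and the later mean-variance modelling assume count-scale input.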

Posted 5.6 years ago by Benn 8.4k

score 3 · Answer 1 · 2019-09-25

5.6 years ago

ATpoint 88k

Hi John,

As WouterDeCoster says, QN might be possible for RNA-seq, but it is not common. I suggest reading the manuals of e.g. edgeR and DESeq2 to learn about normalization. Additionally, check the videos linked below, which nicely explain the normalization techniques that are part of the differential expression pipelines of these two tools. Beyond that, DESeq2 offers two functions, vst and rlog, that not only normalize counts with respect to library size and composition but also try to remove the dependence of the variance on the mean. If this vocabulary is new to you, search around the web; there are plenty of forum and blog posts on normalization and RNA-seq. I suggest you use one of the mentioned packages for differential analysis (normalization will be taken care of internally) and vst for everything else (e.g. clustering/PCA). Note that both rlog and vst return log2-scaled values; check the manuals and vignettes.
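To make "normalize with respect to library size and composition" concrete, here is a Python sketch of the median-of-ratios (RLE) size factors that DESeq2 uses. Each gene's geometric mean across samples serves as a pseudo-reference. A sample's size factor is the median ratio of its counts to that reference, computed on the log scale. Genes with a zero in any sample are skipped. This is a simplified illustration, not DESeq2's code. Dividing each column by its size factor gives normalized counts. vst() and rlog() additionally stabilize the variance, which this sketch does not attempt.

```python
import math
import statistics


def median_of_ratios_size_factors(counts):
    """Median-of-ratios (RLE-style) size factors.

    counts: genes x samples matrix (list of rows) of raw integer counts.
    Genes with a zero count in any sample are skipped, since their
    geometric mean is zero.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue
        # Log of the gene's geometric mean across samples (pseudo-reference).
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Size factor: median ratio of each sample to the pseudo-reference.
    return [math.exp(statistics.median(r)) for r in log_ratios]


# Sample 2 is sequenced twice as deeply as sample 1; the last gene has a zero.
size_factors = median_of_ratios_size_factors([[10, 20], [5, 10], [100, 200], [0, 3]])
```

Here the second size factor is twice the first, so dividing by the factors puts both samples on the same scale.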

To check how well the normalization worked, I would also not use Z-scored heatmaps; they are rather uninformative in that respect. Instead, use MA-plots (e.g. via the smoothScatter function in R, which colors areas by point density, or heatscatter from the LSD package) and check whether the bulk of the data is centered roughly along y = 0.
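An MA-plot shows M, the log2 ratio between two samples, against A, their average log2 abundance. The Python sketch below computes only the values, not the plot. It also gives the median of M as a quick numeric version of "does the bulk of the data center on y = 0". The pseudocount of 1, which avoids log2(0), is an assumption of this sketch. The inputs are assumed to be already-normalized counts for the same genes in two samples.

```python
import math
import statistics


def ma_values(x, y, pseudocount=1.0):
    """Return (M, A) lists for two samples of normalized counts."""
    m_vals, a_vals = [], []
    for xi, yi in zip(x, y):
        lx = math.log2(xi + pseudocount)
        ly = math.log2(yi + pseudocount)
        m_vals.append(lx - ly)        # log2 fold change between samples
        a_vals.append((lx + ly) / 2)  # average log2 abundance
    return m_vals, a_vals


def median_m(x, y, pseudocount=1.0):
    """Median M; expected to be near 0 if the two samples are well normalized."""
    m_vals, _ = ma_values(x, y, pseudocount)
    return statistics.median(m_vals)
```

A median M well away from 0 suggests a residual scaling difference between the two samples. Plotting the M and A values with a density-colored scatter, as suggested above, shows where that offset comes from.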

Posted 5.6 years ago by ATpoint 88k

Agreed; using an array normalization method on sequencing data sounds completely incorrect.

Posted 5.6 years ago by JC 13k

I am not aware of any DEG method that actually uses it by default. One should check whether the data fulfill the assumptions of QN, e.g. via quantro ( https://bioconductor.org/packages/release/bioc/vignettes/quantro/inst/doc/quantro.html ). Anyway, I would not bother with it, as the standard methods (TMM in edgeR, RLE in DESeq2) are well accepted.
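The idea behind TMM (trimmed mean of M-values, the default in edgeR's calcNormFactors) can be sketched in Python as a simplified, unweighted version. One sample is compared with a reference sample. Genes with extreme log-ratios (M) or extreme average abundance (A) are dropped, and the remaining M values are averaged. The trim fractions of 0.3 and 0.05 per side are meant to mirror edgeR's logratioTrim and sumTrim defaults. However, edgeR also weights genes by precision, chooses the reference sample itself, and rescales the factors across samples. None of that is reproduced here, so treat the result as illustrative only.

```python
import math


def _ranks(values):
    """0-based ranks; ties are broken by position (simplification)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0] * len(values)
    for rank, idx in enumerate(order):
        ranks[idx] = rank
    return ranks


def tmm_factor_unweighted(sample, ref, logratio_trim=0.3, sum_trim=0.05):
    """Unweighted trimmed mean of M-values of `sample` relative to `ref`.

    sample, ref: raw counts for the same genes in two libraries.
    """
    n_s, n_r = sum(sample), sum(ref)
    m_vals, a_vals = [], []
    for s, r in zip(sample, ref):
        if s == 0 or r == 0:
            continue  # M and A are undefined when either count is zero
        ps, pr = s / n_s, r / n_r
        m_vals.append(math.log2(ps / pr))        # log-ratio (M)
        a_vals.append(0.5 * math.log2(ps * pr))  # average log abundance (A)
    n = len(m_vals)
    if n == 0:
        raise ValueError("no genes with non-zero counts in both libraries")
    lo_m = math.floor(n * logratio_trim)
    lo_a = math.floor(n * sum_trim)
    m_rank, a_rank = _ranks(m_vals), _ranks(a_vals)
    # Keep genes that are neither in the M tails nor in the A tails.
    kept = [
        m for m, rm, ra in zip(m_vals, m_rank, a_rank)
        if lo_m <= rm < n - lo_m and lo_a <= ra < n - lo_a
    ]
    return 2 ** (sum(kept) / len(kept))


# One gene is 10x higher in the sample; the other nine are unchanged.
ref = [100] * 10
sample = [100] * 9 + [1000]
factor = tmm_factor_unweighted(sample, ref)
```

In this toy case the single up-regulated gene inflates the sample's library size. The factor comes out below 1 (1000/1900), so library size × factor matches the reference and the unchanged genes line up. A plain library-size correction would not achieve that.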

Posted 5.6 years ago by ATpoint 88k

Thank you! I knew that edgeR performs normalization internally, so I used edgeR for the differential expression analysis. But I want to show the DE genes in a heatmap; how should I normalize the counts for that? (I prefer Z-scores of normalized counts, as that is common in publications.)

Posted 5.6 years ago by John ▴ 270

Make edgeR return the normalized counts with its cpm() function (which can directly output log2 values via log = TRUE), then transform them to Z-scores, e.g. t(scale(t(norm.count.matrix))). If you use edgeR for the DEG analysis, I think it is best to use its normalized counts to keep things consistent.
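The R idiom t(scale(t(x))) Z-scores each gene (row) across samples. It subtracts the row mean and divides by the row's sample standard deviation, which uses an n - 1 denominator. The Python sketch below does the same, assuming the input is a genes x samples matrix already on log2-CPM scale, such as the output of edgeR's cpm(y, log = TRUE). One deliberate difference: rows with zero variance come back as zeros here, whereas scale() would give NaN.

```python
import statistics


def row_zscores(matrix):
    """Z-score each gene (row) across samples, like R's t(scale(t(x))).

    Uses the sample standard deviation (n - 1 denominator), as scale() does.
    Assumption: zero-variance rows become all zeros (scale() gives NaN).
    """
    out = []
    for row in matrix:
        mu = statistics.mean(row)
        sd = statistics.stdev(row)
        out.append([(v - mu) / sd if sd > 0 else 0.0 for v in row])
    return out


# Two genes x three samples of (hypothetical) log2-CPM values.
z = row_zscores([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]])
```

Because each row is scaled separately, the heatmap colors show relative differences between samples within a gene, not absolute expression levels.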

Posted 5.6 years ago by ATpoint 88k