Question

Normalisation of RNA-seq for different RNA biotypes

0

8.4 years ago

Nathaniel ▴ 120

I have RNA-seq count data and I want to perform traditional analysis like PCA/heatmaps on each RNA biotype independently.

My question is: when performing the RNA normalisation step (using, for instance, DESeq), should I do it on the entire gene expression matrix with all biotypes concatenated, or is it better to do it separately for each gene biotype?

What worries me is that if we do the processing on the concatenated expression matrix, the mRNAs will completely mask the miRNAs, since they are much more highly expressed. Also, if I then filter out, say, the 10% least expressed genes, I will probably end up filtering out more miRNAs than mRNAs for the same reason.

Thanks.

rna-seq deseq • 2.0k views

updated 8.4 years ago by Devon Ryan 105k • written 8.4 years ago by Nathaniel ▴ 120

score 1 · Answer 1 · 2016-12-10

1

8.4 years ago

Devon Ryan 105k

Estimate the size factors and dispersion using the whole dataset and then subset for further comparisons. Your miRNAs won't be masked then or filtered out. Having said that, if you did RNAseq then you're going to get largely meaningless miRNA counts. I would strongly encourage you to simply ignore them. If you want to measure miRNA differences, sequence small RNAs. Otherwise you're just measuring differences in size selection during library prep between the groups...
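To make the "normalise on everything, then subset" order concrete, here is a minimal sketch in plain Python. It assumes the median-of-ratios approach that DESeq uses for size factors (in R this step would be `estimateSizeFactors`); dispersion estimation is not shown. The gene names, counts, and the `biotype` lookup are made-up toy values, and the helper functions `size_factors` and `normalise` are hypothetical names for illustration only.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors, one per sample.

    counts: dict mapping gene name -> list of raw counts (one per sample).
    Genes with a zero count in any sample are skipped, because their
    log geometric mean is undefined.
    """
    n_samples = len(next(iter(counts.values())))
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts.values():
        if any(c <= 0 for c in row):
            continue
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    return [math.exp(median(r)) for r in log_ratios]


def normalise(counts, factors):
    """Divide each sample's counts by that sample's size factor."""
    return {g: [c / f for c, f in zip(row, factors)] for g, row in counts.items()}


# Toy data (assumption): sample 2 was sequenced twice as deeply as sample 1.
counts = {
    "mRNA_A": [100, 200],
    "mRNA_B": [50, 100],
    "mRNA_C": [30, 60],
    "miRNA_X": [4, 8],
    "lincRNA_Y": [10, 20],
}
biotype = {
    "mRNA_A": "mRNA",
    "mRNA_B": "mRNA",
    "mRNA_C": "mRNA",
    "miRNA_X": "miRNA",
    "lincRNA_Y": "lincRNA",
}

# Step 1: estimate size factors from the whole matrix, all biotypes together.
factors = size_factors(counts)
normalised = normalise(counts, factors)

# Step 2: only now subset by biotype for PCA, heatmaps, and so on.
mirna_only = {g: v for g, v in normalised.items() if biotype[g] == "miRNA"}
print(factors)
print(mirna_only)
```

Because each size factor is a median over genes, a handful of very highly expressed mRNAs does not dominate it, and the low-count genes are scaled by the same per-sample factor as everything else.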

0

Great, thank you! Does the same apply to long RNAs such as lincRNAs? Those should be safe to analyse, right?