I have RNA-seq count data and I want to run standard analyses such as PCA and heatmaps on each RNA biotype independently.
My question is: when performing the RNA normalisation step (using, for instance, DESeq), should I do it on the entire gene expression matrix with all biotypes concatenated, or is it better to do it separately for each gene biotype?
What worries me is that if we do the processing on the concatenated expression matrix, the mRNAs will completely mask the miRNAs, since they are much more highly expressed. Also, if I then filter out, say, the 10% least expressed genes, I will probably end up removing more miRNAs than mRNAs for the same reason.
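To make the filtering concern concrete, here is a minimal sketch in plain Python. The gene names, biotype labels, and mean counts are made up for illustration, and `lowest_fraction` is a hypothetical helper, not part of DESeq or any other package. It only shows how a single global cutoff can fall entirely on the lowly expressed biotype, whereas a per-biotype cutoff removes the same fraction from each.

```python
from collections import Counter


def lowest_fraction(mean_counts, fraction):
    """Return the names of the `fraction` least expressed genes.

    `mean_counts` maps gene name -> mean count across samples.
    (Hypothetical helper, for illustration only.)
    """
    n_drop = int(len(mean_counts) * fraction)
    ranked = sorted(mean_counts, key=mean_counts.get)  # ascending expression
    return set(ranked[:n_drop])


# Toy data (assumed, not real): 10 mRNAs with high counts, 10 miRNAs with low counts.
biotype = {}
mean_counts = {}
for i in range(10):
    biotype[f"mRNA_{i}"] = "mRNA"
    mean_counts[f"mRNA_{i}"] = 500.0 * (i + 1)
    biotype[f"miRNA_{i}"] = "miRNA"
    mean_counts[f"miRNA_{i}"] = 5.0 * (i + 1)

# Strategy 1: one 10% cutoff on the concatenated matrix.
dropped_global = lowest_fraction(mean_counts, 0.10)

# Strategy 2: a 10% cutoff applied within each biotype separately.
dropped_per_biotype = set()
for bt in sorted(set(biotype.values())):
    subset = {g: m for g, m in mean_counts.items() if biotype[g] == bt}
    dropped_per_biotype |= lowest_fraction(subset, 0.10)

# Count how many genes of each biotype were removed by each strategy.
global_by_type = Counter(biotype[g] for g in dropped_global)
per_biotype_by_type = Counter(biotype[g] for g in dropped_per_biotype)

print("global cutoff:", sorted(global_by_type.items()))
print("per-biotype cutoff:", sorted(per_biotype_by_type.items()))
```

With these toy numbers the global cutoff removes two miRNAs and no mRNAs, while the per-biotype cutoff removes one gene of each type.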
Thanks.
Great, thank you! Does the same apply to long RNAs such as lincRNAs? Those should be safe to analyse, right?
lincRNAs are fine if the libraries were prepared with ribo-depletion rather than polyA selection.