Dear all,
I know it is unorthodox, but I want to explore the performance of using a panel of "reference genes" for normalization of RNASeq gene counts (I am doing this alongside a more traditional normalization method in my dataset). To do this:
1. I wonder if I can simply divide each gene's raw counts by the ratio of the reference gene's counts between samples (see below)? Or should I log2-transform all the counts first?
Conversion_Rate = Ref_Count_Sample_A / Ref_Count_Sample_B
Normalized_GeneX_SampleA = GeneX_Counts_SampleA / Conversion_Rate
2. If I were to combine counts from several reference genes for normalization, would it make sense to use the arithmetic mean or the geometric mean to combine the reference counts?
Thank you in advance for sharing your thoughts,
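For concreteness, the scaling asked about above could be sketched as follows. This is only an illustration under stated assumptions: the function names `reference_size_factors` and `normalize`, the `ref_rows` argument, and the `pseudocount` are hypothetical, and the reference panel is combined with a geometric mean, since an arithmetic mean would be dominated by the most highly expressed reference gene.

```python
import numpy as np

def reference_size_factors(counts, ref_rows, pseudocount=1.0):
    """Per-sample scaling factors from a panel of reference genes.

    counts      : genes x samples array of raw counts
    ref_rows    : row indices of the reference genes (hypothetical panel)
    pseudocount : assumed guard against log(0); not part of the original question
    """
    ref = np.asarray(counts, dtype=float)[ref_rows, :] + pseudocount
    # Geometric mean of the reference genes within each sample (log space)
    log_geo = np.log(ref).mean(axis=0)
    # Centre the factors so their geometric mean across samples is 1
    return np.exp(log_geo - log_geo.mean())

def normalize(counts, factors):
    # Divide every gene's raw counts by its sample's factor
    return np.asarray(counts, dtype=float) / factors

# Toy data: 3 genes x 2 samples; rows 0 and 1 play the reference panel
counts = np.array([[10, 20], [100, 200], [5, 40]])
factors = reference_size_factors(counts, ref_rows=[0, 1], pseudocount=0.0)
normalized = normalize(counts, factors)
```

Dividing raw counts by a factor is the same operation as subtracting the log of that factor from log-transformed counts, so the choice between raw and log2 scale changes how the arithmetic is written, not the normalization itself.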
Hi, of course you could do this, but you are likely to shoot yourself in the foot. You would need a perfect housekeeping gene that never changes, like the ones used with qPCR, and you cannot know beforehand whether a gene meets that standard. With qPCR you have no choice, but with RNAseq you can take advantage of the large number of genes sampled.
I am trying normalization with the "Pseudo Reference" method to identify the least variable genes within each dataset for use as reference genes, and then to find the genes shared across multiple datasets. So far, within the same dataset, normalization by a set of reference genes gives results comparable to those of the "Pseudo Reference" method.
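A rough sketch of that workflow might look like the code below. It assumes that "Pseudo Reference" refers to a median-of-ratios scheme, where each sample is compared with a per-gene geometric mean across samples; the post does not spell this out. The function names, `n_top`, and `min_mean` are hypothetical choices for illustration.

```python
import numpy as np

def median_of_ratios_factors(counts):
    counts = np.asarray(counts, dtype=float)
    # Use only genes with nonzero counts in every sample
    expressed = (counts > 0).all(axis=1)
    log_counts = np.log(counts[expressed])
    # Pseudo-reference: per-gene geometric mean across samples (log space)
    log_ref = log_counts.mean(axis=1, keepdims=True)
    # Size factor: median ratio of each sample to the pseudo-reference
    return np.exp(np.median(log_counts - log_ref, axis=0))

def least_variable_genes(counts, gene_ids, n_top=50, min_mean=10.0):
    counts = np.asarray(counts, dtype=float)
    norm = counts / median_of_ratios_factors(counts)
    # Ignore weakly expressed genes (assumed threshold)
    keep = norm.mean(axis=1) >= min_mean
    # Rank genes by the spread of their log2 normalized counts
    sd = np.log2(norm + 1.0).std(axis=1)
    sd[~keep] = np.inf
    order = np.argsort(sd)[:n_top]
    return {gene_ids[i] for i in order if np.isfinite(sd[i])}

def shared_reference_genes(datasets, **kwargs):
    # datasets: list of (counts, gene_ids) pairs, one per dataset
    sets = [least_variable_genes(c, ids, **kwargs) for c, ids in datasets]
    return set.intersection(*sets)

# Toy data: three stable genes and one that changes tenfold between samples
toy_counts = np.array([[100, 200], [50, 100], [20, 40], [30, 600]])
toy_ids = ["g1", "g2", "g3", "g4"]
toy_factors = median_of_ratios_factors(toy_counts)
stable = least_variable_genes(toy_counts, toy_ids, n_top=3)
shared = shared_reference_genes([(toy_counts, toy_ids), (toy_counts, toy_ids)], n_top=3)
```

Candidate reference genes picked this way depend on the normalization used to rank them, so agreement between the two methods within one dataset is partly expected; comparing across independent datasets is the more informative check.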