Hi,
Normalization is one of the most important preliminary steps in RNA-seq analysis. One of the biases that normalization methods try to correct is RNA composition bias, and I think I am misunderstanding it. According to the edgeR manual, it arises because we only measure the abundance of each gene relative to the total amount of RNA in the sample, not the total amount of RNA in the cell, and this leads to a bias when some genes are very highly expressed. I do not understand why this is so. If a gene is very highly expressed in one sample but not in another, it should simply be reported as a DE gene. Where does the bias come in?
Thank you in advance.
There are many types of normalization in RNA-seq and it's not entirely clear (to me at least) which one you're referring to. From context, I assume you're talking about library size normalization, but perhaps you mean GC content or transcript length normalization (these aren't standard in edgeR). Could you clarify which one you mean?
No, I did not mean GC content. I think GC content does not affect the results very much, as it should be roughly the same for all libraries. I meant the RNA composition bias that, for example, calcNormFactors() in edgeR is used to eliminate. I had expected calcNormFactors() to correct for differences in sequencing depth, so now I do not understand what this "RNA composition" bias means.
Ah, you're referring to section 2.5.3 in the user guide, then. I'll create an answer below.
Yes, exactly, that is the part I was referring to.
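For anyone following along, here is a toy numerical sketch of the effect being asked about. It is written in plain Python rather than with edgeR, all gene names and RNA amounts are made up, and the correction step uses a simple median of ratios as a stand-in. It is not the TMM method that calcNormFactors() uses by default, only an illustration of the same idea under those assumptions.

```python
from statistics import median

# Hypothetical "true" RNA amounts per cell (arbitrary units, invented for
# illustration). Genes A-D are identical in both samples; only geneE changes.
true_rna = {
    "sample1": {"geneA": 100, "geneB": 200, "geneC": 300, "geneD": 400, "geneE": 0},
    "sample2": {"geneA": 100, "geneB": 200, "geneC": 300, "geneD": 400, "geneE": 1000},
}

# Both libraries are sequenced to exactly the same depth.
DEPTH = 1_000_000


def expected_counts(rna, depth):
    """Reads are shared out in proportion to each gene's fraction of total RNA."""
    total = sum(rna.values())
    return {gene: depth * amount / total for gene, amount in rna.items()}


counts = {sample: expected_counts(rna, DEPTH) for sample, rna in true_rna.items()}

# Library sizes are equal, so library-size scaling changes nothing here.
# Yet genes A-D get fewer reads in sample2, because geneE takes up part of
# the fixed read budget. That apparent drop is the composition bias.
ratios = [
    counts["sample2"][g] / counts["sample1"][g]
    for g in counts["sample1"]
    if counts["sample1"][g] > 0 and counts["sample2"][g] > 0
]

# Stand-in scaling factor: the median sample2/sample1 ratio, assuming most
# genes are not differentially expressed.
factor = median(ratios)

# Rescale sample2 so that the unchanged genes line up with sample1 again.
corrected = {g: c / factor for g, c in counts["sample2"].items()}

for g in counts["sample1"]:
    print(g, counts["sample1"][g], counts["sample2"][g], corrected[g])
```

In this made-up setup the libraries have identical depth, so scaling by library size alone would leave genes A-D looking down-regulated in sample2 even though their true amounts are the same. The extra scaling factor is what removes that, and geneE still stands out as the one real difference.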