Hello, I'm a newbie to RNA-Seq analysis.
While working through an RNA-Seq analysis, I came up with some questions.
If you have time, please let me know.
- In RNA-Seq data analysis (i.e. DEG analysis), does the glmFit function in R use raw RNA-Seq count data for modelling?
I applied TMM normalization to correct for non-equivalence between samples, and then I used the glmFit function.
In this process, does glmFit use the raw RNA-Seq counts for modelling? Until now, I believed that counts-per-million (CPM) normalized data were used in this function.
However, when I think about the assumptions of glmFit, it is based on the negative binomial distribution, which describes non-negative counts.
Also, if we convert raw RNA-Seq counts to CPM, the resulting data follow a continuous distribution.
So I think CPM-normalized data should not be used for modelling in the glmFit function.
Is that right?
Thanks for your answers.
Thanks for your answer!
Can I ask you one more question?
Why do we apply CPM normalization to raw count data?
If you have time, please leave an answer.
Thank you.
The only purpose of computing cpms is for plotting or for input into other programs. The edgeR differential expression pipeline does not use cpms at any stage. It only uses raw counts.
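To make that concrete, here is a minimal sketch of how the pieces usually fit together. The counts are simulated, and the sample names, group labels and object names are made up for illustration; only the edgeR function calls are real. Note that TMM normalization only stores a scaling factor per sample, the count matrix itself is passed to glmFit unchanged, and cpm() is called separately just to get values for plotting.

```r
library(edgeR)

# Simulated raw counts: 1000 hypothetical genes x 6 hypothetical samples
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 50, size = 10), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("s", 1:6)))
group <- factor(rep(c("ctrl", "trt"), each = 3))

y <- DGEList(counts = counts, group = group)

# TMM only fills in y$samples$norm.factors; y$counts stays as raw counts
y <- calcNormFactors(y, method = "TMM")

design <- model.matrix(~ group)
y <- estimateDisp(y, design)

# The negative binomial GLM is fitted to the raw counts in y
fit <- glmFit(y, design)
lrt <- glmLRT(fit, coef = 2)
topTags(lrt)

# CPMs are computed separately, e.g. for heatmaps or MDS-style plots
logcpm <- cpm(y, log = TRUE)
```

After running this, y$counts is still identical to the input matrix, which is an easy way to convince yourself that nothing was converted to CPM along the way.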
I understand it.
Thank you!
Gordon may correct me on this... If glmFit works with raw counts, one may wonder what the point of the normalization step via TMM or other methods is. The answer is that the normalized library size is used as an offset in the negative binomial model inside glmFit. An offset is a term similar to a predictor variable that, in contrast to a predictor, does not need a coefficient to be estimated, because you are certain that its coefficient is 1. I.e. you are certain that doubling the number of reads sequenced (the raw library size) is expected to double the number of reads on each gene.
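A toy illustration of what an offset does, in base R only. This is a sketch under simplifying assumptions: it uses a Poisson GLM from stats::glm rather than edgeR's negative binomial fit, and the library sizes and counts are invented. In edgeR the offset is the log of the effective library size (lib.size times norm.factors); here eff_lib stands in for that. The gene below has the same true proportion in both groups, but the treated samples were sequenced twice as deep.

```r
# Hypothetical effective library sizes: treated samples sequenced 2x deeper
eff_lib <- c(1e6, 1e6, 1e6, 2e6, 2e6, 2e6)
group   <- factor(rep(c("ctrl", "trt"), each = 3))

# One gene with the same proportion (1e-5) of reads in every sample,
# so its raw count simply doubles when the library size doubles
counts <- eff_lib * 1e-5   # 10 10 10 20 20 20

# Without an offset, the depth difference looks like differential expression
fit_no_offset <- glm(counts ~ group, family = poisson)

# With log(library size) as an offset (coefficient fixed at 1, not estimated),
# the model works on the scale of counts per read sequenced
fit_offset <- glm(counts ~ group + offset(log(eff_lib)), family = poisson)

coef(fit_no_offset)[["grouptrt"]]  # about log(2): spurious group effect
coef(fit_offset)[["grouptrt"]]     # about 0: no group effect once depth is accounted for
```

So the model is still fitted to raw counts, and the normalization enters only through the offset, i.e. log(expected count) = log(effective library size) + design terms.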
I posted this before seeing the OP's comment - CPM normalization is not used for DGE - maybe my comment here answers your question?
Thanks dariober, I understand the use of CPM now.
But I don't completely understand the offset in glmFit.
If you have time, could you explain it in more detail, or point me to a reference?
Thank you.
Just type help("glmFit"). It gives you the reference.