Hi,
I'm working on an RNA-Seq dataset and would like to use the TMM normalization from the edgeR package to normalize the data. I have read the manual and also the paper here.
I have two questions regarding the TMM normalization.
First, in our data we are mostly interested in specific regions of the chromosomes. For that reason we extracted these regions from the complete mapped BAM files using samtools. Does it make a difference for the TMM normalization if I take only the extracted regions into account when normalizing the data, rather than the whole library?
I know that the values I get at the end will differ because I have different numbers of reads mapped to the regions of interest. But all in all, can I apply the TMM normalization to the extracted subset of the data only?
Second, can someone please explain the main difference between the scaling method of normalization and normalization by library size?
I don't think I really got it from the paper.
Thanks
Tomas
Hi Devon, and thanks for the fast response. Do I understand it correctly if I say (and I sort of quote the paper here) that the TMM normalisation computes the proportion of each gene's reads relative to the total number of reads in the library and compares that across all samples?
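To make the two ideas in this question concrete, here is a minimal Python sketch. It is not edgeR's implementation: it leaves out the A-value trimming and the precision weights described in the TMM paper, and the function names and toy counts are made up purely for illustration. It first computes each gene's proportion of the library, then a simplified scaling factor as the trimmed mean of the per-gene log-ratios of those proportions between a sample and a reference.

```python
import math

def proportions(counts):
    """Each gene's share of the library: count / total reads in that sample."""
    total = sum(counts)
    return [c / total for c in counts]

def simple_trimmed_log_ratio_factor(sample, reference, trim=0.3):
    """Simplified scaling factor: trimmed mean of per-gene log2 ratios of proportions.

    Assumption: this omits the A-value trimming and the precision weights
    described in the TMM paper; it only illustrates the idea.
    """
    p_s = proportions(sample)
    p_r = proportions(reference)
    # Log-ratios are only defined for genes observed in both samples.
    m = sorted(math.log2(a / b) for a, b in zip(p_s, p_r) if a > 0 and b > 0)
    k = int(len(m) * trim)  # number of values dropped at each end
    kept = m[k:len(m) - k] if len(m) > 2 * k else m
    return 2 ** (sum(kept) / len(kept))

# Hypothetical toy counts: nine genes unchanged, the last gene strongly up in sample B.
sample_a = [100] * 10
sample_b = [100] * 9 + [1100]

prop_a = proportions(sample_a)
prop_b = proportions(sample_b)
factor = simple_trimmed_log_ratio_factor(sample_b, sample_a)

print(prop_a[0], prop_b[0])  # share of an unchanged gene in each library
print(factor)                # simplified scaling factor for sample B
```

With these toy numbers, plain library-size proportions make the nine unchanged genes look halved in sample B, only because one highly expressed gene inflates B's total. The trimmed mean ignores that extreme log-ratio, and rescaling B's library size by the resulting factor makes the unchanged genes compare as equal again.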
And what about the other way around? What if a large number of genes that are supposed to be differentially expressed are not in this subset of interest? Will it then skew the results in an unwanted way?