9.2 years ago
TiPi
Hi,
I was wondering what the best way is to compare two RNA-seq samples with drastically uneven numbers of reads, let's say 5 M for library A vs 25 M for library B. I couldn't find sufficient information about this. Should one normalize by library size at the level of mapped reads / counts, or subsample library B prior to mapping? What are the pros and cons of each? I am leaning towards the first option, as I feel that subsampling can create a bias, but I am unsure whether DGE tools like DESeq or HTSeq can "handle" the differences in library size appropriately.
Thanks
Subsampling would not be a good idea, since it throws away reads from the deeper library. Library-size normalisation should be fine.
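To make "library size normalisation" a bit more concrete, here is a minimal sketch of median-of-ratios size factors, the general idea behind DESeq-style normalisation. It is a plain NumPy illustration, not the DESeq code itself; the function name `size_factors` and the toy counts are made up for the example.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Use only genes with a nonzero count in every sample (log(0) is undefined).
    expressed = (counts > 0).all(axis=1)
    log_counts = np.log(counts[expressed])
    # Per-gene log geometric mean across samples acts as a pseudo-reference.
    log_geo_mean = log_counts.mean(axis=1, keepdims=True)
    # Size factor = median over genes of (sample count / reference).
    return np.exp(np.median(log_counts - log_geo_mean, axis=0))

# Toy data (hypothetical): library B is library A sequenced 5x deeper.
lib_a = np.array([10, 200, 35, 4, 80])
counts = np.column_stack([lib_a, 5 * lib_a])

sf = size_factors(counts)
normalized = counts / sf  # depth difference is removed, no reads discarded
print(sf, normalized, sep="\n")
```

With these toy numbers the two normalised columns come out identical, which is the point: the 5-fold depth difference is absorbed by the size factors rather than by dropping reads.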
In my experience, the typical normalization methods break down at around a 10-fold difference in read count between the smallest library and the median library. You can usually see this in some of the diagnostic plots, which will start looking really strange.
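A quick way to check where your samples sit relative to that rule of thumb is sketched below. The 10-fold cutoff is just the experience-based figure from this answer, not a hard limit, and `shallow_libraries` and the library sizes are hypothetical.

```python
import numpy as np

def shallow_libraries(library_sizes, max_fold=10.0):
    """Return indices of libraries at least `max_fold` smaller than the median,
    plus each library's fold difference below the median."""
    sizes = np.asarray(library_sizes, dtype=float)
    fold_below_median = np.median(sizes) / sizes
    return np.flatnonzero(fold_below_median >= max_fold), fold_below_median

# Hypothetical total read counts per library.
sizes = [25e6, 22e6, 5e6, 2e6, 24e6]
flagged, fold = shallow_libraries(sizes)
print(flagged, np.round(fold, 2))
```

By this check, a 5 M library next to libraries of about 25 M is roughly 5-fold below the median, so it stays within the range where normalisation usually behaves.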