Subsampling vs normalization
9.2 years ago
TiPi • 0

Hi,

I was wondering what the best approach would be to compare two samples with drastically uneven numbers of reads in RNA-seq, let's say 5 M for library A vs 25 M for library B; I couldn't find sufficient information about this. Should one rather normalize for library size at the level of mapped reads/counts, or subsample library B prior to mapping? What would be the pros and cons of each? I am leaning towards the first option, as I feel that subsampling can create a bias, but I am unsure whether DGE tools like DESeq (run on counts from HTSeq) can "handle" the differences in library size appropriately.

Thanks
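To make the two options in the question concrete, here is a minimal sketch on simulated per-gene counts (the number of genes, the expression distribution and the seed are made-up assumptions, not data from the question). Subsampling is modelled at the count level as drawing reads without replacement (a multivariate hypergeometric draw), and library-size normalization as plain counts-per-million scaling; real DE tools use more robust size factors than CPM.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy data: expression proportions for 2,000 genes, drawn from a
# wide log-normal so that many genes are lowly expressed, as in real RNA-seq.
props = rng.lognormal(mean=0.0, sigma=2.0, size=2000)
props /= props.sum()

lib_a = rng.multinomial(5_000_000, props)    # library A: 5 M counted reads
lib_b = rng.multinomial(25_000_000, props)   # library B: 25 M counted reads

def subsample(counts, target_total, rng):
    """Option 1: keep only target_total reads, drawn without replacement.

    At the count level this is a multivariate hypergeometric draw, which
    mimics subsampling reads before mapping and counting.
    """
    return rng.multivariate_hypergeometric(counts, target_total)

def cpm(counts):
    """Option 2: keep all reads and rescale to counts per million."""
    return counts / counts.sum() * 1e6

lib_b_sub = subsample(lib_b, int(lib_a.sum()), rng)

# Subsampling discards ~80% of library B; low-count genes are hit hardest.
print("genes with zero counts, full B:      ", int((lib_b == 0).sum()))
print("genes with zero counts, subsampled B:", int((lib_b_sub == 0).sum()))
print("CPM totals:", cpm(lib_a).sum(), cpm(lib_b).sum())
```

Every gene that is zero in the full library B stays zero after subsampling, and some low-count genes additionally drop to zero; that loss of information is what makes subsampling unattractive compared with rescaling.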


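Subsampling would not be a good idea: it throws away most of the reads in library B, and with them statistical power. The library size normalisation built into DE tools such as DESeq should be fine.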
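DESeq's library-size normalisation is based on size factors from the median-of-ratios method: each sample's counts are compared with a per-gene geometric mean across samples, and the median of those ratios over genes is the sample's size factor. The sketch below is a minimal re-implementation of that idea for illustration only, not the package's code; in practice you would call `estimateSizeFactors` in DESeq/DESeq2. The toy matrix is made up so that sample B is sequenced 5x deeper, like the 5 M vs 25 M case.

```python
import numpy as np

def median_of_ratios_size_factors(counts):
    """Estimate per-sample size factors with the median-of-ratios idea.

    counts: genes x samples array of raw counts.
    Genes with a zero in any sample are skipped, since their geometric mean is 0.
    """
    counts = np.asarray(counts, dtype=float)
    expressed = (counts > 0).all(axis=1)
    log_counts = np.log(counts[expressed])
    # Log of the per-gene geometric mean across samples (the pseudo-reference).
    log_geo_means = log_counts.mean(axis=1)
    # Each sample's log-ratio to the pseudo-reference, then the median over genes.
    log_ratios = log_counts - log_geo_means[:, None]
    return np.exp(np.median(log_ratios, axis=0))

# Hypothetical toy counts: rows are genes, columns are samples A and B.
counts = np.array([
    [10, 50],
    [20, 100],
    [30, 150],
    [0, 5],
])
sf = median_of_ratios_size_factors(counts)
normalized = counts / sf  # divide each column by its sample's size factor
print(sf)
print(normalized)
```

Here the size factors come out as 1/√5 and √5 (their ratio is the 5x depth difference), and the normalized counts of the three shared genes agree across samples. The gene with a zero in sample A is left out of the estimate but is still normalized.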

I agree that subsampling is not the right option, but you should not assume that normalization will solve everything. When library sizes vary a lot and a few samples have very low library sizes (<3M reads), you have to check the post-normalization expression estimates of housekeeping genes or, better, of spike-in controls if you have them. That said, I think you will still be OK with 5 million reads in most scenarios.
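The check suggested here can be roughly automated: take the normalized counts of genes you expect to be stable (housekeeping genes or spike-ins) and see whether any sample's controls sit systematically off from the others. The log2(x + 1) transform, the per-sample median deviation and the 2-fold cutoff are all assumptions of this sketch, and the control rows in the example are hypothetical; plotting these values is at least as informative as a cutoff.

```python
import numpy as np

def control_gene_deviation(normalized, control_idx):
    """Per-sample median log2 deviation of control genes from their across-sample median.

    normalized: genes x samples array of normalized counts.
    control_idx: row indices of housekeeping genes or spike-ins (hypothetical choice).
    """
    ctrl = np.log2(np.asarray(normalized, dtype=float)[control_idx] + 1.0)
    # How far each control gene in each sample is from that gene's median level.
    dev = ctrl - np.median(ctrl, axis=1, keepdims=True)
    # Summarize per sample with the median over control genes.
    return np.median(dev, axis=0)

def flag_samples(normalized, control_idx, max_abs_log2=1.0):
    """Flag samples whose controls sit more than max_abs_log2 (default: 2-fold) off."""
    dev = control_gene_deviation(normalized, control_idx)
    return np.abs(dev) > max_abs_log2

# Hypothetical normalized counts: 3 control genes plus 1 ordinary gene, 3 samples.
normalized = np.array([
    [100.0, 110.0, 400.0],
    [50.0, 55.0, 210.0],
    [200.0, 190.0, 820.0],
    [5.0, 80.0, 3.0],
])
control_idx = [0, 1, 2]
print(control_gene_deviation(normalized, control_idx))
print(flag_samples(normalized, control_idx))
```

In the toy matrix the third sample's controls are roughly 4x higher than in the other two, so only that sample is flagged, hinting that its normalization is off.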

In my experience, the typical normalization methods break down at around a 10x difference in read number between the lowest and the median library. You can usually see this in some of the diagnostic plots, which start looking really strange.
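The ~10x rule of thumb from this reply is easy to check before running any DE analysis. A small sketch, taking the ratio between the median and the smallest library size as the measure of spread; the 10.0 threshold is just the heuristic from this comment, not a hard limit.

```python
from statistics import median

def median_to_min_ratio(lib_sizes):
    """Ratio between the median and the smallest library size (reads per sample)."""
    return median(lib_sizes) / min(lib_sizes)

def exceeds_rule_of_thumb(lib_sizes, max_ratio=10.0):
    """True if the spread reaches the ~10x level at which, in this reply's
    experience, normalization tends to break down (a heuristic only)."""
    return median_to_min_ratio(lib_sizes) >= max_ratio

question_libs = [5_000_000, 25_000_000]
print(median_to_min_ratio(question_libs), exceeds_rule_of_thumb(question_libs))
```

For the two libraries in the question, the median of 5 M and 25 M is 15 M, so the ratio is 3, well below that level.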
