Question

Best normalization method to use with ChIP-seq counts

1

5.1 years ago

science_lizard ▴ 30

Statistics (Opinion?) Question:

If you use featureCounts to count reads over the regions in a BED file (e.g., TSS coordinates) for ChIP-seq BAM files, is it best to normalize the BAM files to their inputs first, or is it better to let downstream programs perform their own normalization on the raw counts? And for raw counts, which normalization do you think is best: RPM/CPM, RPKM/FPKM, TMM, TPM, or something else? This is not so much for differential analysis as for plotting the normalized reads in a graph. Again, I'm specifically considering ChIP-seq and not RNA-seq in this case. I'm not a huge fan of downsampling, so I wouldn't typically include it in the pipeline, but maybe you feel otherwise for normalization purposes?

Thanks! Looking forward to hearing your position.

chip-seq normalization counts featurecounts edger • 4.9k views

ADD COMMENT • link updated 5.1 years ago by ATpoint 89k • written 5.1 years ago by science_lizard ▴ 30

score 3 · Answer 1 · 2020-08-28

3

5.1 years ago

ATpoint 89k

Don't downsample. It is effectively the same as per-million scaling and is not robust to differences in signal-to-noise ratio, which almost certainly differs between ChIP-seq samples. See my answer here; the same applies to ChIP-seq:

A: ATAC-seq sample normalization (quantil normalization)
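
To make the per-million point concrete, here is a minimal sketch with made-up numbers (the counts, the library sizes, and the helper `cpm` are all hypothetical and not taken from the linked post). Both samples have the same sequencing depth, but fewer reads of the second sample fall into the counted regions, as you would see with a weaker enrichment. Per-million scaling only corrects for depth, so the difference in signal-to-noise stays in the data as an apparent two-fold change at every region:

```python
def cpm(counts, library_size):
    """Scale raw counts to counts per million mapped reads."""
    return [c * 1_000_000 / library_size for c in counts]

# Hypothetical raw counts over the same four regions in two samples.
# Both libraries contain 10 million mapped reads, but sample_b has a lower
# signal-to-noise ratio: only half as many of its reads land in the regions.
sample_a = [400, 800, 1200, 1600]
sample_b = [200, 400, 600, 800]
lib_a = 10_000_000
lib_b = 10_000_000

cpm_a = cpm(sample_a, lib_a)
cpm_b = cpm(sample_b, lib_b)

# Ratio of the per-million values, region by region.
ratios = [a / b for a, b in zip(cpm_a, cpm_b)]
print(ratios)
```

Since the two libraries already have equal depth, downsampling would change nothing here either, and the systematic offset would remain.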

I'd (as outlined in that post) use TMM from edgeR on your raw count matrix and then, as a diagnostic, use MA-plots to check whether this properly scales the samples to each other. The majority of data points should sit at more or less zero on the y-axis. The post also shows how to scale a bigwig file for browser tracks with these scaling factors from edgeR. Be sure to spend time really checking whether the samples are normalized well; I feel like people often skip that and are then surprised when results do not meet expectations. Normalization is critical.
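
As a rough illustration of the idea behind TMM and the MA-plot check, here is a simplified sketch. It is not edgeR's implementation (edgeR, for instance, weights regions by precision, which is omitted here); the function names, the trimming fractions used as defaults, and all counts are assumptions made for the example. M is the log2 ratio of depth-scaled counts between a sample and a reference, A is their average log2 abundance, and the scaling factor comes from the mean of M after trimming the extremes:

```python
import math

def ma_values(sample, ref, lib_sample, lib_ref):
    """Return one (M, A) pair per region, skipping regions with a zero count."""
    pairs = []
    for y_s, y_r in zip(sample, ref):
        if y_s == 0 or y_r == 0:
            continue
        p_s, p_r = y_s / lib_sample, y_r / lib_ref
        m = math.log2(p_s / p_r)        # log2 fold change (y-axis of the MA-plot)
        a = 0.5 * math.log2(p_s * p_r)  # average log2 abundance (x-axis)
        pairs.append((m, a))
    return pairs

def trimmed_mean_m(pairs, trim_m=0.30, trim_a=0.05):
    """Unweighted mean of M after trimming the most extreme M and A values."""
    n = len(pairs)
    by_m = sorted(range(n), key=lambda i: pairs[i][0])
    by_a = sorted(range(n), key=lambda i: pairs[i][1])
    cut_m, cut_a = int(n * trim_m), int(n * trim_a)
    keep = set(by_m[cut_m:n - cut_m]) & set(by_a[cut_a:n - cut_a])
    if not keep:
        raise ValueError("no regions left after trimming")
    return sum(pairs[i][0] for i in keep) / len(keep)

# Hypothetical counts: most regions are uniformly two-fold lower in the sample
# (a technical offset), the first and last regions differ for real.
ref    = [100, 120, 150, 200, 250, 300, 400, 500, 800, 1000]
sample = [400,  60,  75, 100, 125, 150, 200, 250, 400,  125]
lib_ref = lib_sample = 1_000_000

pairs = ma_values(sample, ref, lib_sample, lib_ref)
tmm_log2 = trimmed_mean_m(pairs)
norm_factor = 2 ** tmm_log2  # multiply the library size by this factor

# After centering, most M values should sit at about zero on the y-axis.
centered_m = [m - tmm_log2 for m, _ in pairs]
print(norm_factor, centered_m)
```

In this toy case the bulk of the regions ends up at zero after centering, and only the two regions that were constructed to differ stand out, which is the pattern you would look for in the MA-plot.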

ADD COMMENT • link 5.1 years ago by ATpoint 89k

0

Good point! But for something like ChIP-seq, where you typically sequence IgG or gDNA inputs, do you think it's important to first normalize the BAM files to their inputs before running featureCounts on them?

ADD REPLY • link 5.1 years ago by science_lizard ▴ 30

0

I personally do not do that as there is (to my knowledge) no widely accepted and robust method available that 1) normalizes each sample to its input and 2) properly corrects for the issues outlined in the linked post above. This is not satisfying, I know, so if you ever find a robust tool please share it. I only use the inputs during peak calling.

ADD REPLY • link 5.1 years ago by ATpoint 89k

0

Well, there are two ways I can think of that you could do it. First, you could use something like deepTools to normalize your ChIP BAMs to their inputs, then move on to downstream processing with featureCounts etc. and still do TMM normalization later. I guess the benefit of this is that it at least somehow addresses the issue of ChIP efficiency. Alternatively, and I've seen this in some papers (though I think it's a little strange), you do exactly what you described above, i.e. a TMM normalization, but on all files including the inputs, and then either subtract the TMM-normalized input counts from the TMM-normalized ChIP counts or divide the ChIP counts by them. Seems a little weird, but it seems to be 'publishable' like that...?
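
Just to spell out what I mean by the second option, here is a minimal sketch with invented numbers. The helper names, the pseudocount of 1 (added to avoid dividing by zero), and the normalization factors of 1.0 are my own assumptions and not taken from any of those papers; in practice the factors would come from something like TMM:

```python
import math

def scaled_cpm(counts, library_size, norm_factor=1.0):
    """Counts per million on an effective library size (library_size * norm_factor)."""
    effective = library_size * norm_factor
    return [c * 1_000_000 / effective for c in counts]

def log2_ratio(chip, control, pseudocount=1.0):
    """Divide ChIP by input on the log2 scale; the pseudocount is an assumption."""
    return [math.log2((c + pseudocount) / (i + pseudocount))
            for c, i in zip(chip, control)]

def subtract(chip, control):
    """Subtract input from ChIP; values can become negative."""
    return [c - i for c, i in zip(chip, control)]

# Hypothetical raw counts over three regions.
chip_counts, chip_lib = [310, 30, 70], 10_000_000
input_counts, input_lib = [20, 20, 60], 20_000_000

chip_norm = scaled_cpm(chip_counts, chip_lib, norm_factor=1.0)
input_norm = scaled_cpm(input_counts, input_lib, norm_factor=1.0)

ratios = log2_ratio(chip_norm, input_norm)
diffs = subtract(chip_norm, input_norm)
print(ratios, diffs)
```

Whether the ratio or the difference is the more sensible choice (if either is) is exactly the part I'm unsure about.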

ADD REPLY • link 5.1 years ago by science_lizard ▴ 30