Statistics (Opinion?) Question:
If you use featureCounts to count reads over some BED file (e.g., TSS coordinates) for ChIP-seq BAM files, is it best to normalize the BAM files to their inputs first, or is it better to let downstream programs perform their own normalization on the raw counts? And for raw counts, what do you think is the best normalization to use: RPM/CPM, RPKM/FPKM, TMM, TPM, or something else (not so much for differential analysis, more for plotting the normalized reads in a graph)? Again, I'm specifically asking about ChIP-seq and not RNA-seq in this case. I'm not a huge fan of downsampling, so I wouldn't typically include it in the pipeline, but maybe you feel otherwise for normalization purposes?
Thanks! Looking forward to hearing your position.
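For reference, here is a minimal sketch of how the simpler normalizations named in the question (CPM, RPKM/FPKM, TPM) are computed from a count table. The count matrix, region lengths, and mapped-read totals are made-up illustration values, and the function names are hypothetical, not part of featureCounts or any other tool. TMM is left out because it is more involved than a simple rescaling. The sketch assumes library sizes are the total mapped reads per BAM rather than the reads falling in the counted regions, which seems the more sensible choice when only TSS windows are counted.

```python
import numpy as np

def cpm(counts, lib_sizes=None):
    """Counts per million. counts is a (regions x samples) matrix."""
    counts = np.asarray(counts, dtype=float)
    if lib_sizes is None:
        # Fall back to the column sums (reads in the counted regions only).
        lib_sizes = counts.sum(axis=0)
    return counts / np.asarray(lib_sizes, dtype=float) * 1e6

def rpkm(counts, lengths_bp, lib_sizes=None):
    """CPM further divided by region length in kilobases."""
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    return cpm(counts, lib_sizes) / lengths_kb

def tpm(counts, lengths_bp):
    """Length-normalize first, then scale each sample to sum to one million."""
    counts = np.asarray(counts, dtype=float)
    lengths_kb = np.asarray(lengths_bp, dtype=float)[:, None] / 1e3
    rate = counts / lengths_kb
    return rate / rate.sum(axis=0) * 1e6

# Made-up example: 3 TSS windows (rows) x 2 ChIP samples (columns).
counts = np.array([[120, 300], [40, 90], [400, 1100]])
lengths_bp = np.array([2000, 2000, 2000])            # fixed-width windows (assumed)
mapped_reads = np.array([20_000_000, 45_000_000])    # total mapped reads per BAM (assumed)

print(cpm(counts, mapped_reads))
print(rpkm(counts, lengths_bp, mapped_reads))
print(tpm(counts, lengths_bp))
```

Note that when all regions have the same width, as with fixed windows around TSSs, RPKM is just CPM divided by a constant, so the length correction changes the scale of the plot but not the comparison between samples.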
Good point! But for something like ChIP-seq, where you typically sequence IgG or gDNA inputs, do you think it's important to normalize the BAM files to their inputs before running featureCounts?
I personally do not do that as there is (to my knowledge) no widely accepted and robust method available that 1) normalizes each sample to its input and 2) properly corrects for the issues outlined in the linked post above. This is not satisfying, I know, so if you ever find a robust tool please share it. I only use the inputs during peak calling.
Well, there are two ways I can think of to do it. First, you could use something like deepTools to normalize your ChIP BAMs to their inputs, then move on to downstream processing with featureCounts etc. and still do TMM normalization later. I guess the benefit is that it at least partly addresses the issue of ChIP efficiency. Alternatively, and I've seen this in some papers (though I think it's a little strange), you do exactly what you were talking about above, something like a TMM normalization, but on all the files, including inputs, and then either subtract the TMM-normalized input counts from the TMM-normalized ChIP counts or divide the ChIP counts by them. Seems a little weird, but it seems to be 'publishable' like that...?
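To make the second approach concrete, here is a rough sketch of dividing or subtracting normalized input counts from normalized ChIP counts at the count-table level. It uses plain library-size scaling (CPM) as a stand-in for TMM, so it is not the method from those papers, just the same idea with a simpler scaling. The function names, the pseudocount of 1, and the example numbers are all assumptions made for illustration.

```python
import numpy as np

def cpm(counts, lib_size):
    """Scale raw counts to counts per million mapped reads."""
    return np.asarray(counts, dtype=float) / float(lib_size) * 1e6

def log2_enrichment(chip_counts, input_counts, chip_lib, input_lib, pseudocount=1.0):
    """log2(ChIP / input) on CPM values; the pseudocount avoids division by zero."""
    chip = cpm(chip_counts, chip_lib)
    inp = cpm(input_counts, input_lib)
    return np.log2((chip + pseudocount) / (inp + pseudocount))

def subtract_input(chip_counts, input_counts, chip_lib, input_lib):
    """ChIP CPM minus input CPM, with negative values clipped to zero."""
    diff = cpm(chip_counts, chip_lib) - cpm(input_counts, input_lib)
    return np.clip(diff, 0.0, None)

# Made-up counts over three TSS windows for one ChIP and its matched input.
chip_counts = np.array([120, 40, 400])
input_counts = np.array([30, 35, 50])
chip_lib = 20_000_000     # total mapped reads in the ChIP BAM (assumed)
input_lib = 25_000_000    # total mapped reads in the input BAM (assumed)

print(log2_enrichment(chip_counts, input_counts, chip_lib, input_lib))
print(subtract_input(chip_counts, input_counts, chip_lib, input_lib))
```

The ratio form is sensitive to the pseudocount in windows with low input coverage, and the subtraction form throws away information wherever the input exceeds the ChIP, which may be part of why neither feels fully satisfying. deepTools' bamCompare offers a similar ChIP-versus-input comparison, but on coverage tracks rather than on a count table.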