Question

bamCompare output values ChIP/input normalization

1

Entering edit mode

5.1 years ago

Marco Pannone ▴ 810

Hello!

I am new to ChIP-seq analysis and I am proceeding with bamCompare to generate a .bw file from two .bam files (one ChIPed and one input). In my ChIP experiment I am investigating enrichment of a very broad histone mark, which can cover up to megabases along the genome.

To do so, I am using the following command line:

bamCompare -b1 aln_results/aln_sorted/WT_1.1_NT.srt.bam -b2 aln_results/aln_sorted/WT_1.1_NT.input.srt.bam -o IGV/WT_1.1_NT/WT_1.1_NT.scale.bw -p 3 -bl hg19/blacklist/blacklist_hg19.merged.bed --effectiveGenomeSize 2864785220 -e --ignoreDuplicates -bs 20000 --smoothLength 200000

When I visualize my output in the IGV genome browser, I notice that almost all the genomic regions show negative values, which makes me wonder a bit. I am aware of the fact that, by default, the --operation parameter is set to log2 and, therefore, negative values are expected. However, for visualization purposes, I tried to set "--operation mean" and now I got positive values in the output and in the genome browser track.

Consequently, I am wondering how appropriate is to set --operation to "mean" or any other option, rather than computing log2 ratio between ChIP and input samples.

Moreover, would it be appropriate to also set a normalization method, such as BPM, or would you suggest to use scaling factors instead?

Thanks!

ChIP-Seq deeptools bamcompare normalization • 4.8k views

ADD COMMENT • link updated 5.1 years ago by colin.kern ★ 1.1k • written 5.1 years ago by Marco Pannone ▴ 810

score 2 · Answer 1 · 2020-03-18

2

Entering edit mode

5.1 years ago

colin.kern ★ 1.1k

If the vast majority of your log2 ratios are negative, that means most of the genome has higher depth of the input compared to the ChIP library. Generally the SES scaling method is recommended when normalizing an input and ChIP, but I don't have experience with using it for very broad marks, so I'm not sure if it's a different situation. The mean operation is not at all what you want. That just calculates the mean of the ChIP depth and input depth in the bin, which doesn't really tell you anything. It's more appropriate for creating a track where you're, e.g., combining libraries from different replicates. I would experiment with different normalization and scaling options and compare the results with what the data looks like on a genome browser to get a feel for which might be the best method for your data.

ADD COMMENT • link 5.1 years ago by colin.kern ★ 1.1k

0

Entering edit mode

Thanks for your reply! Indeed, I figured out that the mean option is not at all what I need and that it is only suitable for merging replicates, as you also mentioned. I have tried different normalization and scaling options within the bamCompare function, but apparently none of them seems to satisfy my intentions. When I set the --operation parameter to "ratio" I get now only positive values but the track in the genome browser is so messy and "compact", even selecting a small bin size (I am not sure if I have been clear enough). Using ratio I can barely spot differences in enrichment regions, which are instead way clearer using the default log2ratio.

How would you suggest me to proceed at this time? Looks pretty clear that my input samples have a deeper sequencing depth than my ChIPed ones. This makes me kind of worried.

Would it maybe be better in my case to proceed with bamCoverage instead and normalize every single sample rather than keep trying with a ChIP/input normalization?

Thanks again!

ADD REPLY • link 5.1 years ago by Marco Pannone ▴ 810

0

Entering edit mode

You could downsample your input, i.e. randomly remove reads until it's the same depth of the ChIP. Try the plotFingerprint command of deepTools to graph the distribution of reads and see how the ChIP libraries compare to the input. Have you tried using a peak calling tool designed for broad histone marks, such as SICER?

ADD REPLY • link 5.1 years ago by colin.kern ★ 1.1k

0

Entering edit mode

The plot obtained from plotFingerprint is not good, indeed the two curves (one for the ChIP and one for the input) are so close to each other. I am performing peak calling with MACS2 and selecting the --broad option since it works fine with broad peaks as well. Indeed, it still outputs me around 500 enriched peaks with q<0.1.

ADD REPLY • link 5.1 years ago by Marco Pannone ▴ 810

1

Entering edit mode

You said this particular histone mark can cover up to megabases of the genome, so I would put more importance on the percentage of the genome covered by your peak calls, rather than just the number of peaks. All that Macs2 does to call broad peaks is to first call narrow peaks, then to combine nearby peaks into broader regions. So if your mark doesn't have enough enrichment to get many narrow peaks calls, the broad peak calling won't be very effective either. There are tools that are specifically designed to work with broad marks which may give you better results.