Hello!
I am new to ChIP-seq analysis and I am proceeding with bamCompare to generate a .bw file from two .bam files (one ChIPed and one input). In my ChIP experiment I am investigating enrichment of a very broad histone mark, which can cover up to megabases along the genome.
To do so, I am using the following command line:
bamCompare -b1 aln_results/aln_sorted/WT_1.1_NT.srt.bam -b2 aln_results/aln_sorted/WT_1.1_NT.input.srt.bam -o IGV/WT_1.1_NT/WT_1.1_NT.scale.bw -p 3 -bl hg19/blacklist/blacklist_hg19.merged.bed --effectiveGenomeSize 2864785220 -e --ignoreDuplicates -bs 20000 --smoothLength 200000
When I visualize my output in the IGV genome browser, I notice that almost all the genomic regions show negative values, which makes me wonder a bit. I am aware of the fact that, by default, the --operation parameter is set to log2 and, therefore, negative values are expected. However, for visualization purposes, I tried to set "--operation mean" and now I got positive values in the output and in the genome browser track.
Consequently, I am wondering how appropriate is to set --operation to "mean" or any other option, rather than computing log2 ratio between ChIP and input samples.
Moreover, would it be appropriate to also set a normalization method, such as BPM, or would you suggest to use scaling factors instead?
Thanks!
Thanks for your reply! Indeed, I figured out that the mean option is not at all what I need and that it is only suitable for merging replicates, as you also mentioned. I have tried different normalization and scaling options within the bamCompare function, but apparently none of them seems to satisfy my intentions. When I set the --operation parameter to "ratio" I get now only positive values but the track in the genome browser is so messy and "compact", even selecting a small bin size (I am not sure if I have been clear enough). Using ratio I can barely spot differences in enrichment regions, which are instead way clearer using the default log2ratio.
How would you suggest me to proceed at this time? Looks pretty clear that my input samples have a deeper sequencing depth than my ChIPed ones. This makes me kind of worried.
Would it maybe be better in my case to proceed with bamCoverage instead and normalize every single sample rather than keep trying with a ChIP/input normalization?
Thanks again!
You could downsample your input, i.e. randomly remove reads until it's the same depth of the ChIP. Try the plotFingerprint command of deepTools to graph the distribution of reads and see how the ChIP libraries compare to the input. Have you tried using a peak calling tool designed for broad histone marks, such as SICER?
The plot obtained from plotFingerprint is not good, indeed the two curves (one for the ChIP and one for the input) are so close to each other. I am performing peak calling with MACS2 and selecting the --broad option since it works fine with broad peaks as well. Indeed, it still outputs me around 500 enriched peaks with q<0.1.
You said this particular histone mark can cover up to megabases of the genome, so I would put more importance on the percentage of the genome covered by your peak calls, rather than just the number of peaks. All that Macs2 does to call broad peaks is to first call narrow peaks, then to combine nearby peaks into broader regions. So if your mark doesn't have enough enrichment to get many narrow peaks calls, the broad peak calling won't be very effective either. There are tools that are specifically designed to work with broad marks which may give you better results.
Sure, I will give a try to SICER then! Thanks again!