Hey everybody!
I am analyzing ChIP-seq data and am currently generating .bigWig files with bamCoverage from the deepTools suite.
I have data for narrow histone marks such as H3K4me3, as well as broader marks such as H3K36me3.
I know I should pick suitable integer values for the --binSize and --smoothLength parameters, but I am not sure which values are most appropriate in each case.
Any opinion would be much appreciated!
Thanks :)
I disagree here, as the bin size in bamCoverage defines how many adjacent bases are averaged into a single value when making the bigWig file. The default is 50 bases. I personally always use 1 bp because (to my eye) larger bins look ugly in a genome browser, and you lose the per-bp precision, which can be important depending on your application. Certainly do not use anything big, and definitely not 1 kb: that would interfere with the visualization, as the signal over an entire 1 kb interval would be compressed into a single value.
Thanks for the reply! So you would also suggest not setting any value for smoothLength? Also, when it comes to bamCompare, in case I want to normalize the ChIP sample over its input, would you still recommend a binSize of 1 bp and no smoothLength?
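To make the bin-size point concrete, here is a minimal Python sketch. It assumes, as described above, that each bin simply holds the average of the per-base values it covers. The toy coverage vector and the helper name bin_coverage are made up for illustration; this is not deepTools code.

```python
def bin_coverage(per_base, bin_size):
    """Collapse per-base coverage into consecutive bins, one mean value per bin."""
    binned = []
    for start in range(0, len(per_base), bin_size):
        window = per_base[start:start + bin_size]
        binned.append(sum(window) / len(window))
    return binned


# Toy 2 kb region (made-up numbers): background of 1 read per base,
# plus a 100 bp peak of 40 reads covering positions 950-1049.
per_base = [1] * 2000
for pos in range(950, 1050):
    per_base[pos] = 40

for bin_size in (1, 50, 1000):
    binned = bin_coverage(per_base, bin_size)
    # Print bin size, number of bins, and the highest binned value.
    print(bin_size, len(binned), max(binned))
```

With 1 bp and 50 bp bins the peak height stays at 40, while with 1 kb bins the highest value drops to 2.95, so a narrow peak like this one all but disappears into the background.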
I have personally never used smoothing here. I usually load the bigWigs into R to make whatever plot I want and apply smoothing (if necessary) at that stage, but I keep the original bigWig file with the actual raw data. I also never do input normalization, as there is (to my taste) no reliable method for it out there; I only use the input to call peaks against.
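For anyone wondering what "smooth later, keep the raw values" could look like, here is a rough sketch in Python rather than R. It uses a plain centered moving average over neighboring bins, which is similar in spirit to what the deepTools documentation describes for --smoothLength (a window larger than the bin size over which bins are averaged), but it is not the deepTools implementation. The function name smooth and the example values are hypothetical.

```python
def smooth(values, window_bins):
    """Centered moving average over `window_bins` bins (use an odd number).

    Bins near the edges are averaged over whatever neighbors exist.
    The input list is left untouched, so the raw signal is preserved.
    """
    half = window_bins // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Made-up binned signal: one isolated spike followed by a flat plateau.
raw = [0, 0, 9, 0, 0, 3, 3, 3]
smoothed = smooth(raw, 3)
print(raw)       # raw values are unchanged
print(smoothed)  # spike is spread over its neighbors
```

The isolated spike of 9 becomes three bins of 3.0, while the plateau keeps its height, which is why smoothing tends to flatter broad marks more than sharp ones.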
I agree, I also mostly use input normalization only during the peak calling process, while for genome browser visualization I prefer to avoid it (otherwise I would not be able to get an idea of my background signal). I am currently proceeding with the default binSize of 50 bp and will see how it looks. Thanks again for your opinion, much appreciated :)
I don't think that 1-base precision is necessary for histone marks. Of course, smaller bins are more useful for TFs, as I said. Using such a small bin for broad signals will make the track look oversegmented. Certainly, the choice of binSize depends on the nature of the experiment, and 1 bp for a histone mark is not ideal in my opinion. Visualization at 1 kb resolution should work well. In any case, you can try as many times as you want with different parameters and see what best fits your needs.
I see no reason to use bigwig for peak calling and not directly the alignment file. I presume you want the bigwig visualization for aesthetic purposes (article figure etc).
Yes indeed, I want to produce .bigWig files simply for inspecting signal enrichment in the genome browser and taking snapshots of interesting regions. I am not producing these files as a preliminary step to peak calling.
Then I see no reason to spend time and storage space on very high resolution. If you want to show large regions of 1 Mb or so, go for a lower resolution (larger bins), as there will be no visible difference. If you want to show small, gene-sized areas, then invest the time and space to produce a higher resolution. There is no standard way. You always have to adjust according to your needs.
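As a rough back-of-the-envelope for the time and storage argument, the sketch below counts the maximum number of bins a track could contain at different bin sizes. The genome length of about 3.1 Gb is an assumption (roughly human-sized), and real bigWig files are usually much smaller than these upper bounds suggest, since empty or identical stretches compress well.

```python
GENOME_BP = 3_100_000_000  # assumption: roughly human-sized genome


def max_bins(genome_bp, bin_size):
    """Upper bound on the number of bins: genome length divided by bin size, rounded up."""
    return -(-genome_bp // bin_size)  # integer ceiling division


for bin_size in (1, 10, 50, 1000):
    # Print the bin size and the maximum number of values the track could hold.
    print(bin_size, max_bins(GENOME_BP, bin_size))
```

Going from 1 bp to 50 bp bins cuts the upper bound from 3.1 billion to 62 million values, and 1 kb bins bring it down to 3.1 million.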
I strongly encourage you to not use a 1kb bin size for something like H3K4me3, your resolution there will be terrible. For most cases I wouldn't go over 50.
Thanks for your answer! I have already filtered all unmapped reads and duplicates out of my .bam files and kept only the best alignment for multimappers. The default parameters use a bin size of only 50 bp, so maybe I should indeed go for larger bins, as you suggest.
I guess it will take quite a bit of experimenting with binSize and smoothLength until I get a satisfactory track in the genome browser.
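If it helps with that experimenting, here is a small Python sketch that only builds (and prints) bamCoverage command lines for a few parameter combinations, so the resulting tracks can be compared side by side. The BAM file names, the output naming scheme, and the specific binSize/smoothLength values are placeholders for illustration, not recommendations. The check that smoothLength exceeds binSize follows the deepTools description of the smoothing window as being larger than the bin size.

```python
import shlex


def bamcoverage_cmd(bam, bin_size, smooth_length=None):
    """Build a bamCoverage argument list; nothing is executed here."""
    if smooth_length is not None and smooth_length <= bin_size:
        raise ValueError("smooth_length should be larger than bin_size")
    # Hypothetical naming scheme: encode the parameters in the output file name.
    out = f"{bam.rsplit('.', 1)[0]}.bs{bin_size}"
    if smooth_length is not None:
        out += f".sl{smooth_length}"
    cmd = [
        "bamCoverage",
        "--bam", bam,
        "--outFileName", out + ".bw",
        "--outFileFormat", "bigwig",
        "--binSize", str(bin_size),
    ]
    if smooth_length is not None:
        cmd += ["--smoothLength", str(smooth_length)]
    return cmd


# Placeholder inputs: (BAM file, binSize, smoothLength or None).
trials = [
    ("H3K4me3.bam", 10, None),
    ("H3K4me3.bam", 50, None),
    ("H3K36me3.bam", 50, None),
    ("H3K36me3.bam", 50, 150),
]

for bam, bin_size, smooth_length in trials:
    print(shlex.join(bamcoverage_cmd(bam, bin_size, smooth_length)))
```

Each printed line can be pasted into a shell, or passed to subprocess.run as a list, once the file names match your own data.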