I am preprocessing some bam files for differential binding analysis. I have read that some people recommend removing blacklist regions (human) before to do the peak calling. Do you have any suggestions how directly extract the read counts in these regions using maybe samtools?
Keep in mind - when you perform a cross strand correlation analysis to assess peak quality in your data, if you do not exclude blacklisted reads prior to this analysis (which generally uses aligned reads or bed converted intervals from these reads), then you will include these regions in your analysis which may skew the results one way or the other.
Devon is right - the time savings will be nice to call peaks without this step and peak calling shouldn't really be affected much by these regions, depending on which peak caller you decide to use.
Just have a look at a few of the blacklisted regions after your alignment to see what kind of signal you're getting there, and then choose whether to use filtered alignment files or not for whichever downstream analyses (following Sukhdeep Singh's suggestion).
Good point regarding cross correlation with blacklisted regions included. Someone in our group mention maybe adding a cross correlation tools to deepTools. That'd properly ignore blacklisted regions (though we'd have to think about edge effects). Maybe I'll write something like that.
Normally you don't need to modify or do anything with the alignments. After calling peaks, exclude regions that overlap a blacklisted region (use bedtools).
Relatedly, deepTools has an option to ignore reads/signal in blacklisted regions as of version 2.2 I think. This won't do what you need in this particular case, but for other ChIP-seq related things this is useful.
Thanks Devon. Do you mean to use deeptools before calling peaks to get a kind of marked BAM file for these regions?
So you donĀ“t recommend to remove blacklisted regions in the alignments, but the reads/ high-signal associated with blacklisted regions can affect the peak calling with MACS2.
I mean what I wrote. Don't bother excluding reads before peak calling (yes, you can do this with bamutils or bedtools if you want to wait a while), just exclude the peaks afterward. You might get poorer power in the direct vicinity of blacklisted regions, but that'll be a minor effect given the time savings.
Devon,
you are not answering the question:
can " the reads/ high-signal associated with blacklisted regions can affect the peak calling with MACS2?
Also
I tried removing the blacklisted regions using bedops so I had do bam to bed file conversion, after filtering am unable to convert the files back to bam
am using bedtools to do the file conversions.
can anyone please suggest a way to filter directly from the bam files, bedops does not take bam files and the blacklisted regions are in bed format
Blacklisted regions don't normally do much except cause false-positive peaks (inside the blacklisted regions). This isn't always the case, but if you have nice large peaks versus input then it will be. If you have really weak enrichment (for whatever reason) then you might gain something from stripping out reads in blacklisted regions (maybe I should write a little tool for that so it can be multithreaded).
Use intersectBed with
-v
paramhttp://bedtools.readthedocs.org/en/latest/content/tools/intersect.html