7.2 years ago
rbronste
I wanted to get some opinions about masking repeats during differential peak calling for ATAC-seq or other open-chromatin datasets. Is this advised, and if so, what is a good strategy? Thanks.
I haven't seen a pipeline that does this. ENCODE has a blacklist of regions that are typically excluded. I'd suggest looking over their pipeline:
https://github.com/kundajelab/atac_dnase_pipelines
I also have a more streamlined pipeline based on the ENCODE pipeline:
https://github.com/mforde84/ATACseq-analysis-pipeline
If you really want to exclude repeats, you could use dustmasker to identify the low-complexity intervals of interest, then exclude the tags (reads) that map to those genomic intervals.
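To make the exclusion step concrete, here is a minimal Python sketch of dropping tags that overlap a set of masked intervals. It is only an illustration, not code from either pipeline: the function names (`build_mask`, `overlaps_mask`, `filter_tags`) are made up for this example, and it assumes the dustmasker or blacklist intervals have already been converted to 0-based, half-open BED-style `(chrom, start, end)` tuples.

```python
from bisect import bisect_right
from collections import defaultdict


def build_mask(intervals):
    """Merge (chrom, start, end) half-open intervals per chromosome.

    Returns {chrom: (sorted_starts, sorted_ends)} with disjoint intervals.
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        out = [list(ivs[0])]
        for start, end in ivs[1:]:
            if start <= out[-1][1]:
                # Overlapping or adjacent: extend the previous interval.
                out[-1][1] = max(out[-1][1], end)
            else:
                out.append([start, end])
        merged[chrom] = ([s for s, _ in out], [e for _, e in out])
    return merged


def overlaps_mask(mask, chrom, start, end):
    """True if the half-open interval [start, end) overlaps any masked interval."""
    if chrom not in mask:
        return False
    starts, ends = mask[chrom]
    # Index of the last masked interval that starts before `end`.
    i = bisect_right(starts, end - 1) - 1
    return i >= 0 and ends[i] > start


def filter_tags(tags, mask):
    """Keep only tags (chrom, start, end) that do not touch the mask."""
    return [t for t in tags if not overlaps_mask(mask, t[0], t[1], t[2])]


# Hypothetical toy data, not real coordinates.
mask = build_mask([("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 50, 60)])
tags = [
    ("chr1", 10, 60),
    ("chr1", 290, 340),
    ("chr1", 300, 350),
    ("chr2", 55, 105),
    ("chr3", 0, 50),
]
kept = filter_tags(tags, mask)
print(kept)
```

On real data you would more likely do this on BED/BAM files with an existing tool (for example `bedtools intersect -v -a tags.bed -b mask.bed`), but the overlap logic is the same.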