Where can one download the blacklisted regions from UCSC ENCODE data?
For those who still land on this question, there is now an updated version (v2) of the blacklists available here: https://github.com/Boyle-Lab/Blacklist/tree/master/lists
These blacklists are described in The ENCODE Blacklist: Identification of Problematic Regions of the Genome (June 2019):
Here, we define the ENCODE blacklist- a comprehensive set of regions in the human, mouse, worm, and fly genomes that have anomalous, unstructured, or high signal in next-generation sequencing experiments independent of cell line or experiment.
Just as an update for those in need of more recent blacklist regions (ce10, mm10, hg38): Anshul Kundaje supplies them here.
From "stable" repositories in 2022:
The general ENCODE blacklists at https://github.com/Boyle-Lab/Blacklist/raw/master/lists/
The ATAC-seq blacklists from the Buenrostro lab (mitochondrial homologs in the genome) at https://github.com/buenrostrolab/mitoblacklist/tree/master/peaks
There's not yet a blacklist available for each species or even each version of mouse/human. You can get the mm9 blacklist here. The equivalent for hg19 is here. There's no equivalent for hg38 and I'm not sure that lifting things over will work, though you could certainly try. We have an mm10 version of this, but I don't know that it's publicly available.
If someone wants the SV excludelist, 10x Genomics has a topic about the SV Calling Filter File.
It was reported to have overlaps. I looked into the list and found the overlap:
chr16 34586660 34587100
chr16 34587060 34587660
Does anyone have an updated or fixed one?
Thanks,
Weiyan
I'm afraid I don't understand the issue. Can you elaborate?
When I use this file as a blacklist to run deeptools bamCompare, it reports that the list has overlaps:
chr16 34586660 34587100
chr16 34587060 34587660
My understanding is that the second range overlaps the first one, because its start (34587060) is less than the first range's end (34587100).
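The check described above (a region starts before the previous region on the same chromosome ends) can be sketched in a few lines of Python. This is only an illustration, not how deeptools itself detects the problem; the function name `find_overlaps` and the in-memory list of tuples are assumptions made for the example. Coordinates are treated as half-open BED intervals, so regions that merely touch are not reported.

```python
from collections import defaultdict


def find_overlaps(intervals):
    """Report intervals that overlap an earlier interval on the same chromosome.

    `intervals` is an iterable of (chrom, start, end) tuples, interpreted as
    half-open BED intervals [start, end). Each hit is returned as a pair
    (earlier_interval, overlapping_interval).
    """
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    overlaps = []
    for chrom, spans in by_chrom.items():
        spans.sort()
        prev = spans[0]
        for cur in spans[1:]:
            # Overlap: the current start lies before the previous end.
            if cur[0] < prev[1]:
                overlaps.append(((chrom, *prev), (chrom, *cur)))
            # Keep comparing against whichever interval reaches furthest right.
            if cur[1] > prev[1]:
                prev = cur
    return overlaps


# The regions quoted in this thread.
example = [
    ("chr16", 34586660, 34587100),
    ("chr16", 34587060, 34587660),
    ("chr20", 31067930, 31069060),
    ("chr20", 31069000, 31069280),
]

for earlier, later in find_overlaps(example):
    print(earlier, "overlaps", later)
```

Running this on a whole blacklist file would only require parsing the first three tab-separated columns of each line into such tuples.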
You could use
bedtools merge -i blacklist.bed
to merge those overlapping regions.
Thank you very much! It works now.
I downloaded the hg38 blacklist from Anshul Kundaje's repository as well and found not only the overlap @weiyanjia2008 mentioned, but also chr20 31067930 31069060 and chr20 31069000 31069280. These two regions should also be merged. When I merged all of them (the ones on chr16 you identified and the ones I identified), it solved the problem for me. I just emailed Anshul Kundaje to report this issue, in case he hasn't noticed it yet, so that it can be fixed and the next person who downloads the file doesn't face the same problem.
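For anyone who wants to see what the merge does to these regions, here is a minimal pure-Python sketch of the same idea. It is not bedtools' implementation; the function name `merge_intervals` is made up for this example. It sorts first, because `bedtools merge` expects input sorted by chromosome and then start, and it merges overlapping as well as directly adjacent regions, which matches the default behaviour of `bedtools merge`.

```python
def merge_intervals(intervals):
    """Merge overlapping or directly adjacent (chrom, start, end) intervals.

    Sorting is by chromosome name (plain string order, so "chr10" sorts
    before "chr2") and then by start, which is enough for merging.
    """
    merged = []
    for chrom, start, end in sorted(intervals):
        last = merged[-1] if merged else None
        if last is not None and last[0] == chrom and start <= last[2]:
            # Extend the previous region instead of starting a new one.
            last[2] = max(last[2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(region) for region in merged]


# The four overlapping regions reported in this thread.
regions = [
    ("chr16", 34586660, 34587100),
    ("chr16", 34587060, 34587660),
    ("chr20", 31067930, 31069060),
    ("chr20", 31069000, 31069280),
]

for chrom, start, end in merge_intervals(regions):
    print(chrom, start, end, sep="\t")
```

For the regions above this yields one region per chromosome: chr16 34586660 34587660 and chr20 31067930 31069280.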
I have been working on these things lately. Here is some more information.
Hi @Friederike and @venu, why are the hg38 and hg19 lists different? EDIT: The hg38 list seems to be smaller, probably because many regions have been fixed in the hg38 assembly. Does the hg38 blacklist also contain mitochondrial homologs? I believe the homologs remain whether it is hg19 or hg38. Any insights on this? Thanks!
I have no deep insights into the specific differences between hg38 and hg19, but hg38 is generally considered to contain more "bait" sequences (meant to scavenge away reads that probably fell into some of the blacklisted regions before). I recommend contacting Anshul Kundaje directly (and ideally sharing his response here).
Hi, can you comment on this issue: https://github.com/Boyle-Lab/Blacklist/issues/11
I really don't understand how the two files differ and which ones should be ultimately used.