Hi,
I'm trying to find out where I can get the most recent file of mm10 blacklisted regions. Thank you.
Rob.
Select mm10 in the drop-down box
The ATAC-seq authors recently created a mitochondrial blacklist (found here) for use on ATAC-seq data. It covers high-signal regions on the nuclear genome caused by read sequence homology with the mitochondrial genome. A signal artifact blacklist has also been created by ENCODE (found here).
In the command line:
for i in *.bedfile; do bedtools intersect -v -a "$i" -b [PATH]/mitochondrial.blacklist.bed [PATH]/signal.artifact.blacklist.bed > "$i.bed"; done
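Be careful that the output files do not match the input pattern (for example, if all your files end in .bed), or the filtered outputs will be picked up as inputs the next time you run the loop.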
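If your inputs do end in .bed, one way to avoid the naming clash is to write the filtered files into a separate directory. The sketch below is only an illustration: the directory names `blacklists` and `filtered`, the `.filtered.bed` suffix, and the helper functions `filtered_name` and `filter_all` are all made up here, so adjust them to your own layout. It assumes a bedtools version that accepts more than one file after `-b`, as the one-liner above already does.

```shell
# Hypothetical locations; change these to match your setup.
BLACKLIST_DIR="blacklists"   # where the two blacklist BED files were saved
OUTDIR="filtered"            # outputs go here, so they never match ./*.bed

# Build the output path for one input file (illustrative helper).
# Example: peaks/sample1.bed -> filtered/sample1.filtered.bed
filtered_name() {
    base=$(basename "$1")
    printf '%s/%s.filtered.bed\n' "$OUTDIR" "${base%.*}"
}

# Remove every interval that overlaps either blacklist (-v keeps non-overlapping ones).
filter_all() {
    mkdir -p "$OUTDIR"
    for f in ./*.bed; do
        [ -e "$f" ] || continue   # the glob matched nothing
        bedtools intersect -v -a "$f" \
            -b "$BLACKLIST_DIR/mitochondrial.blacklist.bed" \
               "$BLACKLIST_DIR/signal.artifact.blacklist.bed" \
            > "$(filtered_name "$f")"
    done
}
```

Nothing runs until you call `filter_all` from the directory that holds your BED files. Because the results land in `filtered/`, running it a second time will not re-filter its own output.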
Probably you are looking for this
https://sites.google.com/site/anshulkundaje/projects/blacklists
Current NGS blacklists based on the official paper:
Official paper in Scientific Reports, a Nature Research journal (Amemiya, Kundaje, Boyle 2019)
https://www.nature.com/articles/s41598-019-45839-z
Download from GitHub (link from the paper)
What is the definition of blacklisted regions?
"regions in the human genome that have anomalous, unstructured, high signal/read counts in next gen sequencing experiments independent of cell line and type of experiment."
I have this info for mm9, which I guess I can lift over, but I was wondering if there were any updated BED files.
Thanks
Just curious: where did you get this info from (URL)? Lifting over is a good idea indeed (and may be the only possibility!)
Lifting over blacklisted regions generally doesn't work, since it's typically the case that those regions have been resolved in subsequent releases.
Hi Devon, as I understand it, the blacklisted regions refer to next-gen sequencing experiments, not to the genome assembly per se. A new genome assembly might resolve regions that were not assembled earlier. But these are not always the same regions that show anomalous read counts in next-gen experiments. These two seem to be different things to me.
They're often one and the same. The regions tend to overlap assembly issues. Yes, this won't always be the case, but this is much of the reason for the difference in blacklisted regions between GRCh38 and GRCh37.
I'm getting more confused, sorry :) If the regions are not in the genome assembly in the first place, how can NGS reads map there? To me, it seems like the blacklisted regions are mostly repeats, which are present in the assembly but are difficult to map NGS reads to. In this sense, they are "resolved" in the assembly but still hard to map reads to.
Often the copy number in the assembly has been fixed. For repeats, this makes the resulting alignments actually repetitive in the newer assembly, so there's no reason to blacklist them since you no longer get aberrant peaks.