Buenrostro et al. (2015) mentioned a custom blacklist of mitochondrial homologs for hg19 and mm10. Does anyone know if this list is publicly available somewhere?
I don't know if it is available; you could try contacting the authors or asking on their ATAC-seq forum. You could also try creating your own mitochondrial blacklist (this is what I did when I was analysing my ATAC-seq data): use wgsim to simulate reads from the mitochondrial chromosome, align them to your reference genome, call peaks using MACS, and then use the called peak regions as your blacklist. For consistency, I generated reads of the same length as those in the data I was analysing.
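A minimal sketch of the wgsim, align, MACS approach described above, assuming wgsim, samtools, bowtie2 and MACS2 (`macs2`) are installed, the mitochondrial contig is named `chrM`, and a bowtie2 index of the same reference already exists. bowtie2 is used only because it is the aligner mentioned later in the thread. The function names, file names, read count and fragment settings are hypothetical placeholders; only the read length follows the answer's advice to match your real data.

```shell
# Sketch of the simulate -> align -> call-peaks approach for a chrM homology blacklist.
# Assumes wgsim, samtools, bowtie2 and macs2 are on PATH and the contig is named chrM.
# Function names, file names, read count and fragment settings are hypothetical.

# Keep nuclear peak intervals (BED3) from a MACS narrowPeak file on stdin,
# dropping the mitochondrial contig itself (default name chrM).
peaks_to_blacklist() {
  awk -v c="${1:-chrM}" 'BEGIN{OFS="\t"} $1 != c {print $1, $2, $3}' \
    | sort -k1,1 -k2,2n
}

# Usage: make_mito_blacklist genome.fa bt2_index 50 hs mito_blacklist.bed
make_mito_blacklist() {
  local genome_fa=$1 bt2_index=$2 read_len=$3 gsize=$4 out_bed=$5

  # 1. Extract the mitochondrial sequence from the reference.
  samtools faidx "$genome_fa" chrM > chrM.fa
  # 2. Simulate error-free paired-end reads of the same length as the real data
  #    (wgsim prints simulated variants to stdout, which is discarded here).
  wgsim -N 1000000 -1 "$read_len" -2 "$read_len" -d 250 -s 25 -e 0 -r 0 -R 0 \
    chrM.fa sim_1.fq sim_2.fq > /dev/null
  # 3. Align the simulated reads to the whole reference (nuclear + chrM).
  bowtie2 -x "$bt2_index" -1 sim_1.fq -2 sim_2.fq | samtools sort -o sim.bam -
  # 4. Call peaks; peaks outside chrM mark nuclear regions with chrM homology.
  macs2 callpeak -t sim.bam -f BAMPE -g "$gsize" -n mito_sim --outdir macs_out
  # 5. Keep only the non-chrM peak intervals as the blacklist.
  peaks_to_blacklist < macs_out/mito_sim_peaks.narrowPeak > "$out_bed"
}
```

For mm10, pass `mm` instead of `hs` as the MACS genome size.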
I don't think the user has confused anything. The authors of ATAC-seq created a mitochondrial blacklist, which represents high-signal regions on the nuclear genome caused by read sequence homology with the mitochondrial genome. The authors have now uploaded the blacklist files, so you can download them directly instead of generating your own list.
Using the following command I have now removed the contents of both blacklists from my bed files (of course I had already removed chrM using the command I originally posted above):
for i in *.bedfile; do bedtools intersect -v -a "$i" -b [PATH]/mitochondrial.blacklist.bed [PATH]/signal.artifact.blacklist.bed > "$i.bed"; done
Be careful not to create an infinite loop with this command (the input and output files may all end in .bed).
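A variant of the loop above, sketched under the assumption that the inputs end in `.bed` and that the two blacklist files live outside the working directory: results go to a separate `.filtered.bed` name, and files that already carry that suffix are skipped, so re-running the loop never filters its own output. `filtered_name` and `filter_all_beds` are hypothetical helpers, and the blacklist paths are passed as arguments in place of the original `[PATH]` placeholders.

```shell
# Map sample.bed -> sample.filtered.bed (hypothetical naming scheme).
filtered_name() {
  printf '%s\n' "${1%.bed}.filtered.bed"
}

# Usage: filter_all_beds /path/to/mitochondrial.blacklist.bed /path/to/signal.artifact.blacklist.bed
# Assumes bedtools is on PATH; keep the blacklists outside the current directory
# so the *.bed glob does not pick them up.
filter_all_beds() {
  local mito_bl=$1 artifact_bl=$2 f
  for f in *.bed; do
    [ -e "$f" ] || continue                        # no .bed files: glob stays literal
    case "$f" in *.filtered.bed) continue ;; esac  # skip output of an earlier run
    bedtools intersect -v -a "$f" -b "$mito_bl" "$artifact_bl" > "$(filtered_name "$f")"
  done
}
```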
Wow.
I'm so glad I posted that answer. I'm literally about to start removing the blacklist... and I didn't know about that link. Thank you, James.
Is there a link that doesn't go through the Google forums? I need permission to access it.
These homologous regions are called NUMTs (nuclear mitochondrial DNA segments).
So, Jeremy/James: are NUMTs recognised by bowtie2 and annotated as chrM?
The code above removes only those reads in a BAM file that the aligner (bowtie2 in my case) annotates with 'chrM'.
Can I trust that all NUMTs are gone?
Real NUMTs, assuming the transposase binds to them, would map to the NUMT regions you list above. I don't know whether they would also align to chrM, so it is better to blacklist both chrM and those regions.
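One way to follow the "blacklist both" advice, sketched under assumptions: merge a BED interval covering all of chrM (its length read from the FASTA index `genome.fa.fai` produced by `samtools faidx`) with the NUMT blacklist, so a single `bedtools intersect -v` call removes both, and count how many reads in an already filtered BAM still overlap the NUMT regions. The function and file names are hypothetical, the contig is assumed to be called `chrM`, and bedtools and samtools are assumed to be on PATH.

```shell
# Sketch: one BED file covering all of chrM plus the NUMT blacklist.
# Assumes bedtools/samtools on PATH; function and file names are hypothetical.

# Print a BED3 interval spanning a whole contig (default chrM).
# A samtools .fai index holds the contig name in column 1 and its length in column 2.
chrM_interval() {
  awk -v c="${2:-chrM}" 'BEGIN{OFS="\t"} $1 == c {print $1, 0, $2}' "$1"
}

# Usage: make_combined_blacklist genome.fa.fai mitochondrial.blacklist.bed combined.bed
make_combined_blacklist() {
  local fai=$1 numt_bed=$2 out_bed=$3
  { chrM_interval "$fai"; cut -f1-3 "$numt_bed"; } \
    | sort -k1,1 -k2,2n \
    | bedtools merge -i stdin > "$out_bed"
}

# Usage: count_reads_in_regions filtered.bam mitochondrial.blacklist.bed
# Prints how many reads still overlap the given regions (0 means none overlap them).
count_reads_in_regions() {
  bedtools intersect -u -a "$1" -b "$2" | samtools view -c -
}
```

With `combined.bed` in hand, `bedtools intersect -v -a sample.bed -b combined.bed` drops chrM and the listed NUMT regions in one pass; a non-zero count from `count_reads_in_regions` after a chrM-only filter would suggest reads in NUMT regions survived it.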
Thank you Jeremy.
ATAC_seq author mitochondrial blacklist
ENCODE signal artifact blacklist