Hello,
I'm new to ATAC-seq, and I'm seeing something that seems odd to me in my sequencing data.
I'm using Bowtie2 to align my untrimmed reads to the human genome, and between 40% and 50% of them align 0 times. If I write the unaligned reads to their own fastq, it appears that these are all Illumina Transposase Adaptor sequences. It's not immediately apparent to me why such a high percentage of reads would be adaptor sequence. Additionally, if I use Cutadapt to trim the adaptor sequence, I then have a large % of reads that go to 0-1 bp in length (which makes sense given the above).
Any input on this would be much appreciated.
As a side question, is it abnormal to have a high % of multimappers (~30%)?
You have adapter dimers with no inserts. Not much you can do at this step but remove/ignore those reads and move on.
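If you want to put a number on it, here is a rough Python sketch that checks how many reads start (almost) directly with the adapter. It assumes the Nextera/Tn5 read-through sequence CTGTCTCTTATACACATCT; the function names, the 10 bp seed and the 1 bp cutoff are my own choices for illustration and are not taken from any tool.

```python
# Sketch only: estimate the fraction of reads that are adapter dimers,
# i.e. reads where the adapter starts at (or right next to) position 0.
# Assumption: Nextera/Tn5 read-through sequence; check it against your kit.
ADAPTER = "CTGTCTCTTATACACATCT"


def insert_length(read, adapter=ADAPTER, seed=10):
    """Return the number of bases before the adapter, or None if absent.

    Only the first `seed` bases of the adapter are searched (exact match),
    so this is cruder than the error-tolerant matching in a real trimmer.
    """
    idx = read.find(adapter[:seed])
    return idx if idx != -1 else None


def dimer_fraction(reads, max_insert=1):
    """Fraction of reads whose insert is at most `max_insert` bases long."""
    if not reads:
        return 0.0
    hits = 0
    for read in reads:
        n = insert_length(read)
        if n is not None and n <= max_insert:
            hits += 1
    return hits / len(reads)


# Toy reads, made up for illustration (not from the poster's data).
reads = [
    "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC",  # adapter at position 0
    "ACTGTCTCTTATACACATCTGACGCTGCCGACGA",  # 1 bp insert, then adapter
    "GATTACAGATTACAGATTACAGATTACAGATTAC",  # no adapter found
    "GGGCCCAAATTTGGGACTGTCTCTTATACACA",    # 16 bp insert, then adapter
]

print(dimer_fraction(reads))
```

On real data you would feed it the sequence lines of the fastq. A fraction close to your unaligned percentage would be consistent with the dimer explanation.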
For ChIP-seq, there are certain regions with a high proportion of multi-mappers, and there are even blacklists for these regions (I think for human and mouse). I think ATAC-seq also suffers from this problem.
There are indeed low-complexity regions that attract multimappers, but they account for only a small fraction of an NGS library. We have done quite a few ATAC-seq experiments in our lab, and I've never seen 30% multimappers; it is usually < 5%.
ENCODE blacklists for ATAC-seq.
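For what it's worth, applying a blacklist is just an interval-overlap filter. A minimal Python sketch, assuming the blacklist is a BED file (tab-separated chrom, start, end, half-open coordinates); the intervals below are made up for illustration and are not real blacklist entries, and the function names are hypothetical.

```python
# Sketch only: drop intervals (peaks or read alignments) that overlap a
# blacklist given as BED lines. Coordinates below are invented examples.


def parse_bed(lines):
    """Parse BED lines into {chrom: [(start, end), ...]}."""
    regions = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split("\t")[:3]
        regions.setdefault(chrom, []).append((int(start), int(end)))
    return regions


def is_blacklisted(chrom, start, end, blacklist):
    """True if half-open [start, end) overlaps any blacklist region."""
    return any(
        start < b_end and b_start < end
        for b_start, b_end in blacklist.get(chrom, [])
    )


# Hypothetical blacklist and peaks, for illustration only.
blacklist = parse_bed([
    "chr1\t1000\t2000",
    "chr2\t500\t800",
])
peaks = [
    ("chr1", 1500, 1700),  # inside a blacklisted region
    ("chr1", 2000, 2200),  # starts where the region ends: no overlap
    ("chr2", 100, 501),    # overlaps by one base
    ("chr3", 0, 100),      # chromosome not in the blacklist
]

kept = [p for p in peaks if not is_blacklisted(*p, blacklist)]
print(kept)
```

The linear scan is fine for a few hundred regions. For anything larger, a dedicated interval tool would be the more sensible choice.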
What genomax and h.mon said, plus: the most likely reason is that the library prep wasn't optimal.
Thanks for the feedback. I'm following the Buenrostro protocol; do you have any suggestions on which parameters I might want to change? Would the number of cells be the main factor here?
The amount of DNA is certainly a main contributor. I'm not a wet lab person, so I don't know the specifics of how to get rid of adapters, but I'd imagine there is some way to clean the samples before you put them on the flow cell.
No, the cell number is not a factor. Given proper handling, ATAC-seq works fine from 500 to several tens of thousands of cells. Did you see an abnormally high dimer peak in the gel/bioanalyzer at around 100 bp?
I did see a relatively large peak there. Do you have protocol suggestions to reduce this dimer amount or would you suggest size selecting the library?
I always purify after the PCR with 1.2x AMPure XP beads. That reduces the peak dramatically. Still, the standard protocol uses a standard column purification without caring about this peak. Can you upload the gel/BA picture somewhere so I can have a look? Typically you can judge library quality quite well from the gel picture.
https://postimg.org/image/44tny8apj/
Pretty messy at the beginning of the plot; I've never seen this kind of peak in my libraries. The nucleosomal pattern looks OK though. You really seem to have an issue with adapter dimers. Did you use the standard PCR protocol without modifications, and did you assemble the reaction on ice? Which polymerase did you use?
I used the PCR protocol without any modifications (and on ice). Per the protocol, I used the NEBNext High-Fidelity 2x PCR Master Mix. For the qPCR step to determine cycles, I used a slightly modified protocol, as my lab buys a mix rather than SYBR Green, but I don't see that making a huge difference.
Then I do not know what happened. As a further recommendation, you should add Tween-20 at 0.1% to both the transposition and the lysis buffer. It will greatly reduce the mitochondrial DNA percentage in your library. The reference for this would be Litzenburger 2017, I think.