While not truly bioinformatics-related, here are my thoughts, having done ATAC-seq in both primary cells and cell lines of human and mouse: Is this duplication rate with or without reads aligned to chrM included? If included, this result is normal and expected, as mitochondrial DNA gets tagmented during the library prep as well (chrM is only about 17 kb, so you get very high coverage of a tiny genome with standard Illumina sequencing, and therefore many duplicates).
A simple yet powerful addition to the standard protocol is to add Tween-20 at 0.1% to both the lysis and the tagmentation buffer:
lysis buffer: 10 mM NaCl, 10 mM Tris, 3 mM MgCl2, 0.1% NP-40, 0.1% Tween-20
tagmentation mix: 25 µl tagment buffer, 5 µl 1% Tween-20, 2.5 µl transposase, filled up to 50 µl with water
Doing this, we typically reduce the mtDNA percentage from 50-80% in cell lines to about 15-20% without affecting library quality, and it even increases the signal-to-noise ratio. A reference for this modification to the standard protocol is here.
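As a quick sanity check on the tagmentation recipe, here is a minimal sketch of the arithmetic. It assumes "to 50µl water" means topping the mix up to a 50 µl final volume; the function and variable names are made up for illustration.

```shell
# Volumes in µl, taken from the tagmentation recipe above.
buffer=25
tween_stock=5        # volume of the 1% Tween-20 stock
transposase=2.5
final_volume=50      # assumption: water is added up to 50 µl total
tween_stock_pct=1

# Prints the water volume to add and the final Tween-20 concentration.
tagmentation_mix() {
  LC_ALL=C awk -v b="$buffer" -v t="$tween_stock" -v e="$transposase" \
      -v v="$final_volume" -v p="$tween_stock_pct" 'BEGIN {
        printf "water_ul=%.1f\n", v - b - t - e
        printf "final_tween_pct=%.2f\n", t * p / v
      }'
}

tagmentation_mix
```

With these numbers the mix needs 17.5 µl of water, and the 1% stock ends up at the intended 0.1% final concentration.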
Check the duplication rate in the files without chrM reads (I hope you had chrM in your reference genome index!). Possible code:
# needs a coordinate-sorted, indexed BAM; -b makes sure the output is written as BAM
samtools idxstats in.bam | cut -f 1 | grep -v 'chrM' | xargs samtools view -b -o without_chrM.bam in.bam
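To compare the duplication rate before and after dropping chrM, something along these lines might do. It is only a sketch and assumes duplicates were already marked in the BAM (for example with Picard MarkDuplicates or samtools markdup), so that the duplicate flag (1024) is set; the helper names dup_pct and bam_dup_pct are made up here.

```shell
# Percentage of duplicates from two read counts: duplicates, total.
dup_pct() {
  LC_ALL=C awk -v d="$1" -v n="$2" 'BEGIN {
    if (n + 0 == 0) { print "NA"; exit }   # avoid division by zero
    printf "%.2f\n", 100 * d / n
  }'
}

# Duplication rate among mapped reads of a duplicate-marked BAM.
# -F 4 drops unmapped reads, -f 1024 keeps only reads flagged as duplicates.
bam_dup_pct() {
  dup=$(samtools view -c -F 4 -f 1024 "$1")
  total=$(samtools view -c -F 4 "$1")
  dup_pct "$dup" "$total"
}

# Usage (not run here):
#   bam_dup_pct in.bam
#   bam_dup_pct without_chrM.bam
```

This counts reads rather than read pairs, which is fine for a before/after comparison on the same library.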
The percentage of mitochondrial reads can be checked with:
function mtDNA {
  # column 3 of `samtools idxstats` is the number of mapped reads per reference sequence
  mtReads=$(samtools idxstats "$1" | awk '$1 == "chrM" {print $3}')
  totalReads=$(samtools idxstats "$1" | awk '{SUM += $3} END {print SUM}')
  echo '[mtDNA Content]:' $(bc <<< "scale=2;100*$mtReads/$totalReads")'%'
}; export -f mtDNA
mtDNA in.bam
As for your proposed modifications, I do not recommend any of them. ATAC-seq, if done properly, is highly reliable and in our hands works fine, provided the cells are in good condition and viable, without a large percentage of dead cells. That should not be an issue for cell lines. Experimenting with cell numbers and the like is not necessary, unless you are working in an organism with quite different properties such as flies or worms, as the standard numbers have been extensively tested and validated. We did ATAC-seq in THP-1 cells a while back (unpublished), and both the standard protocol and the one I proposed above work perfectly fine. I recommend the one with Tween-20. We routinely do 11 PCR cycles for all libraries. Hope that helps.
Thank you for your suggestion.
I had forgotten to think about the number of mitochondria per cell. I was so stupid! Actually, mtDNA rates were around 30-45% in my libraries. I will try to remove the mtDNA reads from my data to see how mtDNA affects the PCR duplicate ratio.
In addition, thank you for your advice on the new protocol. This is the first time I have heard of this modification. Next time I will compare the Tween-20 method with the regular method.
Again, thank you very much.
The Tween-method is pretty much the standard by now (or at least it should be IMHO). There are even modifications of it, see "OmniATAC" for example.