I have just mapped my ATAC-seq reads from paired-end sequencing and computed the fragment size distribution using this command:
./samtools view file | awk '$9>0' | cut -f 9 | sort | uniq -c | sort -b -k2,2n | sed -e 's/^[ \t]*//' > frag.distribution
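The same counting logic can also be done outside the shell. Below is a minimal Python sketch, assuming uncompressed SAM text of the kind samtools view prints. Like the awk filter, it keeps only records with a positive TLEN (column 9). That is the leftmost mate of each pair, so each fragment is counted once. It does not filter on FLAG, so duplicates or secondary alignments are counted if they are present, exactly as in the one-liner. The function name is my own, not part of any tool.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def fragment_length_distribution(sam_lines: Iterable[str]) -> List[Tuple[int, int]]:
    """Count fragment lengths from SAM text, mirroring the awk/cut/sort/uniq pipeline.

    Only records with a positive TLEN (column 9) are kept, so each read pair
    is counted once, via its leftmost mate. Returns (length, count) pairs.
    """
    counts: Counter = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM alignment record
        tlen = int(fields[8])
        if tlen > 0:
            counts[tlen] += 1
    # Sorted by fragment length, like `sort -k2,2n` in the one-liner
    return sorted(counts.items())
```

Wrapped in a small script that reads sys.stdin and prints "count length" per line, it can take the place of the awk/cut/sort/uniq part of the pipeline and produce a file in the same format as frag.distribution.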
As expected, I see a large maximum at around 60 bp, with most fragments under 100 bp, and then a second, lower hump at around 160-200 bp corresponding to mononucleosomes.
However, I also see spikes at 151 bp and 139 bp, consistently across replicates. Similar 'spikes' are visible in the fragment size distribution for S. pombe in the paper "Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions" (Figure 3a).
Any idea what these spikes correspond to? Could they be related to the fact that Tn5 acts as a dimer that inserts two adapters 9 bp apart?
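One way to confirm that the 139 bp and 151 bp spikes really stand out from the smooth distribution, and to compare them across replicates, is to test each length against a local baseline. The sketch below reads the "count length" lines of frag.distribution and flags any length whose count is at least min_ratio times the median of its neighbours. The function names and the defaults (half_window=5, min_ratio=1.5) are illustrative assumptions, not established thresholds.

```python
from statistics import median
from typing import Dict, Iterable, List


def parse_distribution(lines: Iterable[str]) -> Dict[int, int]:
    """Parse 'count length' lines (the frag.distribution format) into {length: count}."""
    counts: Dict[int, int] = {}
    for line in lines:
        parts = line.split()
        if len(parts) == 2:
            count, length = int(parts[0]), int(parts[1])
            counts[length] = count
    return counts


def find_spikes(counts: Dict[int, int], half_window: int = 5,
                min_ratio: float = 1.5) -> List[int]:
    """Return lengths whose count is at least min_ratio times the local median.

    The baseline for length L is the median count of the observed lengths in
    L-half_window..L+half_window, excluding L itself.
    """
    spikes = []
    for length in sorted(counts):
        neighbours = [
            counts[length + offset]
            for offset in range(-half_window, half_window + 1)
            if offset != 0 and (length + offset) in counts
        ]
        if not neighbours:
            continue  # isolated length, no local baseline available
        baseline = median(neighbours)
        if baseline > 0 and counts[length] >= min_ratio * baseline:
            spikes.append(length)
    return spikes
```

Running find_spikes on each replicate's frag.distribution and intersecting the results would show whether the same lengths recur. In the sparse tail of the distribution, small counts make the ratio noisy, so it may help to restrict the check to lengths with enough reads.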
Please post the PDF that this command produces (Picard tools from the Broad Institute), or alternatively the plot you made:
I mapped to the reference genome of a species for which chrM is not known. I did trim the Nextera adapters.
I cannot see anything suspicious in that distribution. The small spikes every ~10 bp reflect the helical pitch of DNA. I cannot explain why the two larger spikes are present, but I don't think you need to worry about them.
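The ~10 bp spacing can be checked numerically instead of by eye. Below is a minimal sketch, assuming the counts from frag.distribution are in a {length: count} dict. It removes the broad shape of the distribution with an 11 bp moving average, then scores lags of 7 to 15 bp by autocorrelation of the residuals. The window size, the lag range and the function names are my own illustrative choices, not a standard method.

```python
from typing import Dict, List


def moving_average(values: List[float], half_window: int) -> List[float]:
    """Centered moving average; windows are truncated at the ends."""
    out = []
    for i in range(len(values)):
        lo = max(0, i - half_window)
        hi = min(len(values), i + half_window + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


def dominant_period(counts: Dict[int, int], start: int, end: int,
                    min_lag: int = 7, max_lag: int = 15) -> int:
    """Return the lag (bp) in [min_lag, max_lag] with the highest autocorrelation
    of the detrended fragment-length counts between start and end (inclusive)."""
    if end - start + 1 <= max_lag:
        raise ValueError("length range must be longer than max_lag")
    values = [float(counts.get(length, 0)) for length in range(start, end + 1)]
    # Remove the broad shape of the distribution, keeping short-range wiggles
    trend = moving_average(values, half_window=5)
    resid = [v - t for v, t in zip(values, trend)]
    mean = sum(resid) / len(resid)
    resid = [r - mean for r in resid]

    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        pairs = list(zip(resid[:-lag], resid[lag:]))
        score = sum(a * b for a, b in pairs) / len(pairs)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

For example, dominant_period(counts, 40, 130) would look for periodicity in the sub-nucleosomal part of the distribution. Because the lag is an integer, a helical pitch of roughly 10.5 bp may come out as either 10 or 11.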
Okay, thanks. If the Nextera adapters had not been trimmed, would the peak be shifted much further to the right? The two peaks before the mononucleosome hump are also present in Figure 3a of "Structured nucleosome fingerprints enable high-resolution mapping of chromatin architecture within regulatory regions". I was just wondering...
The spikes at 140 bp and 150 bp correspond to the size of the core nucleosome particle combined with the 10 bp periodicity of the DNA helix mentioned by @ATpoint. These sizes were first reported by the biochemists Axel, van Holde, Kornberg, and others in the 1970s, using micrococcal nuclease digestion.