Question

Bowtie not aligning shorter fragments from ATAC-seq

0

Entering edit mode

4.5 years ago

eyleen.corrales ▴ 30

Hi! I'm trying to do the alignment for an ATAC-seq experiment using bowtie2. We did paired-end 100 pb sequencing. By checking with FastQC, of course I have adapter contamination. So I started trimming the reads as follows:

TrimmomaticPE $(DATAPATH)$(FILE1) $(DATAPATH)$(FILE2) $(SCRATCH_JOBDIR)/$(NAME)_output1_paired_trimmed.fastq $(SCRATCH_JOBDIR)/$(NAME)_output1_unpaired_trimmed.fastq $(SCRATCH_JOBDIR)/$(NAME)_output2_paired_trimmed.fastq $(SCRATCH_JOBDIR)/$(NAME)_output2_unpaired_trimmed.fastq ILLUMINACLIP:/home/ecorrales/Programs/Trimmomatic-0.36/adapters/NexteraPE-PE.fa:2:30:10:8:TRUE TRAILING:10 MINLEN:25

Afterwards I don't see any longer adapter contamination, and only few (0.11%) of the reads are removed. Then I proceeded to do the mapping with bowtie2:

bowtie2 -X 2000 --very-sensitive -x $(GENOMEDIR) -1 $(SCRATCH_JOBDIR)/$(NAME)_output1_paired_trimmed.fastq -2 $(SCRATCH_JOBDIR)/$(NAME)_output2_paired_trimmed.fastq -S $(SCRATCH_JOBDIR)/$(NAME)_output.sam

I then checked the insert size distribution and there were no inserts smaller than 100 bp. I assumed that this could be related to those fragments where the insert size was smaller than the read length. So I tried trimming the reads to different lengths before mapping. Indeed all the inserts smaller than the length that I trimmed to were not mapped.

In the plot below I show the insert size distribution after trimming the reads to 50 bp or 75 bp (or 100 bp for no trimming):

Insert size distribution

I thought that adding the --dovetail parameter to the mapping would do the trick, but nothing changed. Is there any other parameter that I could be missing?

alignment • 2.4k views

ADD COMMENT • link 4.5 years ago by eyleen.corrales ▴ 30

0

Entering edit mode

How are you calculating fragment size? You also may need to add --soft-clipped-unmapped-tlen in addition to --dovetail.

ADD REPLY • link 4.5 years ago by igor 13k

0

Entering edit mode

Thank you for your reply. I tried with --soft-clipped-unmapped-tlen but doesn't seem to change. For calculating the fragment size I used

samtools view AS-435485-LR-48138_output.sort.bam | awk '$9>0' | cut -f 9 | sort | uniq -c | sort -b -k2,2n | sed -e 's/^[   ]*//' > AS-435485-LR-48138_insertSize.txt

ADD REPLY • link 4.5 years ago by eyleen.corrales ▴ 30

0

Entering edit mode

Have you checked what happens to the reads with small insert sizes when you don't trim?

ADD REPLY • link 4.5 years ago by igor 13k

0

Entering edit mode

I guess the problem might be indeed be in the insert size calculation rather than on the mapping. I initially thought that the small inserts were somehow considered discordant, although I get the following metrics:

  49753 (100.00%) were paired; of these:
1384 (2.78%) aligned concordantly 0 times
24418 (49.08%) aligned concordantly exactly 1 time
23951 (48.14%) aligned concordantly >1 times
----
1384 pairs aligned concordantly 0 times; of these:
  53 (3.83%) aligned discordantly 1 time
----
1331 pairs aligned 0 times concordantly or discordantly; of these:
  2662 mates make up the pairs; of these:
    1708 (64.16%) aligned 0 times
    462 (17.36%) aligned exactly 1 time
    492 (18.48%) aligned >1 times

And the insert distribution looks like this (below is log transformed):

Insert-size

However, I accidentally re-run the mapping after cropping the first three bases of the reads using HEADCROP:3 parameter from Trimmomatic. Afterwards the insert size distribution looked much better (still I see a gap at 100 bp):

Insert-size_Headcrop:3

The mapping efficiency looks very similar, so this is why I guess the mapping was not the problem:

  49940 (100.00%) were paired; of these:
1365 (2.73%) aligned concordantly 0 times
24373 (48.80%) aligned concordantly exactly 1 time
24202 (48.46%) aligned concordantly >1 times
----
1365 pairs aligned concordantly 0 times; of these:
  49 (3.59%) aligned discordantly 1 time
----
1316 pairs aligned 0 times concordantly or discordantly; of these:
  2632 mates make up the pairs; of these:
    1703 (64.70%) aligned 0 times
    450 (17.10%) aligned exactly 1 time
    479 (18.20%) aligned >1 times

Still I'm a bit confused why this happens or what I'm doing wrong. I don't think is a matter of the quality of the first bases since it looks good, and the mapping efficiency looks similar regardless of removing or not these three bases.

ADD REPLY • link 4.5 years ago by eyleen.corrales ▴ 30

score 0 · Answer 1 · 2020-06-18

0

Entering edit mode

4.5 years ago

eyleen.corrales ▴ 30

Just as a follow up to this question (in case that someone ever has the same issue), I realized that the problem was the version of Bowtie2 I was using, which calculates a negative TLEN when paired end reads exactly coincide. This bug was fixed in the latest versions of Bowtie.

ADD COMMENT • link 4.5 years ago by eyleen.corrales ▴ 30

0

Entering edit mode

How old was the version you've been using? I've been getting the correct TLEN with Bowtie2 for at least 2 years.

ADD REPLY • link 4.5 years ago by igor 13k

0

Entering edit mode

A bit old; we had version 2.3.4.3 (September 17, 2018). This issue was fixed on the following release (version 2.3.5 - March 16, 2019):

Fixed issue whereby both ends of a paired-end read could have negative TLEN if they exactly coincided

I never thought this could be the issue

ADD REPLY • link 4.4 years ago by eyleen.corrales ▴ 30

0

Entering edit mode

Odd. I've used 2.3.3 before and I was definitely getting proper TLENs for short fragments.

ADD REPLY • link 4.4 years ago by igor 13k