Question

How To Extract Reads-Pairs Aligned Concordantly Exactly 1 Time from Bowtie2 (version 2.3.4.1 or high)?

5

Entering edit mode

2.2 years ago

Chilly ★ 1.3k

I tried to extract reads-pairs aligned concordantly exactly 1 time from Bowtie2 (version 2.3.4.1 or high). This was fixed in Bowtie2's version a few years ago (the previous solution and another solution), but now the problem has resurfaced.

When I run:

bowtie2 -x sample.index -p 50 -t -1 sample_R1.trimmed.fastq -2 sample_R2.trimmed.fastq -S example.sam -k2 --no-discordant --no-mixed

The results would look like this:

Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Multiseed full-index search: 00:00:35
20826672 reads; of these:
  20826672 (100.00%) were paired; of these:
    15275620 (73.35%) aligned concordantly 0 times
    5048169 (24.24%) aligned concordantly exactly 1 time
    502883 (2.41%) aligned concordantly >1 times
26.65% overall alignment rate
Time searching: 00:00:35
Overall time: 00:00:35

What I want to separate and extract is the '5048169 (24.24%) aligned concordantly exactly 1 time' reads-pairs.

But when I followed the previous workarounds, I found:

1)

samtools view -hf 0x2 example.sam | grep -v "XS:i:" | wc -l

or

samtools view -bS example.sam > example.bam; samtools view -hf 0x2 example.bam | grep -v "XS:i:" | wc -l

and result(s): 10196512, which is not equal to 5,048,169*2 = 10,096,338 reads that I want.

2)

samtools view -hf 0x2 example.sam | grep -v "XS:i:" | foo.py | wc -l

or

samtools view -bS example.sam > example.bam; samtools view -hf 0x2 example.bam | grep -v "XS:i:" | foo.py | wc -l

and result(s): 10193259, which is not equal to 5,048,169*2 = 10,096,338 reads that I want.

foo.py is (from Mr. Devon Ryan's solution):

#!/usr/bin/env python
import csv
import sys

f = csv.reader(sys.stdin, dialect="excel-tab")
of = csv.writer(sys.stdout, dialect="excel-tab")
last_read = None
for line in f :
    #take care of the header
    if(line[0][0] == "@") :
        of.writerow(line)
        continue

    if(last_read == None) : 
        last_read = line
    else :
        if(last_read[0] == line[0]) :
            of.writerow(last_read)
            of.writerow(line)
            last_read = None
        else :
            last_read = line

3)

samtools view example.sam | grep YT:Z:CP | uniq -u | wc -l

or

samtools view -bS example.sam | grep YT:Z:CP | uniq -u | wc -l

and result(s): 12107870, which is not equal to 5,048,169*2 = 10,096,338 reads that I want.

4)

samtools view -hf 0x2 example.sam | grep -v "XS:i:" | grep YT:Z:CP | uniq -u | wc -l

and result(s): 10159819, which is not equal to 5,048,169*2 = 10,096,338 reads that I want.

I've tried almost every existing solution but to no avail. Urgently ask for your help.

RNA-seq pair-end concordantly bowtie2 alignment • 678 views

ADD COMMENT • link 2.2 years ago by Chilly ★ 1.3k

score 6 · Accepted Answer · 2022-08-30

6

Entering edit mode

2.2 years ago

Chilly ★ 1.3k

This might help. https://github.com/BenLangmead/bowtie2/issues/406

ADD COMMENT • link 2.2 years ago by Chilly ★ 1.3k