How To Extract Reads-Pairs Aligned Concordantly Exactly 1 Time from Bowtie2 (version 2.3.4.1 or high)?
1
5
Entering edit mode
2.2 years ago
Chilly ★ 1.3k

I tried to extract reads-pairs aligned concordantly exactly 1 time from Bowtie2 (version 2.3.4.1 or high). This was fixed in Bowtie2's version a few years ago (the previous solution and another solution), but now the problem has resurfaced.

When I run:

bowtie2 -x sample.index -p 50 -t -1 sample_R1.trimmed.fastq -2 sample_R2.trimmed.fastq -S example.sam -k2 --no-discordant --no-mixed

The results would look like this:

Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Multiseed full-index search: 00:00:35
20826672 reads; of these:
  20826672 (100.00%) were paired; of these:
    15275620 (73.35%) aligned concordantly 0 times
    5048169 (24.24%) aligned concordantly exactly 1 time
    502883 (2.41%) aligned concordantly >1 times
26.65% overall alignment rate
Time searching: 00:00:35
Overall time: 00:00:35

What I want to separate and extract is the '5048169 (24.24%) aligned concordantly exactly 1 time' reads-pairs.

But when I followed the previous workarounds, I found:

1)

samtools view -hf 0x2 example.sam | grep -v "XS:i:" | wc -l

or

samtools view -bS example.sam > example.bam; samtools view -hf 0x2 example.bam | grep -v "XS:i:" | wc -l

and result(s): 10196512, which is not equal to 5,048,169*2 = 10,096,338 reads that I want.

2)

samtools view -hf 0x2 example.sam | grep -v "XS:i:" | foo.py | wc -l

or

samtools view -bS example.sam > example.bam; samtools view -hf 0x2 example.bam | grep -v "XS:i:" | foo.py | wc -l

and result(s): 10193259, which is not equal to 5,048,169*2 = 10,096,338 reads that I want.

foo.py is (from Mr. Devon Ryan's solution):

#!/usr/bin/env python
import csv
import sys

f = csv.reader(sys.stdin, dialect="excel-tab")
of = csv.writer(sys.stdout, dialect="excel-tab")
last_read = None
for line in f :
    #take care of the header
    if(line[0][0] == "@") :
        of.writerow(line)
        continue

    if(last_read == None) : 
        last_read = line
    else :
        if(last_read[0] == line[0]) :
            of.writerow(last_read)
            of.writerow(line)
            last_read = None
        else :
            last_read = line

3)

samtools view example.sam | grep YT:Z:CP | uniq -u | wc -l

or

samtools view -bS example.sam | grep YT:Z:CP | uniq -u | wc -l

and result(s): 12107870, which is not equal to 5,048,169*2 = 10,096,338 reads that I want.

4)

samtools view -hf 0x2 example.sam | grep -v "XS:i:" | grep YT:Z:CP | uniq -u | wc -l

or samtools view -hf 0x2 example.sam | grep -v "XS:i:" | grep YT:Z:CP | wc -l or samtools view -hf 0x2 example.bam | grep -v "XS:i:" | grep YT:Z:CP | wc -l

and result(s): 10159819, which is not equal to 5,048,169*2 = 10,096,338 reads that I want.

I've tried almost every existing solution but to no avail. Urgently ask for your help.

RNA-seq pair-end concordantly bowtie2 alignment • 678 views
ADD COMMENT
6
Entering edit mode
2.2 years ago
Chilly ★ 1.3k

This might help. https://github.com/BenLangmead/bowtie2/issues/406

ADD COMMENT

Login before adding your answer.

Traffic: 1535 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6