I tried to extract reads-pairs aligned concordantly exactly 1 time from Bowtie2 (version 2.3.4.1 or high). This was fixed in Bowtie2's version a few years ago (the previous solution and another solution), but now the problem has resurfaced.
When I run:
bowtie2 -x sample.index -p 50 -t -1 sample_R1.trimmed.fastq -2 sample_R2.trimmed.fastq -S example.sam -k2 --no-discordant --no-mixed
The results would look like this:
Time loading reference: 00:00:00
Time loading forward index: 00:00:00
Time loading mirror index: 00:00:00
Multiseed full-index search: 00:00:35
20826672 reads; of these:
20826672 (100.00%) were paired; of these:
15275620 (73.35%) aligned concordantly 0 times
5048169 (24.24%) aligned concordantly exactly 1 time
502883 (2.41%) aligned concordantly >1 times
26.65% overall alignment rate
Time searching: 00:00:35
Overall time: 00:00:35
What I want to separate and extract is the '5048169 (24.24%) aligned concordantly exactly 1 time' reads-pairs.
But when I followed the previous workarounds, I found:
1)
samtools view -hf 0x2 example.sam | grep -v "XS:i:" | wc -l
or
samtools view -bS example.sam > example.bam; samtools view -hf 0x2 example.bam | grep -v "XS:i:" | wc -l
and result(s): 10196512
, which is not equal to 5,048,169*2 = 10,096,338 reads that I want.
2)
samtools view -hf 0x2 example.sam | grep -v "XS:i:" | foo.py | wc -l
or
samtools view -bS example.sam > example.bam; samtools view -hf 0x2 example.bam | grep -v "XS:i:" | foo.py | wc -l
and result(s): 10193259
, which is not equal to 5,048,169*2 = 10,096,338 reads that I want.
foo.py
is (from Mr. Devon Ryan's solution):
#!/usr/bin/env python
import csv
import sys
f = csv.reader(sys.stdin, dialect="excel-tab")
of = csv.writer(sys.stdout, dialect="excel-tab")
last_read = None
for line in f :
#take care of the header
if(line[0][0] == "@") :
of.writerow(line)
continue
if(last_read == None) :
last_read = line
else :
if(last_read[0] == line[0]) :
of.writerow(last_read)
of.writerow(line)
last_read = None
else :
last_read = line
3)
samtools view example.sam | grep YT:Z:CP | uniq -u | wc -l
or
samtools view -bS example.sam | grep YT:Z:CP | uniq -u | wc -l
and result(s): 12107870
, which is not equal to 5,048,169*2 = 10,096,338 reads that I want.
4)
samtools view -hf 0x2 example.sam | grep -v "XS:i:" | grep YT:Z:CP | uniq -u | wc -l
or samtools view -hf 0x2 example.sam | grep -v "XS:i:" | grep YT:Z:CP | wc -l or samtools view -hf 0x2 example.bam | grep -v "XS:i:" | grep YT:Z:CP | wc -l
and result(s): 10159819
, which is not equal to 5,048,169*2 = 10,096,338 reads that I want.
I've tried almost every existing solution but to no avail. Urgently ask for your help.