unmapped reads in paired-end run pf bowtie2
1
0
Entering edit mode
6.4 years ago
Assa Yeroslaviz ★ 1.9k

I have a data set of paired-end samples which I'm mapping with bowtie2.

I am mainly focus on the unmapped reads, as these are the reads we're interested in.

This is a sample output of the mapping results from on of my runs:

11216394 reads; of these:
11216394 (100.00%) were paired; of these:
8079466 (72.03%) aligned concordantly 0 times
2987422 (26.63%) aligned concordantly exactly 1 time
149506 (1.33%) aligned concordantly >1 times
----
8079466 pairs aligned concordantly 0 times; of these:
  417642 (5.17%) aligned discordantly 1 time
----
7661824 pairs aligned 0 times concordantly or discordantly; of these:
  15323648 mates make up the pairs; of these:
    15226093 (99.36%) aligned 0 times
    63924 (0.42%) aligned exactly 1 time
    33631 (0.22%) aligned >1 times
32.13% overall alignment rate

Out of this output, how many unmapped reads do I have?

From my understanding I was thinking it is the fourth row from the bottom = 15226093?

If I calculate all three options in this summary i have aligned once - (29874222) + (417642 *2) + 63924 = 6874052 aligned >1 - (1495062) + 33631 = 332643 unaligned - 15226093 sum = 6874052 + 332643 + 15226093 = 22432788

This last number is the sum of my input reads of this sample.

But When I look at the number of reads in the unmapped.fastq file after the mapping it corresponds to the third row of the file (x2) = 8079466 x2 = 16158932?

But than adding the mapped and unmapped reads together give me more reads, than the there were in total.

Can anyone help me understand what is happening here?

How do I calculate the number of unmapped reads in a paired-end mapped sample?

thanks Assa

paired-end bowtie unmapped-reads • 5.7k views
ADD COMMENT
0
Entering edit mode

What was the exact command you used? There are a few options for outputting various subsets of unmapped reads, it's likely you used the wrong one.

ADD REPLY
0
Entering edit mode

This is the commands i use

bowtie2 --sensitive --end-to-end -p 10 --un-conc-gz firstRun.Unmapped.gz --phred33 -x genome -1 1.fq.gz -2 2.fq.gz | samtools view -Sbhu -F 4 - | samtools sort - -o sorted.bam
samtools index sorted.bam

The file I need in the next step is firstRun.Unmapped.gz.

I didn't subset the bam file with samtools, I thought the unmapped reads are outputted to this file.

ADD REPLY
0
Entering edit mode

Just to add - this is the output of the same file when running the samtools flagstat command

7206695 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
7206695 + 0 mapped (100.00% : N/A)
7206695 + 0 paired in sequencing
3615165 + 0 read1
3591530 + 0 read2
6273856 + 0 properly paired (87.06% : N/A)
7131084 + 0 with itself and mate mapped
75611 + 0 singletons (1.05% : N/A)
3574 + 0 with mate mapped to a different chr
719 + 0 with mate mapped to a different chr (mapQ>=5)

I can't see how the numbers of the stat (bowtie2 output) file and the flagstat output correlate.

ADD REPLY
1
Entering edit mode
6.4 years ago

The discrepency you're seeing is due to using --un-conc-gz, which is writing the 8079466 pairs that didn't align concordantly to the specified fastq files. You'll instead want --un-gz instead.

ADD COMMENT
0
Entering edit mode

I thought so too. But if I use the same command but with --un-gz and the input files -1 1.fq.gz -2 2.fq.gz, the unmapped file is empty.

Is there a way to map the files in a paired-end modus and still get the unmapped reads, not pairs?

ADD REPLY
0
Entering edit mode

Then I'd suggest not writing them to a file with bowtie2 but instead keeping them in the SAM output and making fastq files from that (samtools fastq -f 4).

ADD REPLY
0
Entering edit mode

thanks, I'll look into it

ADD REPLY

Login before adding your answer.

Traffic: 2481 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6