Question

unmapped reads in paired-end run pf bowtie2

0

Entering edit mode

6.4 years ago

Assa Yeroslaviz ★ 1.9k

I have a data set of paired-end samples which I'm mapping with bowtie2.

I am mainly focus on the unmapped reads, as these are the reads we're interested in.

This is a sample output of the mapping results from on of my runs:

11216394 reads; of these:
11216394 (100.00%) were paired; of these:
8079466 (72.03%) aligned concordantly 0 times
2987422 (26.63%) aligned concordantly exactly 1 time
149506 (1.33%) aligned concordantly >1 times
----
8079466 pairs aligned concordantly 0 times; of these:
  417642 (5.17%) aligned discordantly 1 time
----
7661824 pairs aligned 0 times concordantly or discordantly; of these:
  15323648 mates make up the pairs; of these:
    15226093 (99.36%) aligned 0 times
    63924 (0.42%) aligned exactly 1 time
    33631 (0.22%) aligned >1 times
32.13% overall alignment rate

Out of this output, how many unmapped reads do I have?

From my understanding I was thinking it is the fourth row from the bottom = 15226093?

If I calculate all three options in this summary i have aligned once - (29874222) + (417642 *2) + 63924 = 6874052 aligned >1 - (1495062) + 33631 = 332643 unaligned - 15226093 sum = 6874052 + 332643 + 15226093 = 22432788

This last number is the sum of my input reads of this sample.

But When I look at the number of reads in the unmapped.fastq file after the mapping it corresponds to the third row of the file (x2) = 8079466 x2 = 16158932?

But than adding the mapped and unmapped reads together give me more reads, than the there were in total.

Can anyone help me understand what is happening here?

How do I calculate the number of unmapped reads in a paired-end mapped sample?

thanks Assa

paired-end bowtie unmapped-reads • 5.7k views

ADD COMMENT • link updated 6.4 years ago by Devon Ryan 104k • written 6.4 years ago by Assa Yeroslaviz ★ 1.9k

0

Entering edit mode

What was the exact command you used? There are a few options for outputting various subsets of unmapped reads, it's likely you used the wrong one.

ADD REPLY • link 6.4 years ago by Devon Ryan 104k

0

Entering edit mode

This is the commands i use

bowtie2 --sensitive --end-to-end -p 10 --un-conc-gz firstRun.Unmapped.gz --phred33 -x genome -1 1.fq.gz -2 2.fq.gz | samtools view -Sbhu -F 4 - | samtools sort - -o sorted.bam
samtools index sorted.bam

The file I need in the next step is firstRun.Unmapped.gz.

I didn't subset the bam file with samtools, I thought the unmapped reads are outputted to this file.

ADD REPLY • link 6.4 years ago by Assa Yeroslaviz ★ 1.9k

0

Entering edit mode

Just to add - this is the output of the same file when running the samtools flagstat command

7206695 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
0 + 0 supplementary
0 + 0 duplicates
7206695 + 0 mapped (100.00% : N/A)
7206695 + 0 paired in sequencing
3615165 + 0 read1
3591530 + 0 read2
6273856 + 0 properly paired (87.06% : N/A)
7131084 + 0 with itself and mate mapped
75611 + 0 singletons (1.05% : N/A)
3574 + 0 with mate mapped to a different chr
719 + 0 with mate mapped to a different chr (mapQ>=5)

I can't see how the numbers of the stat (bowtie2 output) file and the flagstat output correlate.

ADD REPLY • link 6.4 years ago by Assa Yeroslaviz ★ 1.9k

score 1 · Accepted Answer · 2018-06-25

1

Entering edit mode

6.4 years ago

Devon Ryan 104k

The discrepency you're seeing is due to using --un-conc-gz, which is writing the 8079466 pairs that didn't align concordantly to the specified fastq files. You'll instead want --un-gz instead.

ADD COMMENT • link 6.4 years ago by Devon Ryan 104k

0

Entering edit mode

I thought so too. But if I use the same command but with --un-gz and the input files -1 1.fq.gz -2 2.fq.gz, the unmapped file is empty.

Is there a way to map the files in a paired-end modus and still get the unmapped reads, not pairs?

ADD REPLY • link 6.4 years ago by Assa Yeroslaviz ★ 1.9k

0

Entering edit mode

Then I'd suggest not writing them to a file with bowtie2 but instead keeping them in the SAM output and making fastq files from that (samtools fastq -f 4).