Question

having trouble extracting paired reads from bam file

0

Entering edit mode

5.1 years ago

from the mountains ▴ 250

i want to extract all paired reads from a bam file but for some reason i'm getting a much smaller file than expected.

$ samtools flagstat in.bam
56631731 + 0 in total (QC-passed reads + QC-failed reads)
0 + 0 secondary
417735 + 0 supplementary
0 + 0 duplicates
56631731 + 0 mapped (100.00% : N/A)
56213996 + 0 paired in sequencing
28112924 + 0 read1
28101072 + 0 read2
55820773 + 0 properly paired (99.30% : N/A)
56213996 + 0 with itself and mate mapped
0 + 0 singletons (0.00% : N/A)
380483 + 0 with mate mapped to a different chr
380483 + 0 with mate mapped to a different chr (mapQ>=5)

flagstat indicates to me that i should get 55820773 paired reads, or about 28M pairs of reads. Here is my command to extract:

$ samtools fastq -1 paired1.fastq.gz -2 paired2.fastq.gz -0 /dev/null -s /dev/null -n in.bam
$ zcat paired1.fastq.gz | wc -l
2885736

There are only about 700,000 reads in the first fastq file. I am trying to get rid of singletons. by the way the bam was generated by STAR, and subsequently sorted, indexed and filtered by samtools before using samtools fastq.

bam samtools alignment • 1.1k views

ADD COMMENT • link 5.1 years ago by from the mountains ▴ 250

score 2 · Accepted Answer · 2020-06-12

2

Entering edit mode

5.1 years ago

ATpoint 88k

You need to sort the bam by name or group by name (samtools sort -n or samtools collate) before writing as fastq.

ADD COMMENT • link 5.1 years ago by ATpoint 88k

0

Entering edit mode

that ended up being the solution. thanks for your expertise--this kind of thing always takes like 45 minutes of my time to figure out and somehow i never retain the information i've learned.

ADD REPLY • link 5.1 years ago by from the mountains ▴ 250

0

Entering edit mode

That is why biostars is here. If you don't do this regularly there is no way to retain this sort of info :-)

Accept the answer (green check mark) to provide closure to the thread.

ADD REPLY • link 5.1 years ago by GenoMax 152k

0

Entering edit mode

thanks, i did not know what that button was for!

ADD REPLY • link 5.1 years ago by from the mountains ▴ 250