Extracting mapped paired end FastQ reads from BAM file using Samtools - trying to understand -f 0x02 vs -F 4
8.2 years ago
acxcrew007 ▴ 10

Hi Everyone,

I have been following a few great threads by fellow users here. Could someone kindly explain why using

samtools view -b -f 0x2 accepted_hits.bam > mappedPairs.bam

is better than using

samtools view -b -F 4 accepted_hits.bam > mappedPairs.bam

I have always used the latter command when extracting mapped reads from a BAM file.

Extracting mapped reads based on a chromosomal region of interest:

samtools view -b -F 4 accepted_hits.bam Chr:position_x-position_y > mappedPairs.bam
samtools sort -n -o sorted_mappedPairs.bam mappedPairs.bam
bamToFastq -i sorted_mappedPairs.bam -fq forward.fastq -fq2 reverse.fastq

I am going to run tests on the extracted FastQ reads to examine the region of interest.

Many thanks in advance.

RNA-Seq Fastq Bam samtools • 4.2k views
8.2 years ago
John 13k

Using -f 2 on its own is wrong. The second bit (0x2, read mapped in a proper pair) can be set even when the third bit (0x4, read unmapped) is also set. This is valid according to the SAM spec, which I highly suggest anyone working with SAM/BAM should read. In this scenario, the unmapped bit takes priority.
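To make the difference concrete, here is a minimal sketch in Python of how the two filters treat a few FLAG values. The bit values come from the SAM spec; the helper name `passes` and the example flags are my own illustration, not part of samtools.

```python
# SAM FLAG bits discussed in this thread (values from the SAM specification).
PROPER_PAIR = 0x2  # aligner considers the pair properly aligned
UNMAPPED = 0x4     # this segment is unmapped


def passes(flag, require=0, exclude=0):
    """Hypothetical helper mimicking `samtools view -f require -F exclude`:
    keep a record only if every bit in `require` is set and no bit in
    `exclude` is set."""
    return (flag & require) == require and (flag & exclude) == 0


# Illustrative FLAG values (assumed, not taken from the poster's BAM):
#   99 = paired + proper pair + mate reverse + first in pair
#   73 = paired + mate unmapped + first in pair (mapped, but not a proper pair)
#   71 = paired + proper pair + unmapped + first in pair (the odd case above)
for flag in (99, 73, 71):
    print(
        flag,
        "-f 2:", passes(flag, require=PROPER_PAIR),
        "-F 4:", passes(flag, exclude=UNMAPPED),
        "-f 2 -F 4:", passes(flag, require=PROPER_PAIR, exclude=UNMAPPED),
    )
```

So -F 4 keeps every mapped alignment, including ones whose mate is unmapped, while -f 2 alone can let through a record that is flagged as unmapped. If you only want mapped reads in proper pairs, combining both (-f 2 -F 4) covers that case.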

However, the BAM file format is really kind of bizarre for its own reasons. Every entry represents an alignment, not a read, and alignment entries can be unaligned. A single biological DNA fragment can have multiple alignments from multiple different parts of the fragment. In short, things can become incredibly complicated incredibly quickly. It is best to enforce a one-alignment-per-read policy, where every read has an alignment entry, even if that entry is unaligned.

Once you've got your data in this format, which is essentially the GATK 'clean BAM' format, you're not going to want to filter your reads ever again. You're going to want to use tools that read the whole file and only look at a given population of alignments. The reason this is important is that samtools view with -F or -f won't tell you which reads were removed and which reads were kept: it is essentially an operation on the data without any accountability. Your downstream tools, however, could tell you which reads they did not include in their analysis, and this is a much better approach. So whatever you're doing, don't use -f/-F.
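As a rough illustration of what I mean by accountability, here is a small Python sketch that reads SAM text records and tallies every alignment into a category instead of silently dropping some. The function name `tally`, the category labels, and the example records are all made up for this post; in practice you would feed it the output of samtools view -h.

```python
from collections import Counter

PROPER_PAIR = 0x2  # SAM FLAG bit: mapped in a proper pair
UNMAPPED = 0x4     # SAM FLAG bit: segment unmapped


def tally(sam_lines):
    """Count alignment records per category (hypothetical helper).

    Unmapped is checked first because, as noted above, the unmapped bit
    takes priority over the proper-pair bit.
    """
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        flag = int(line.split("\t")[1])  # FLAG is the second SAM column
        if flag & UNMAPPED:
            counts["unmapped"] += 1
        elif flag & PROPER_PAIR:
            counts["mapped_proper_pair"] += 1
        else:
            counts["mapped_other"] += 1
    return counts


# Made-up records for illustration only.
example = [
    "@HD\tVN:1.6\tSO:queryname",
    "r1\t99\tchr1\t100\t60\t4M\t=\t200\t104\tACGT\tIIII",
    "r2\t73\tchr1\t300\t60\t4M\t=\t300\t0\tACGT\tIIII",
    "r2\t133\tchr1\t300\t0\t*\t=\t300\t0\tACGT\tIIII",
    "r3\t71\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]

counts = tally(example)
for category, n in sorted(counts.items()):
    print(category, n)
```

Nothing is thrown away here: every record ends up in exactly one bucket, so you can see how many alignments an analysis ignored and why.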
