tool to change flag for all reads in a bam file
1
0
Entering edit mode
8.1 years ago
cmo ▴ 90

Is there a tool which will edit the flag for all reads in a BAM file?

In my case: The GEM aligner sets the "paired" flag for all reads in the resulting BAM file.... even if you give it single-ended reads. This causes PICARD, GATK, etc. to throw many Warnings or Exit. Thus, I need a tool that will set the "paired" flag to 0 for all reads mapped by GEM.

BioAwk seems promising, but I can't see how it can set only the "paired" bit to 0 while leaving the other bits untouched.


Here are my GEM commands:

gem-indexer  -i ref.fa  -o ref.ix  -c dna  --complement no  --threads 1

gem-mapper  -I ref.ix.gem  -i illumina.fq   -q offset-33  -o gem.out  -m 2 -e 2  --max-big-indel-length 0 -d 1 --threads 1

gem-2-sam  -I ref.ix.gem  -i gem.out.map  -q offset-33  --threads 1    --expect-single-end-reads  -l  -o out.sam
sam bam • 3.9k views
ADD COMMENT
1
Entering edit mode

You might post your GEM command, since I suspect this is an incorrect flag setting somewhere.

BTW, you could write a bit of python code with pysam if you really need to.

ADD REPLY
0
Entering edit mode

I was going to recommend a kludgy combination of Samtools 'view' plus 'awk' to recalculate the values, but I suspect that someone with more binary-foo (probably Pierre or Heng Li) has an elegant solution.

Edit: See Devon's response (I should not have discounted his binary-foo!)

ADD REPLY
2
Entering edit mode

For what it's worth, FLAG & 3860 will produce the single-end equivalent flag (i.e., just keep bits 4, 16, and >=256).

ADD REPLY
0
Entering edit mode

updated to include GEM commands

ADD REPLY
0
Entering edit mode

Indeed, that shouldn't produce this sort of behaviour. It'd be nice if you reported this to the developers.

ADD REPLY
0
Entering edit mode

There are multiple flags associated with paired reads, so it may not be correct to simply subtract the paired flag (0x1). It depends upon how GEM assigns flags when read 2 is missing (e.g., does it also assign 0x8, mate unmapped?). It's probably necessary to run Samtools 'flagstat' to determine which paired flags are set in your dataset, before you attempt to edit them.

ADD REPLY
2
Entering edit mode
8.1 years ago
cmo ▴ 90

Indeed Heng Li's BioAwk does the job nicely:

samtools  view -H out.sam  >  header

samtools  view  out.sam    |     \
bioawk  -c sam  '{ $flag = and($flag , 3860 ) ; print $0 }'    |    \
cat  header  -    >  new.sam

Note that "3860" is the flag which sets all bits associated with mates/pairs to 0 and all other bits to 1
(as suggested by @Devon Ryan). Thus "AND" with "3860" forces all mate/pairs bits to 0 in the flag and leaves the other bits untouched.


To understand the flags, use Broad Institute's "Explain SAM Flags" page: https://broadinstitute.github.io/picard/explain-flags.html

ADD COMMENT

Login before adding your answer.

Traffic: 1802 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6