Is there a tool which will edit the flag for all reads in a BAM file?
In my case: The GEM aligner sets the "paired" flag for all reads in the resulting BAM file.... even if you give it single-ended reads. This causes PICARD, GATK, etc. to throw many Warnings or Exit. Thus, I need a tool that will set the "paired" flag to 0 for all reads mapped by GEM.
BioAwk seems promising, but I can't see how it can set only the "paired" bit to 0 while leaving the other bits untouched.
Here are my GEM commands:
gem-indexer -i ref.fa -o ref.ix -c dna --complement no --threads 1
gem-mapper -I ref.ix.gem -i illumina.fq -q offset-33 -o gem.out -m 2 -e 2 --max-big-indel-length 0 -d 1 --threads 1
gem-2-sam -I ref.ix.gem -i gem.out.map -q offset-33 --threads 1 --expect-single-end-reads -l -o out.sam
You might post your GEM command, since I suspect this is an incorrect flag setting somewhere.
BTW, you could write a bit of python code with pysam if you really need to.
I was going to recommend a kludgy combination of Samtools 'view' plus 'awk' to recalculate the values, but I suspect that someone with more binary-foo (probably Pierre or Heng Li) has an elegant solution.
Edit: See Devon's response (I should not have discounted his binary-foo!)
For what it's worth,
FLAG & 3860
will produce the single-end equivalent flag (i.e., just keep bits 4, 16, and >=256).updated to include GEM commands
Indeed, that shouldn't produce this sort of behaviour. It'd be nice if you reported this to the developers.
There are multiple flags associated with paired reads, so it may not be correct to simply subtract the paired flag (0x1). It depends upon how GEM assigns flags when read 2 is missing (e.g., does it also assign 0x8, mate unmapped?). It's probably necessary to run Samtools 'flagstat' to determine which paired flags are set in your dataset, before you attempt to edit them.