Can You Make A Bam File Picard Compatible? I Forgot -M With Bwa Mem
11.4 years ago
ajc8 ▴ 120

Hi,

I just aligned several exome samples with bwa mem and then processed the output through several pipeline steps. This is the first time I used bwa mem instead of aln and then sampe.

I am now ready to mark duplicates with Picard, but after tracking down the source of an error message, I realized that to make bwa mem output usable by Picard I should have passed "-M" to the mem command.

Is there a way to make a bam file okay for marking duplicates in Picard without realigning from the beginning?

Thanks for your help.
Allison

bwa picard duplicates • 6.9k views

What's your error message?

11.4 years ago
matted 7.8k

Yes, you can.

As I understand it, the issue is the new flag 0x800, which bwa mem sets on split alignment records in place of the 0x100 secondary-alignment flag. Neither Picard nor Samtools understands this flag (yet).

From Heng on the samtools-help mailing list:

Those on the samtools mailing list may know I have proposed to a new SAM flag 0x800 to better describe chimeric alignment. I have also proposed to standardize the XP tag as SA. The format of SA follows: "(chr,pos,strand,CIGAR,mapQ,NM;)+". Note that SA separates position and strand, slightly different from XP. Other samtools developers, including the Picard group, have seconded the changes. I will write this to the SAM spec.

The latest bwa-mem at github implements these changes. In the new output (without option -a), a read may appear in two or more SAM lines as before. But in this case, one and only one line is NOT flagged with 0x800. This line is called the "primary line" and always uses soft clipping. The rest of lines are flagged with 0x800. These lines are called "supplementary lines" and always use hard clipping. Having one primary line helps operations such as MarkDuplicates, SamToFastq and FixMateInformation.

Samtools ignores the new flag. Picard may not work with the new bwa-mem output, but it is going to. Before Picard supports the new 0x800 flag, you may still use flag "-M" as before. The only effect of "-M" is to change 0x800 to 0x100. You may also change 0x800 to 0x100 with a script if you need the compatibility with older Picard but forget to use "-M" when invoking bwa-mem.
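For anyone inspecting the supplementary lines, the SA format quoted above, "(chr,pos,strand,CIGAR,mapQ,NM;)+", can be split apart with a few lines of Python. This is only a sketch: `parse_sa` and `SupAlignment` are hypothetical names, and it assumes the value has already been stripped of its `SA:Z:` prefix and that reference names contain no commas or semicolons.

```python
from collections import namedtuple

# Hypothetical record type: one entry per alignment listed in the SA tag.
SupAlignment = namedtuple("SupAlignment", "chrom pos strand cigar mapq nm")


def parse_sa(value):
    """Parse an SA tag value of the form (chr,pos,strand,CIGAR,mapQ,NM;)+."""
    hits = []
    for entry in value.split(";"):
        if not entry:
            # The trailing ';' leaves an empty final piece; skip it.
            continue
        chrom, pos, strand, cigar, mapq, nm = entry.split(",")
        hits.append(
            SupAlignment(chrom, int(pos), strand, cigar, int(mapq), int(nm))
        )
    return hits
```

The values in any example call are made up, e.g. `parse_sa("chr1,100,+,50M50S,60,0;")` returns a one-element list describing a forward-strand hit on chr1.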

So I think the easiest fix is to write a short script that goes through the file in SAM format and changes the 0x800 flag to 0x100 wherever it is set. Or, even simpler (though lossy), you could exclude the records that have the 0x800 flag set, producing a file that's compatible with current Picard tools.

Or I guess (for completeness) you could patch a local version of Picard to check for 0x800 as well as 0x100...

8.7 years ago

I had to do this today, and wrote a small bit of python to convert my existing alignments:
