Mostly a curiosity question but:
- Is the cigar '=' op kept for backwards compatibility or some specific use-case?
- Is there functionality within samtools for generating an altered cigar that distinguishes between matches and mismatches similar to calmd/fillmd -e?
The SAM v1 specification has a 'sequence match' op (=) within the CIGAR field; however, I haven't seen it used before and it isn't mentioned in the 2008 paper. I know that you can alter the SEQ field and replace exact matches with '=' using the -e option in samtools fillmd/calmd, and that it would be relatively trivial to whip up something to do the equivalent for the CIGAR manually. I've had a search around but haven't seen an example of it being used yet, though this may be due to the difficulty of searching for '=' online! Is the op '=' kept for backwards compatibility or a specific use-case? I vaguely remember someone saying on here (though I can't find a reference for it right now) that it was meant for read-to-genome comparisons. Does anyone have a reference for this?
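For what it's worth, here is a rough sketch of the "whip something up manually" idea in plain Python. It is not samtools functionality: the function name split_match_ops is made up for illustration, and it assumes you have already fetched the reference bases starting at the read's POS. It also assumes a simple case-insensitive base comparison, so ambiguity codes such as N are treated as mismatches unless both sequences carry the same letter.

```python
import re
from itertools import groupby

# One CIGAR element: a length followed by a SAM v1 operator.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def split_match_ops(cigar, read_seq, ref_seq):
    """Rewrite each M op as runs of '=' (match) and 'X' (mismatch).

    Hypothetical helper. ref_seq must start at the leftmost aligned
    reference base of the read (the POS column) and cover the alignment.
    """
    out = []
    q = 0  # offset into read_seq
    r = 0  # offset into ref_seq
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op == "M":
            # Classify every aligned column, then run-length encode.
            flags = [
                "=" if read_seq[q + i].upper() == ref_seq[r + i].upper() else "X"
                for i in range(n)
            ]
            out.extend((len(list(run)), kind) for kind, run in groupby(flags))
            q += n
            r += n
        else:
            out.append((n, op))
            if op in "IS":      # consume read bases only
                q += n
            elif op in "DN":    # consume reference bases only
                r += n
            elif op in "=X":    # already split: consume both
                q += n
                r += n
            # H and P consume neither sequence
    return "".join(f"{n}{op}" for n, op in out)


# Read: 2 soft-clipped bases, 5 aligned, 1 inserted, 3 aligned.
new_cigar = split_match_ops("2S5M1I3M", "TTACGTAGCAT", "ACCTACAT")
print(new_cigar)
```

The third aligned base (G in the read, C in the reference) is the only mismatch, so the 5M becomes 2=1X2= and the trailing 3M becomes 3=.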
Brill thanks, I'm glad I'm not just going mad or missing something obvious! It was definitely around in the v1.3 spec according to their git and svn repos. I don't know where to find earlier versions of the spec than that other than what's in the literature. Given that 1.3 was released in ~2010 it's kind of surprising that adjustments haven't been made since then. I mean, enough people ask about it on biostars / elsewhere!
The RTG map tools also produce alignments using '=' and 'X' by default (--legacy-cigars can be used to give the old format), and if you have downstream tools that do not accept the newer operators, you can convert existing SAM/BAM files from the new to the old format by using rtg sammerge --legacy-cigars.
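If you just want to see what the new-to-old conversion amounts to on a single CIGAR string, a minimal sketch follows. This is not RTG's implementation; to_legacy_cigar is a hypothetical name, and it only rewrites the string, leaving SEQ and any tags untouched.

```python
import re


def to_legacy_cigar(cigar):
    """Collapse '=' and 'X' ops into 'M', merging adjacent runs.

    Hypothetical helper for illustration only.
    """
    merged = []
    for length, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar):
        if op in "=X":
            op = "M"
        if merged and merged[-1][1] == op:
            # Same op as the previous element: extend that run.
            merged[-1][0] += int(length)
        else:
            merged.append([int(length), op])
    return "".join(f"{n}{op}" for n, op in merged)


print(to_legacy_cigar("2S2=1X2=1I3="))
```

Here the 2=1X2= run collapses back to 5M and the final 3= to 3M, so the mismatch position is no longer recoverable from the CIGAR alone.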