The CIGAR '=' op for 'sequence match' in SAM - how and why?
9.5 years ago
amblina ▴ 140

Mostly a curiosity question but:

  1. Is the cigar '=' op kept for backwards compatibility or some specific use-case?
  2. Is there functionality within samtools for generating an altered cigar that distinguishes between matches and mismatches similar to calmd/fillmd -e?

The SAM v1 specification has a 'sequence match' op (=) in the CIGAR field; however, I haven't seen it used before, and it isn't mentioned in the 2008 paper. I know that you can alter the SEQ field and replace exact matches with '=' using the -e option in samtools fillmd/calmd, and that it would be relatively trivial to whip up something to do the same for the CIGAR manually. I've searched around but haven't found an example of it being used yet, though this may be due to the difficulty of searching for '=' online! I vaguely remember someone saying on here (though I can't find a reference for it right now) that it was meant for read-to-genome comparisons. Does anyone have a reference for this?

sam snp samtools alignment next-gen
9.5 years ago

= and X are actually the newer CIGAR operations (they were added in version 1.4 of the spec, if I remember correctly); M is the older variant, and it covers both matches and mismatches. Most tools still don't produce alignments with = or X operations, though it'd be kind of nice if they did. An exception to this is BBMap (and I believe STAR can be told to produce these).

There is no functionality within samtools to modify the CIGAR string like this, though I suppose it would be nice to have. As it stands, you tend to have to parse the MD auxiliary tag if you want mismatch information, which is kind of annoying in comparison.
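For what it's worth, the MD parsing mentioned above can be sketched in a few lines of Python. This is only a rough illustration, not anything samtools provides: the function names `md_match_flags` and `expand_cigar` are made up for this example, and it assumes the MD tag is present and consistent with the CIGAR (in MD, numbers are runs of matches, a single letter is a mismatch, and `^` followed by letters is a deletion from the reference).

```python
import re
from itertools import groupby

# One CIGAR operation: a length followed by an op character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
# One MD token: a match run, a deletion (^BASES), or a single mismatched base.
MD_RE = re.compile(r"(\d+)|(\^[A-Za-z]+)|([A-Za-z])")


def md_match_flags(md):
    """Return one boolean per aligned (M) column: True = match, False = mismatch."""
    flags = []
    for num, deletion, mismatch in MD_RE.findall(md):
        if num:
            flags.extend([True] * int(num))
        elif mismatch:
            flags.append(False)
        # Deletions consume reference bases only, so they add no aligned columns.
    return flags


def expand_cigar(cigar, md):
    """Rewrite M operations as runs of = and X using the MD tag (hypothetical helper)."""
    flags = md_match_flags(md)
    pos = 0
    out = []
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op not in "M=X":
            # I, D, N, S, H and P are copied through unchanged.
            out.append(f"{n}{op}")
            continue
        chunk = flags[pos:pos + n]
        if len(chunk) != n:
            raise ValueError("MD tag is shorter than the aligned part of the CIGAR")
        pos += n
        # Run-length encode consecutive matches / mismatches.
        for is_match, run in groupby(chunk):
            out.append(f"{len(list(run))}{'=' if is_match else 'X'}")
    if pos != len(flags):
        raise ValueError("MD tag is longer than the aligned part of the CIGAR")
    return "".join(out)


print(expand_cigar("16M2D6M", "10A5^AC6"))
```

Insertions and soft clips never appear in MD, which is why only the M (or existing =/X) operations consume flags here. Real data would also need handling for missing or stale MD tags, which this sketch does not attempt.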

Brill, thanks! I'm glad I'm not just going mad or missing something obvious. It was definitely around in the v1.3 spec according to their git and svn repos. I don't know where to find earlier versions of the spec than that, other than what's in the literature. Given that 1.3 was released in ~2010, it's kind of surprising that adjustments haven't been made since then. I mean, enough people ask about it on biostars / elsewhere!

The RTG map tools also produce alignments using = and X by default (--legacy-cigars can be used to give the old format). If you have downstream tools that do not accept the newer operators, you can convert existing SAM/BAM files from the new to the old format with rtg sammerge --legacy-cigars.
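Going from the new operators back to the old format is the easy direction, since = and X both collapse into M. A minimal Python sketch of that idea follows; it is not how RTG implements --legacy-cigars, and the function name `to_legacy_cigar` is made up for illustration.

```python
import re

# One CIGAR operation: a length followed by an op character.
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def to_legacy_cigar(cigar):
    """Collapse = and X into M, merging neighbouring operations of the same type."""
    merged = []
    for length, op in CIGAR_RE.findall(cigar):
        if op in "=X":
            op = "M"
        if merged and merged[-1][1] == op:
            # Same op as the previous one: extend it instead of emitting a new one.
            merged[-1][0] += int(length)
        else:
            merged.append([int(length), op])
    return "".join(f"{n}{op}" for n, op in merged)


print(to_legacy_cigar("10=1X5=2D6="))
```

Note that this direction loses information: once the runs are merged into M, the match/mismatch positions can only be recovered from the MD tag or the reference.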
