Is there an aligner/mapper that lets you choose whether the CIGAR strings in the SAM/BAM output should have the newer format (including symbols "=" and "X") or the legacy format (only "M")?
I wrote a tool to convert 'M' to 'X' or '=': https://github.com/lindenb/jvarkit/wiki/SamFixCigar
BBMap outputs the new format by default, but can generate the older format if you add the 'sam=1.3' flag.
The lastz aligner can do that; see the docs for details: --format=general:cigarx
Istvan Albert: a secondary question. Do you know whether LastZ can produce SAM output that has cigarx formatting?
For the sake of exercise I wrote a converter, similar to Pierre's I guess: ExpandCigar.jar (the source code is available too). It uses the MD tag if present. If not, you can add MD with samtools calmd and pipe the output to ExpandCigar.jar.
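To illustrate the idea behind such converters, here is a minimal Python sketch of how an 'M'-only CIGAR can be expanded to '=' and 'X' using the MD tag. It is not the code of ExpandCigar.jar or SamFixCigar; the function names `md_to_flags` and `expand_cigar` are made up for this example, and it assumes the MD tag is present and consistent with the CIGAR.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
MD_RE = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def md_to_flags(md):
    """Expand an MD string into one flag per aligned reference base.

    True means match, False means mismatch. Deleted bases (^...) are
    skipped because the CIGAR's D operations already account for them.
    """
    flags = []
    for count, _deletion, mismatch in (m.groups() for m in MD_RE.finditer(md)):
        if count is not None:
            flags.extend([True] * int(count))
        elif mismatch is not None:
            flags.append(False)
    return flags


def expand_cigar(cigar, md):
    """Rewrite every M run of `cigar` as '=' / 'X' runs, guided by `md`."""
    flags = md_to_flags(md)
    pos = 0   # index into flags (aligned reference bases seen so far)
    out = []  # list of [length, op], adjacent equal ops are merged

    def push(length, op):
        if out and out[-1][1] == op:
            out[-1][0] += length
        else:
            out.append([length, op])

    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "M":
            for is_match in flags[pos:pos + length]:
                push(1, "=" if is_match else "X")
            pos += length
        else:
            push(length, op)
            if op in "=X":  # already expanded, but still covered by MD
                pos += length

    if pos != len(flags):
        raise ValueError("MD tag is not consistent with the CIGAR string")
    return "".join(f"{n}{op}" for n, op in out)


print(expand_cigar("16M2D6M", "10A5^AC6"))  # 10=1X5=2D6=
```

Insertions and soft clips pass through unchanged because MD only describes bases aligned to the reference. Going the other way (new format to legacy) is simpler: merge adjacent '=' and 'X' runs into a single 'M'.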
I think STAR has that option and I vaguely recall that BBMap can as well.
Not sure about the CIGAR string, but if your overall goal is to find the positions of mismatches, you can use samtools calmd to generate MD tags, which encode the mismatching positions. Here is the relevant text, copied from the SAM format specification:

The MD field aims to achieve SNP/indel calling without looking at the reference. SOAP and Eland SNP callers prefer such information. For example, a string "10A5^AC6" means from the leftmost reference base in the alignment, there are 10 matches followed by an A on the reference which is different from the aligned read base; the next 5 reference bases are matches followed by a 2bp deletion from the reference; the deleted sequence is AC; the last 6 bases are matches. The MD field should match the CIGAR string, although an SAM parser may not check this optional field.

Here is the reference: http://chagall.med.cornell.edu/NGScourse/SAM.pdf
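As a rough illustration of reading an MD string, the Python sketch below splits it into match, mismatch, and deletion tokens and reports where the mismatches fall. The helper names `parse_md` and `mismatch_offsets` are hypothetical, and the offsets are counted from the leftmost aligned reference base, assuming an alignment without skipped regions (no 'N' in the CIGAR).

```python
import re

MD_TOKEN = re.compile(r"(\d+)|\^([A-Za-z]+)|([A-Za-z])")


def parse_md(md):
    """Split an MD string into (kind, value) tokens."""
    tokens = []
    for count, deleted, ref_base in MD_TOKEN.findall(md):
        if count:
            if int(count) > 0:  # "0" only separates adjacent events
                tokens.append(("match", int(count)))
        elif deleted:
            tokens.append(("deletion", deleted))
        else:
            tokens.append(("mismatch", ref_base))
    return tokens


def mismatch_offsets(md):
    """Return (0-based reference offset, reference base) per mismatch."""
    offsets = []
    ref = 0
    for kind, value in parse_md(md):
        if kind == "match":
            ref += value
        elif kind == "deletion":
            ref += len(value)  # deleted bases still occupy the reference
        else:
            offsets.append((ref, value))
            ref += 1
    return offsets


print(parse_md("10A5^AC6"))
print(mismatch_offsets("10A5^AC6"))  # [(10, 'A')]
```

Adding the offset to the alignment's 1-based POS field gives the reference coordinate of each mismatch.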