I aligned 150 nt reads to the human mitochondrion NC_012920.1 using bowtie2 with default parameters. Numerous reads have a CIGAR value of '*' and also do not have the 'NM' tag, e.g.:
Interpretable read:
8959166 153 NC_012920.1 14872 44 150M = 14872 0 CCTCCAAATCACCACAGGACTATTCCTAGCCATGCACTACTCACCAGACGCCTCAACCGCCTTTTCATCAATCGCCCACATCACTCGAGACGTAAATTATGGCTGAATCATCCGCTACCTTCACGCCAATGGCGCCTCAATATTCTTTAT FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF,FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF MD:Z:150 PG:Z:MarkDuplicates XG:i:0 NM:i:0 XM:i:0 XN:i:0 XO:i:0 AS:i:300 YT:Z:UP
Uninterpretable read:
8223405 101 NC_012920.1 15512 0 * = 15512 0 CATCAAGGTTAGGGTGGAAGAAGAAATTTCTTATATGGGTCCGATAAAATCACCTTCCACCCTTACTACACAATCAAAGACGCCCTCGGCTTACTTCTCTTCCTTCTCTCCTTAATGACATTAACACTATTCTCACCTGACCTCCTAGGA FFFFFFFF,F,FFFFFFFF:FFF:FFF::F:FFFFFFFFFFFFFFFF:FFFF,,FFFFF:FFFFFFFFFFFFFFFFFFFFF,F,FFFFF,F:FFFF,,,,,:FF:FF,FFFFFFFFFFFFFFFFFFFF:FFFFFF:F,FF,FF:FF,,F, PG:Z:MarkDuplicates YT:Z:UP
Looking at the SAM format documentation, there is no clear explanation of a '*' for CIGAR; it just mentions that '*' is set if unavailable.
When I run BLAST for the 'uninterpretable read' I get a 109/110 nt match, indicating soft clipping at one or both ends and just a single nt mismatch after soft clipping.
Any idea why bowtie2 is recording these reads with a CIGAR of '*'?
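In case it helps with comparing the two records, here is a minimal sketch that decodes the FLAG column (the second field, 153 and 101 above) into its individual bits. The bit values are the ones listed in the SAM specification; the names `SAM_FLAG_BITS` and `decode_flag` and the short labels are only illustrative and do not come from bowtie2 or any library.

```python
# Bit values for the SAM FLAG column, as listed in the SAM specification.
# The short labels are illustrative names, not official identifiers.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "read1"),
    (0x80, "read2"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG integer."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


# FLAG values of the two example records in this post.
for flag in (153, 101):
    print(flag, decode_flag(flag))
```

For flag 101 the decoded list includes `unmapped`, while for flag 153 it does not. That may be related to the missing CIGAR and NM values, although I have not confirmed that it is the whole explanation.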