I'm parsing the output of bwa (0.7.15) and I'm getting some split read alignments that seem very strange. For example, take the following redacted record:
ReadName 2115 chrUn_JTFH01001478v1_decoy 12 1 250M chr21 10325621 0 * * NM:i:13 SA:Z:chr21,10325808,-,240M10S,9,19;
The read is aligned to both chr21 and the decoy contig. What I find strange is that although the read is reported as a split alignment with two SAM records, the entire read aligns to the supplementary alignment position.
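For reference, here is a minimal sketch of how the flag value 2115 in the record above breaks down. The bit values come from the SAM specification; the function name `decode_flag` and the label strings are my own, chosen for illustration.

```python
# SAM flag bits as defined in the SAM specification.
# The label strings are informal names chosen for this example.
FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse_strand"),
    (0x20, "mate_reverse_strand"),
    (0x40, "first_in_pair"),
    (0x80, "last_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_flag(flag):
    """Return the labels of all bits set in a SAM flag value."""
    return [label for bit, label in FLAG_BITS if flag & bit]


print(decode_flag(2115))
```

The supplementary bit (0x800) is set and the reverse-strand bit (0x10) is not, so this record is a forward-strand supplementary alignment.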
What is the meaning of such a record? How can a genuinely split read contain an alignment that is essentially not split? Is this a bug in bwa? An artefact of alt contig mapping?
Edit: the SAM specifications have the following definition for the SA tag:
SA:Z:(rname ,pos ,strand ,CIGAR ,mapQ ,NM ;)+ Other canonical alignments in a chimeric alignment, formatted as a semicolon-delimited list. Each element in the list represents a part of the chimeric alignment. Conventionally, at a supplementary line, the first element points to the primary line.
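As a rough illustration of that definition, the SA value can be split into its elements as below. This is only a sketch: `SAEntry` and `parse_sa` are hypothetical names, and it assumes reference names contain no commas or semicolons.

```python
from typing import List, NamedTuple


class SAEntry(NamedTuple):
    rname: str
    pos: int
    strand: str
    cigar: str
    mapq: int
    nm: int


def parse_sa(value: str) -> List[SAEntry]:
    """Parse an SA tag value (with or without the 'SA:Z:' prefix)."""
    if value.startswith("SA:Z:"):
        value = value[len("SA:Z:"):]
    entries = []
    for element in value.split(";"):
        if not element:  # the list ends with ';', leaving an empty last piece
            continue
        rname, pos, strand, cigar, mapq, nm = element.split(",")
        entries.append(SAEntry(rname, int(pos), strand, cigar, int(mapq), int(nm)))
    return entries


print(parse_sa("SA:Z:chr21,10325808,-,240M10S,9,19;"))
```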
The primary alignment is given in the SA tag: an NM (edit distance) of 19, 10 bases soft clipped, and a nominal mapq of 9.
The issue is that bwa is reporting it as a split read alignment using the SA tag, not as an alternate alignment using the XA tag. There were also 6 alternate alignments in the XA tag that I removed for clarity, as they are different alignment possibilities and do not form part of a single split alignment.
One would expect split reads to have CIGARs something like "25M75S" on one record and "25S75M" on the other, because the aligner is reporting a split read alignment. In this case, the aligner is reporting a split read in which one of the alignments is not split at all.
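To make the expectation concrete, here is a small sketch that converts each CIGAR into the interval of read bases it covers, in original read orientation, and measures how much two alignments overlap. The function names are hypothetical, and it assumes that reverse-strand CIGARs are written relative to the reverse-complemented read, as in SAM.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def query_interval(cigar, strand):
    """Half-open [start, end) interval of read bases covered by the alignment,
    in the orientation of the original read."""
    ops = [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]
    # Operations that consume read bases, counting clipped bases as well.
    total = sum(n for n, op in ops if op in "MIS=XH")
    lead = 0
    for n, op in ops:
        if op not in "SH":
            break
        lead += n
    trail = 0
    for n, op in reversed(ops):
        if op not in "SH":
            break
        trail += n
    start, end = lead, total - trail
    if strand == "-":
        # Flip the coordinates back to the original read orientation.
        start, end = total - end, total - start
    return start, end


def query_overlap(a, b):
    """Number of read bases covered by both intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))


# Expected shape of a split read: the two parts do not overlap.
print(query_overlap(query_interval("25M75S", "+"), query_interval("25S75M", "+")))
# The record in question: supplementary 250M (+) versus primary 240M10S (-).
print(query_overlap(query_interval("250M", "+"), query_interval("240M10S", "-")))
```

Under those assumptions the first pair shares 0 read bases, while the two alignments in my record share 240 of the 250 bases, which is what makes the record look unlike a genuine split.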