Hello all,
I'm studying the SAM specification, in particular the TLEN field (observed template length). I found a pair of paired-end reads in a SAM file whose TLEN values do not seem to conform to the specification.
read1:
NB501050:47:HHMJVBGXY:1:11203:16016:13370 83 chrM 6622 70 136M = 6624 -134 CACCCTGAAGTTTATATTCTTATCCTACCAGGCTTCGGAATAATCTCCCATATTGTAACTTACTACTCCGGAAAAAAAGAACCATTTGGATACATAGGTATGGTCTGAGCTATGATATCAATTGGCTTCCTAGGGT MELNHFFFEGFGBF,FGEJG,FFGJCCIKFGDJGFAE1GECGFFJFIIKFCFGGHCGDJGCDJCDJFIBGHGGGGGGFGGDIKFGGGGHFCDKFCFGHCFGEHFJGHFHJCFGHFCFFKDFFGFHIFFII@FOOON MD:Z:136
read2:
NB501050:47:HHMJVBGXY:1:11203:16016:13370 163 chrM 6624 70 136M = 6622 134 CCCTGAAGTTTATATTCTTATCCTACCAGGCTTCGGAATAATCTCCCATATTGTAACTTACTACTCCGGAAAAAAAGAACCATTTGGATACATAGGTATGGTCTGAGCTATGATATCAATTGGCTTCCTAGGGTTT NIONJEFICGFCECEFGEFCFHFFCHFGJIIFGHBIFGFCGFHFHGGGFCFGKDCGHFGCHFCHEGGBIFGGGGGGJFGHGGFGGKIFFCHGFCJID-KIDHFKAIIFCFKFFCFHGGFGKI=FFH;E,IHMIHH MD:Z:136
In this case there seem to be two inconsistencies:
1. The absolute value of TLEN should be 138: the 136 aligned bases plus the 2-base offset between the two mates (all 136 bases are aligned, since CIGAR = 136M).
2. Read 1 is the leftmost segment (POS = 6622), so its TLEN should be positive, not negative as in the example.
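To make the arithmetic of point 1 concrete, here is a minimal sketch that computes |TLEN| as the distance from the leftmost aligned start to the rightmost aligned end, inclusive. It assumes only that POS is 1-based and that the CIGAR operations M, D, N, = and X consume reference bases; the helper names `ref_length` and `span_tlen` are hypothetical, not part of any library.

```python
import re

# CIGAR operations that consume reference bases.
REF_CONSUMING = set("MDN=X")


def ref_length(cigar: str) -> int:
    """Number of reference bases covered by an alignment's CIGAR string."""
    return sum(
        int(n)
        for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in REF_CONSUMING
    )


def span_tlen(pos1: int, cigar1: str, pos2: int, cigar2: str) -> int:
    """|TLEN| measured from the leftmost aligned start to the rightmost aligned end."""
    end1 = pos1 + ref_length(cigar1) - 1  # last aligned base of mate 1
    end2 = pos2 + ref_length(cigar2) - 1  # last aligned base of mate 2
    return max(end1, end2) - min(pos1, pos2) + 1


# POS and CIGAR copied from the two records above.
print(span_tlen(6622, "136M", 6624, "136M"))  # 138
```

Read 1 covers 6622 to 6757 and read 2 covers 6624 to 6759, so this convention gives 6759 - 6622 + 1 = 138, not 134.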
For point 2, I concluded that read 1 is not treated as the leftmost segment because it is on the reverse strand (FLAG = 83): its original starting position (as in the FASTQ, i.e. before the read was reverse-complemented for the alignment) would be 6622 + 135 = 6757. So, considering the original strand orientation, this read would be the rightmost segment, hence the negative TLEN. However, if this is the explanation for the negative TLEN of read 1, the SAM specification seems unclear to me.
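The reasoning for point 2 can be checked with a small sketch that locates each read's 5' end: for a read with FLAG bit 0x10 set (SEQ reverse-complemented), the first sequenced base sits at the rightmost aligned position. It is only an assumption, not something established here, that the tool which wrote this file measures the template between the two 5' ends; the helper names are hypothetical, and soft-clipped bases are ignored because they are not aligned.

```python
import re

FLAG_REVERSE = 0x10  # SAM FLAG bit: SEQ is reverse-complemented


def ref_length(cigar: str) -> int:
    """Number of reference bases covered by an alignment's CIGAR string."""
    return sum(
        int(n)
        for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)
        if op in set("MDN=X")
    )


def five_prime_pos(flag: int, pos: int, cigar: str) -> int:
    """1-based reference position of the read's first sequenced (5') aligned base."""
    if flag & FLAG_REVERSE:
        # Stored reverse-complemented, so the 5' end is the rightmost aligned base.
        return pos + ref_length(cigar) - 1
    return pos


r1 = five_prime_pos(83, 6622, "136M")   # FLAG 83 = 1 + 2 + 16 + 64: reverse strand
r2 = five_prime_pos(163, 6624, "136M")  # FLAG 163 = 1 + 2 + 32 + 128: forward strand
print(r1, r2, abs(r1 - r2) + 1)  # 6757 6624 134
```

Under this 5'-to-5' reading, the magnitude is 6757 - 6624 + 1 = 134, and read 1's 5' end (6757) is the rightmost one, which would be consistent with its negative sign.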
Gabriele
What software (and version) produced this SAM file?
GATK. However, I'm working directly on the SAM file, so I don't know which choices were made during alignment.