value of TLEN not conforming to the SAM specifications
5.3 years ago

Hello to all,

I'm studying the SAM specification, in particular the TLEN field (template length). I found a pair of paired-end reads in a SAM file where the value of TLEN does not seem to conform to the specification.

read1:
NB501050:47:HHMJVBGXY:1:11203:16016:13370   83  chrM    6622    70  136M    =   6624    -134    CACCCTGAAGTTTATATTCTTATCCTACCAGGCTTCGGAATAATCTCCCATATTGTAACTTACTACTCCGGAAAAAAAGAACCATTTGGATACATAGGTATGGTCTGAGCTATGATATCAATTGGCTTCCTAGGGT    MELNHFFFEGFGBF,FGEJG,FFGJCCIKFGDJGFAE1GECGFFJFIIKFCFGGHCGDJGCDJCDJFIBGHGGGGGGFGGDIKFGGGGHFCDKFCFGHCFGEHFJGHFHJCFGHFCFFKDFFGFHIFFII@FOOON    MD:Z:136

read2:
NB501050:47:HHMJVBGXY:1:11203:16016:13370   163 chrM    6624    70  136M    =   6622    134 CCCTGAAGTTTATATTCTTATCCTACCAGGCTTCGGAATAATCTCCCATATTGTAACTTACTACTCCGGAAAAAAAGAACCATTTGGATACATAGGTATGGTCTGAGCTATGATATCAATTGGCTTCCTAGGGTTT    NIONJEFICGFCECEFGEFCFHFFCHFGJIIFGHBIFGFCGFHFHGGGFCFGKDCGHFGCHFCHEGGBIFGGGGGGJFGHGGFGGKIFFCHGFCJID-KIDHFKAIIFCFKFFCFHGGFGKI=FFH;E,IHMIHH MD:Z:136

In this case there seem to be two discrepancies:

1. The value of TLEN should be 138: the 136 aligned bases plus the 2-base offset between the start positions of the two mates. Note that all 136 bases are aligned, since CIGAR = 136M.
2. read1 is the leftmost segment (POS = 6622), so its TLEN should be positive, not negative as in the example.

For point 2, I came to the conclusion that read1 is not considered the leftmost segment because, being on the reverse strand (FLAG = 83), the start of the read as sequenced (as in the FASTQ) maps to 6622 + 135 = 6757. So, considering the original strand orientation, this read would be the rightmost segment, hence the negative TLEN. However, if this is the explanation for the negative TLEN of read1, the SAM specification is unclear to my mind.
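The reasoning above can be checked with a minimal Python sketch. The bit values come from the SAM specification; the helper names `decode_flag` and `five_prime_position` and the `aligned_length` parameter are hypothetical, and the sketch assumes a CIGAR made only of `M` operations, as in the two records shown.

```python
# SAM FLAG bits relevant to this pair (values from the SAM specification).
FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
}


def decode_flag(flag):
    """Return the names of the bits set in a SAM FLAG value."""
    return [name for bit, name in FLAG_BITS.items() if flag & bit]


def five_prime_position(pos, aligned_length, flag):
    """1-based reference coordinate of the read's 5' end.

    Assumes the read aligns without gaps or clipping (e.g. CIGAR 136M),
    so aligned_length equals the number of reference bases covered.
    """
    if flag & 0x10:  # reverse strand: the 5' end is the rightmost aligned base
        return pos + aligned_length - 1
    return pos


print(decode_flag(83))    # read1
print(decode_flag(163))   # read2
print(five_prime_position(6622, 136, 83))
print(five_prime_position(6624, 136, 163))
```

With these inputs the 5' end of read1 falls at 6757 and that of read2 at 6624, so in terms of 5' ends read2 is to the left of read1.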

Gabriele


What software (and version) produced this SAM file?


GATK. However, I'm working directly on the SAM file, so I don't know which choices were made during alignment.

5.3 years ago
d-cameron ★ 2.9k

There are many, many SAM files where the content does not conform to the SAM specifications.

In the case of TLEN, the definition was changed in a specification update, but many tools (including GATK) were not updated to reflect this. The spec itself has recently changed to acknowledge that there are two competing definitions in widespread use (with GATK using the 'old' TLEN definition). See https://github.com/samtools/hts-specs/pull/366

Edit: this means the field itself is essentially useless: either your assumptions will only be valid for the current version of your pipeline, or you're going to have to recalculate it yourself using your definition of choice.
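As a rough illustration of recalculating TLEN yourself, here is a minimal Python sketch of the two definitions applied to the pair in the question. It assumes that the 'old' definition measures from 5' end to 5' end and that the other definition measures from the leftmost to the rightmost mapped base; the function names are hypothetical, and the case where both mates start at the same position (where the sign choice is arbitrary) is not handled.

```python
import re

# CIGAR operations that consume reference bases.
REF_CONSUMING = set("MDN=X")


def reference_span(cigar):
    """Number of reference bases covered by an alignment with this CIGAR."""
    ops = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(n) for n, op in ops if op in REF_CONSUMING)


def tlen_leftmost_rightmost(pos, cigar, mate_pos, mate_cigar):
    """TLEN as leftmost mapped base to rightmost mapped base, inclusive.

    Positive for the segment that starts further left. Ties are not handled.
    """
    end = pos + reference_span(cigar) - 1
    mate_end = mate_pos + reference_span(mate_cigar) - 1
    length = max(end, mate_end) - min(pos, mate_pos) + 1
    return length if pos <= mate_pos else -length


def tlen_five_prime(pos, cigar, reverse, mate_pos, mate_cigar, mate_reverse):
    """TLEN as the signed, inclusive distance between the two 5' ends (assumed 'old' definition)."""
    def five_prime(p, c, rev):
        # On the reverse strand the 5' end is the rightmost aligned base.
        return p + reference_span(c) - 1 if rev else p

    own = five_prime(pos, cigar, reverse)
    mate = five_prime(mate_pos, mate_cigar, mate_reverse)
    adjustment = 1 if mate >= own else -1
    return mate - own + adjustment


# read1: FLAG 83 (reverse), POS 6622; read2: FLAG 163 (forward), POS 6624.
print(tlen_leftmost_rightmost(6622, "136M", 6624, "136M"))       # read1
print(tlen_leftmost_rightmost(6624, "136M", 6622, "136M"))       # read2
print(tlen_five_prime(6622, "136M", True, 6624, "136M", False))  # read1
print(tlen_five_prime(6624, "136M", False, 6622, "136M", True))  # read2
```

Under these assumptions the leftmost-to-rightmost definition gives 138 and -138 for read1 and read2, while the 5'-to-5' definition gives -134 and 134, which are the values present in the file.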
