Hi,
I'm in the process of adding (optional) support for using alignments to our tool for RNA-seq quantification, and I'm currently trying to model the fragment size distribution of the aligned fragments in a set of reads. I'm a bit confused about exactly where this information is stored in a BAM/SAM file. According to the spec, each SAM record has a TLEN field that gives the "template length" of an alignment, which seems to me like it would be equivalent to the fragment length (except in the case of e.g. a chimeric alignment). However, the samtools API doesn't expose a field by that name; instead, for SAM and BAM files parsed via samtools, you have access to an isize field which, according to various online sources, is the "insert size" of the aligned fragment.
My question is: how do these two fields relate? Are they different? Is the insert size the fragment size, the mate inner distance (frag. size - read lengths), or something else entirely? Unfortunately, the documentation on this isn't too clear, so I'm reaching out to someone who's dealt with this in practice before.
Thanks!
Rob
I am really confused by this statement: "instead in SAM and BAM files parsed via samtools, you have access to an isize field,"
Where do you find isize in the specs? I see it for block compression but that's it.
Sure. In the spec, they describe TLEN. However, if you look at the Samtools API, the relevant information is in the bam1_core_t structure. Basically, I can't seem to find any mapping in the API for accessing the TLEN field, but a few people online seem to suggest that this is encoded in the isize field of the bam1_core_t structure.
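To make the two candidate interpretations concrete, here is a minimal Python sketch that reads TLEN straight from the text of a SAM record (column 9 per the spec) and derives the mate inner distance from it. It assumes that |TLEN| spans the whole fragment from the leftmost to the rightmost mapped base, and that isize in bam1_core_t carries the same value, as the thread suggests. Aligners do not all fill this field identically, so treat it as an illustration rather than a definitive answer. The record and the helper names parse_tlen and inner_distance are made up for the example.

```python
def parse_tlen(sam_line):
    """Return the signed TLEN (column 9) of one SAM alignment line."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has 11 mandatory fields")
    return int(fields[8])


def inner_distance(tlen, read1_len, read2_len):
    """Gap between the two mates, ASSUMING |TLEN| covers the whole fragment."""
    return abs(tlen) - read1_len - read2_len


# Hypothetical record: a 50 bp read at position 1000 whose 50 bp mate starts
# at 1150, so the fragment covers 1000..1199 and TLEN is 200.
line = "\t".join([
    "frag1", "99", "chr1", "1000", "60", "50M",
    "=", "1150", "200", "A" * 50, "I" * 50,
])

tlen = parse_tlen(line)
print("TLEN / fragment length:", abs(tlen))
print("mate inner distance:", inner_distance(tlen, 50, 50))
```

Under that assumption the fragment length is simply |TLEN|, and the inner distance is whatever remains after subtracting both read lengths. The sign only indicates whether the record is the leftmost or rightmost mate, so take the absolute value (or keep only positive values) when building the distribution, and skip records where the field is 0, which the spec uses for unpaired reads or unavailable information.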