The concept of a template is somewhat obscure, and I for one always have a nagging feeling that I am missing or misunderstanding something about it. Let's start with the definition in the SAM spec:
Template: A DNA/RNA sequence part of which is sequenced on a sequencing machine or assembled from raw sequences.
I think this is far from optimal. That single sentence relies on a lot of undefined concepts that don't help at all: sequencing machines, the sequencing process, raw sequences and assembly. Those just muddy the water. In my mind the correct definition would be:
Template: A DNA/RNA sequence from which one or more parts will be represented in the file.
Then let's move on, the SAM spec says:
Read: A raw sequence that comes off a sequencing machine. A read may consist of multiple segments. For sequencing data, reads are indexed by the order in which they are sequenced.
I think this could do with clarification as well. What happens here is that the concept of a segment is conflated with the concept of an alignment. A read does not actually consist of multiple segments. The full sequence of a read may be aligned to produce different locally aligned regions, and the sequence of each aligned region is called a segment. The key here is that a read does not a priori consist of segments as stated above! The presence or absence of segments depends solely on the aligner.
What does not help is that, even though segments were defined relative to a read, in the rest of the spec segments are almost always discussed relative to the template, for example "next segment in the template". This can make reading the spec very confusing.
My mental model hierarchy is the following:
- Template --> The DNA fragment that was measured
- Reads --> Depending on the methodology, a template may produce one or more reads. These reads may cover the entire template or just a subsection of it. Reads originating from the same template typically cover different parts of the template, and may represent the template itself or its reverse complement.
- Segments --> Each read may produce one or more alignments, which in turn will have aligned regions called segments. From these segments it may be possible to infer the size of the original template.
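To make the hierarchy above concrete, here is a minimal Python sketch of how I picture it. None of this comes from the SAM spec or from any library: the class names (`Template`, `Read`, `Segment`), their fields, and the helper `inferred_template_span` are all hypothetical, and the coordinates are made up for a paired-end example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Segment:
    """An aligned region of a read (hypothetical type, not a SAM record)."""
    ref_name: str
    ref_start: int          # 0-based, inclusive
    ref_end: int            # 0-based, exclusive
    reverse: bool = False   # True if aligned as the reverse complement


@dataclass
class Read:
    """A sequence measured from a template; segments exist only after alignment."""
    name: str
    sequence: str
    segments: List[Segment] = field(default_factory=list)


@dataclass
class Template:
    """The DNA fragment that was measured; it may yield one or more reads."""
    name: str
    reads: List[Read] = field(default_factory=list)


def inferred_template_span(template: Template) -> Optional[int]:
    """Estimate the template size from the outermost segment coordinates.

    Returns None when there are no segments or when the segments land on
    different reference sequences, since no span can be inferred then.
    """
    segments = [s for r in template.reads for s in r.segments]
    if not segments:
        return None
    if len({s.ref_name for s in segments}) != 1:
        return None
    return max(s.ref_end for s in segments) - min(s.ref_start for s in segments)


# Made-up paired-end example: two reads from one template, one segment each.
read1 = Read("frag1/1", "A" * 50, [Segment("chr1", 100, 150)])
read2 = Read("frag1/2", "T" * 50, [Segment("chr1", 300, 350, reverse=True)])
template = Template("frag1", [read1, read2])

# The same template before any aligner has run: reads, but no segments.
unaligned = Template("frag2", [Read("frag2/1", "A" * 50)])

span = inferred_template_span(template)
unaligned_span = inferred_template_span(unaligned)
```

The point of the sketch is that `Read.segments` starts out empty and is filled in only by an aligner, while the template size is something derived from the segments rather than stored on the reads.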
PS: I do realize that it is quite difficult to write an unambiguous specification that also makes sense, plus the spec was written many years ago without the benefit of hindsight.