pysam difference between template_length and reference_length
1
0
Entering edit mode
6.0 years ago
Medhat 9.8k

I was trying to identify the insert size between paired reads. When using pysam template_length will do the work, I was confused with reference_length

aligned length of the read on the reference genome. This is equal to aend - pos. Returns None if not available.

I thought that either they will be same value or close, but this is what I got:

template_length    reference_length
6    128
6    148
920  148
920  147
151  148
151 148
74  74
925  148

why some reads have big difference?

Thanks,

pysam genome next-gen • 5.4k views
ADD COMMENT
0
Entering edit mode

I think the difference is in in softclips: your template is longer than what is aligned to the reference there. Could that explain what you see?

ADD REPLY
0
Entering edit mode
26S18M6D104M     6   128
57M4I75M4D12M    6   148
148M     920     148
36M1I111M    920     147
148M     151     148
148M     151     148
74M74S   74      74
148M     925     148

Actually I see both, some times template shorter sometimes the other way around.

ADD REPLY
0
Entering edit mode

Maybe this related discussion helps? Fragment Size: TLEN vs. isize

ADD REPLY
0
Entering edit mode

Both tlen and isize deprecated now using template_length . but this have no relation with reference_length which is equals to reference_end - reference_start

ADD REPLY
0
Entering edit mode

TLEN is template length, no? Also are your aligned reads spliced or not?

ADD REPLY
0
Entering edit mode

Yes Tlen is template_length.

tlen deprecated, use template_length instead

No reads are not spliced

ADD REPLY
1
Entering edit mode
6.0 years ago

The reference_length is just the number of bases in a given read that are mapped. This completely ignores any mates and how whatever the aligner is deciding to call the length of the sequenced fragment (i.e., the TLEN field in the SAM/BAM/CRAM file). The template_length is then whatever the TLEN value is in the SAM/BAM/CRAM file. For your example where the values are similar you have cases where the mates are overlapping. For cases where the TLEN is much smaller, it's likely that the reads overlap in a way that the aligner didn't expect:

  <---------------- R1
                -----------------> R2

For the third case, there's just a larger fragment size.

Take for example the following fake reads:

@HD     VN:1.4  SO:coordinate
@SQ     SN:1    LN:195471971
r1      97      1       1       255     50M     =       100     150     *       *
r1      145     1       100     255     50M     =       1       -150    *       *
r2      145     1       200     255     50M     =       245     10      *       *
r2      97      1       245     255     50M     =       200     -10     *       *
r3      97      1       300     255     50M     =       500     250     *       *
r3      145     1       500     255     50M     =       300     -250    *       *

Then in python:

>>> import pysam
>>> bam = pysam.AlignmentFile("foo.bam")
>>> bam = pysam.AlignmentFile("foo.bam")
>>> for b in bam:
...     print("{}\t{}".format(b.template_length, b.reference_length))
... 
150 50
-150    50
10  50
-10 50
250 50
-250    50
ADD COMMENT
0
Entering edit mode

Thanks, It is clear now. Same topic but different issue, So If I want to calculate the insert size to be aligner independent; It will be something like: read.next_reference_start - read.reference_end ? "assuming that the reads aligned in the right direction"

ADD REPLY
1
Entering edit mode

If by insert size (it's a meaningless term in my opinion, since that's not really an insert) you mean the part of each fragment not sequenced then yes.

ADD REPLY
0
Entering edit mode

Thanks for the information, but that will lead to this question, What is a meaning insert size? shall I add query_alignment_length to the equation? what is the right way to calculate it?

ADD REPLY
1
Entering edit mode

It depends wholly on what you mean. I only think in terms of fragment lengths, which is the template_length in most cases.

ADD REPLY

Login before adding your answer.

Traffic: 1928 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6