Question

Library template length in exome sequencing

10.5 years ago

wangyi2412 ▴ 250

For Illumina paired-end (100bp each end) exome sequencing:

Before the sample library is loaded onto the sequencer, there is a step that checks the length distribution of the templates in the library. The variance should not be too large, since a tight distribution gives better results. The expected mode/average length should suit both the length of the target (the exome) and the questions the sequencing is meant to answer.

Right?

So, from the exome-capture point of view, since we only sequence 100 bp from each end of the template, we don't want the template to be so long that the reads are unlikely to cover the exome region. But if the template is too short, say 150 bp (here and below, lengths exclude the adapters, indexes, etc. and count only the genomic sequence), the two 100 bp reads will overlap, which requires extra effort to sort out after mapping.

Furthermore, to infer relatively large insertions/deletions, we expect a certain gap between the two ends of the template, which requires that the template not be too short, e.g. not less than 250 bp.

So, what should the length of the library templates be?

library template exome-seq • 2.5k views

Updated 3.3 years ago by Ram 45k • written 10.5 years ago by wangyi2412 ▴ 250

Ram · Answer 1 · 2015-01-15

"So, from the exome-capture point of view, since we only sequence 100 bp from each end of the template, we don't want the template to be so long that the reads are unlikely to cover the exon region."

If the DNA is fragmented into pieces of, say, 5000 base pairs, the exon you want to target (including the sequence targeted by the bait) can lie anywhere in the fragment: on the left, in the middle, or on the right. So when you sequence the ends of the DNA fragments, you can still get good coverage of the complete exon. If for some reason the exon ends up in the middle of the DNA fragments more often, or if the exome baits work better when the exon is in the middle of the fragment, it might indeed be useful to fragment the DNA into smaller pieces. The real problem with long DNA fragments is that the Illumina sample prep and sequencing technology cannot handle them, for various reasons (inefficient pull-down by the baits, inefficient bridge amplification of long fragments, ...).
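To put rough numbers on this, here is a minimal sketch that enumerates every fragment position overlapping one exon and counts what the two end reads see. It assumes capture is equally likely wherever the exon sits in the fragment, which is exactly the caveat raised above; the 150 bp exon and the function name `end_read_stats` are made up for illustration.

```python
def end_read_stats(fragment_len, exon_len=150, read_len=100):
    """Enumerate all fragment starts that overlap an exon at [0, exon_len).

    Returns (min depth, max depth, fraction of fragments with a read on the exon).
    Assumes fragment_len >= read_len and position-independent capture.
    """
    depth = [0] * exon_len
    informative = 0
    starts = range(-(fragment_len - 1), exon_len)
    for s in starts:
        # Read 1 covers the first read_len bases, read 2 the last read_len bases.
        reads = [(s, s + read_len),
                 (s + fragment_len - read_len, s + fragment_len)]
        touched = False
        for lo, hi in reads:
            a, b = max(lo, 0), min(hi, exon_len)
            for p in range(a, b):
                depth[p] += 1
            touched = touched or a < b
        informative += touched
    return min(depth), max(depth), informative / len(starts)


for frag in (300, 500, 5000):
    lo, hi, frac = end_read_stats(frag)
    print(f"{frag} bp fragments: depth {lo}-{hi}, {frac:.1%} with a read on the exon")
```

Under these assumptions the depth is even across the whole exon for any fragment length, but with 5000 bp fragments only about one captured fragment in ten puts a read on the exon at all, so most of the sequencing goes to flanking sequence.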

"But if the template is too short, say 150 bp (here and below, lengths exclude the adapters, indexes, etc. and count only the genomic sequence), the two 100 bp reads will overlap, which requires extra effort to sort out after mapping."

What exactly is the problem when read pairs overlap? What has to be figured out after mapping? Do you mean that, when calling variants, a variant can appear to be supported by two independent reads when both actually come from the same fragment, and that this could introduce bias? The same happens for bases that are not mutated, right? I would say that if the DNA fragments (i.e. templates) are < 150 base pairs (maybe relevant for formalin-fixed paraffin-embedded samples?), expensive paired-end mode is wasted on them. Single-end sequencing would yield comparable quality for 50% of the cost, so if you choose single-end sequencing you can double the sequencing depth, which can improve quality significantly.
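The arithmetic behind the overlap is simple enough to sketch. The helper below is hypothetical (the name `pair_overlap` and its return values are mine) and assumes that reads longer than the insert are trimmed back to genomic sequence, i.e. adapter read-through is discarded.

```python
def pair_overlap(insert_len, read_len=100):
    """Return (overlap, unique_bases, efficiency) for one read pair.

    overlap      : genomic bases sequenced by both mates
    unique_bases : distinct genomic bases covered by the pair
    efficiency   : unique_bases / total bases sequenced (2 * read_len)
    """
    genomic = min(read_len, insert_len)       # genomic part of each read
    overlap = max(0, 2 * genomic - insert_len)
    unique_bases = 2 * genomic - overlap
    return overlap, unique_bases, unique_bases / (2 * read_len)


for insert in (80, 150, 200, 250):
    print(insert, pair_overlap(insert))
```

For a 150 bp template and 2 x 100 bp reads this gives a 50 bp overlap and 150 distinct bases out of 200 sequenced; at 100 bp or below, the second read adds no new genomic bases, which is the case where single-end sequencing gives the same information.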

"Furthermore, to infer relatively large insertions/deletions, we expect a certain gap between the two ends of the template, which requires that the template not be too short, e.g. not less than 250 bp."

Yes, I think this is particularly important for insertions.
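As a rough illustration of why the gap and a tight length distribution both matter here, the sketch below computes the unsequenced gap between mates and flags pairs whose mapped span is far from the library's expected template length. This is a generic paired-end-span heuristic, not any particular caller's method; the function names, the library mean and standard deviation, and the cutoff `k` are all assumptions for the example.

```python
def inner_distance(insert_len, read_len=100):
    """Unsequenced gap between the two mates; negative means they overlap."""
    return insert_len - 2 * read_len


def classify_pair(mapped_span, lib_mean, lib_sd, k=3.0):
    """Label a pair by comparing its span on the reference with the library.

    A span much larger than expected suggests sequence missing from the
    sample (deletion); a span much smaller suggests extra sequence in the
    sample (insertion). k is an arbitrary illustrative cutoff.
    """
    if mapped_span > lib_mean + k * lib_sd:
        return "possible deletion"
    if mapped_span < lib_mean - k * lib_sd:
        return "possible insertion"
    return "concordant"


print(inner_distance(250))            # gap for a 250 bp template
print(classify_pair(600, 250, 25))    # span far above the library mean
print(classify_pair(150, 250, 25))    # span far below the library mean
```

An insertion can only shrink the mapped span by as much as the template has to give, so a short template (or a broad length distribution, which widens the concordant window) leaves little room to detect insertions this way.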