Library template length in exome sequencing
9.9 years ago
wangyi2412 ▴ 240

For Illumina paired-end (100bp each end) exome sequencing:

Before the library is loaded onto the sequencer, there is a step that checks the length distribution of the templates in the library. The variance should not be too large, to ensure a better result. The expected mode/average length should suit both the length of the exome targets and the questions the sequencing is meant to answer.

Right?

So from the exome capture point of view, since we only read 100 bp from each end of the template, we don't want the template to be so long that the sequenced reads are less likely to cover the exome region. But if the template is too short, say 150 bp (here and below, lengths exclude adapters, indexes, etc. and count only the genomic insert), the two 100 bp reads would overlap, which requires extra effort to sort out after mapping.

Furthermore, to infer relatively large insertions/deletions, we expect some gap between the two reads of a template, which requires that the template not be too short, e.g. not less than 250 bp.
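To make the geometry concrete, here is a minimal sketch of the arithmetic behind the two constraints above. The function name `pair_geometry` and the default read length of 100 bp are illustrative assumptions; the template is treated as genomic insert only (no adapters), and trimming or soft-clipping is ignored.

```python
def pair_geometry(template_len, read_len=100):
    """Return (overlap, inner_gap) in bp for one paired-end fragment.

    overlap:   template bases read by both mates
    inner_gap: template bases between the two mates that are never read
    """
    diff = template_len - 2 * read_len
    # If the template is shorter than one read, both mates cover all of it,
    # so the overlap cannot exceed the template length itself.
    overlap = min(template_len, max(0, -diff))
    inner_gap = max(0, diff)
    return overlap, inner_gap


for template_len in (150, 250, 300):
    overlap, inner_gap = pair_geometry(template_len)
    print(f"template {template_len} bp: overlap {overlap} bp, inner gap {inner_gap} bp")
```

Under these assumptions a 150 bp template gives a 50 bp overlap and no gap, 250 bp gives a 50 bp gap, and 300 bp gives a 100 bp gap.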

So, what length should the library templates be?

library template exome-seq • 2.2k views
9.9 years ago
Irsan ★ 7.8k

So from the exome capture point of view, since we only read 100 bp from each end of the template, we don't want the template to be so long that the sequenced reads are less likely to cover the exon region.

If the DNA is fragmented into pieces of, say, 5000 base pairs, you expect that the exon you want to target (including the sequence targeted by the bait) can lie anywhere in the fragment: on the left, in the middle or on the right. So when you sequence the ends of the DNA fragments, you can still get good coverage of the complete exon. If for some reason the exon ends up more often in the middle of the DNA fragments, or the exome baits work better when the exon is in the middle of the fragment, it might indeed be useful to fragment the DNA into smaller pieces. The real problem with long DNA fragments is that the Illumina sample prep and sequencing technology cannot handle them, for various reasons (inefficient pull-down by the baits, inefficient bridge amplification of long fragments, ...).

But if the template is too short, say 150 bp (here and below, lengths exclude adapters, indexes, etc. and count only the genomic insert), the two 100 bp reads would overlap, which requires extra effort to sort out after mapping.

What exactly is the problem when read pairs overlap? What has to be figured out after mapping? Do you mean that when calling variants, it can look like a variant is supported by 2 independent reads while they actually come from the same fragment, and this could result in bias? The same will happen for base pairs that are not mutated, right? I would say that if the DNA fragments (i.e. templates) are < 150 base pairs (maybe relevant for formalin-fixed paraffin-embedded samples?), paired-end mode is a waste of money. Single-end sequencing would yield comparable quality for 50% of the cost. So if you choose single-end sequencing, you can double the sequencing depth for the same money, which can increase the quality significantly.
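As a rough illustration of the cost argument, the sketch below counts how many distinct template bases one fragment yields in paired-end versus single-end mode. It is a deliberately simple model: `unique_bases` is a hypothetical helper, reads are assumed to be 100 bp, and adapter read-through and quality trimming are ignored.

```python
def unique_bases(template_len, read_len=100, paired=True):
    """Distinct template bases covered by the read(s) from one fragment."""
    n_reads = 2 if paired else 1
    # Reads cannot cover more bases than the template contains.
    return min(template_len, n_reads * read_len)


for template_len in (80, 100, 150, 300):
    pe = unique_bases(template_len, paired=True)
    se = unique_bases(template_len, paired=False)
    print(f"template {template_len} bp: paired-end {pe} bp, "
          f"single-end {se} bp, gain x{pe / se:.2f}")
```

In this model the second read adds nothing for templates of 100 bp or less, adds only 50 new bases at 150 bp, and doubles the unique bases only once the template reaches 200 bp.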

Furthermore, to infer relatively large insertions/deletions, we expect some gap between the two reads of a template, which requires that the template not be too short, e.g. not less than 250 bp.

Yes, I think this is particularly important for insertions.


Hi, Irsan!

Thank you very much for your reply.

Yes, I mean "it can look like a variant is supported by 2 independent reads while they actually come from the same fragment, and this could result in bias". This is not hard to handle by writing an additional program, even if no current tool takes care of it. My point is that there should be such a step; otherwise it might cause errors.
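A minimal sketch of what such a step could look like: count support per fragment rather than per read, so that two overlapping mates count once. The input format (a list of `(fragment_id, base)` observations at one position) and the function name `fragment_support` are assumptions for illustration, not the interface of any existing tool. Dropping fragments whose mates disagree is one conservative choice among several.

```python
from collections import defaultdict


def fragment_support(observations):
    """Count distinct fragments supporting each base at a single position.

    observations: iterable of (fragment_id, base); both mates of a pair
    share the same fragment_id (hypothetical input format).
    """
    bases_by_fragment = defaultdict(set)
    for fragment_id, base in observations:
        bases_by_fragment[fragment_id].add(base)

    counts = defaultdict(int)
    for bases in bases_by_fragment.values():
        # Count a fragment once; skip it if its two mates disagree.
        if len(bases) == 1:
            counts[next(iter(bases))] += 1
    return dict(counts)


observations = [
    ("frag1", "T"), ("frag1", "T"),  # overlapping mates, same fragment
    ("frag2", "T"),
    ("frag3", "C"),
    ("frag4", "C"), ("frag4", "T"),  # mates disagree, fragment is dropped
]
print(fragment_support(observations))
```

Counted per read, this toy position would show four reads with T; counted per fragment, only two independent fragments support it.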

Sorry, I don't quite understand what you mean by "The same will happen for base pairs that are not mutated, right? I would say that if the DNA fragments (i.e. templates) are < 150 base pairs (maybe relevant for formalin-fixed paraffin-embedded samples?)".

To summarize what I understand now: if you are not inferring large insertions or deletions, using paired-end sequencing in exome-seq for variant calling is indeed a waste. And when using paired-end sequencing, if there is not enough gap between the two reads, it still cannot be used to infer insertions and deletions, so it is also a waste.

On the other hand, unless exome targets are much more easily captured when they sit in the middle of a template, a relatively long template, say 300 bp, would do little harm.

So, using 300 bp as the mode of the library template length is appropriate.
