For Illumina paired-end (100 bp from each end) exome sequencing:
Before taking the sample library to the sequencing machine, there is a step to check the length of the templates in the library. The variance should not be too large, to ensure a better result. The expected mode/average length should meet the needs imposed by the exome (target) length as well as by the questions the sequencing is trying to answer.
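As a rough illustration of that QC step, the sketch below summarizes a list of template lengths by mean, standard deviation and modal bin. The lengths are made-up example values, and the 10 bp bin size is an arbitrary assumption; in practice the lengths would come from a fragment-analyzer trace or from mapped read pairs.

```python
from collections import Counter
from statistics import mean, pstdev


def summarize_lengths(lengths, bin_size=10):
    """Return mean, population SD and modal bin of template lengths (bp)."""
    # Bin the lengths so that a "mode" is meaningful for near-continuous data.
    bins = Counter((length // bin_size) * bin_size for length in lengths)
    modal_bin = max(bins, key=bins.get)  # lower edge of the most populated bin
    return {
        "mean": mean(lengths),
        "sd": pstdev(lengths),
        "mode_bin": (modal_bin, modal_bin + bin_size),
    }


# Hypothetical template lengths (genomic part only, adapters excluded).
example = [280, 295, 300, 302, 305, 310, 298, 320, 260, 340]
print(summarize_lengths(example))
```

A small SD relative to the mean is what "the variance should not be too large" means in numbers.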
Right?
So, from the exome-capture point of view, since we only sequence 100 bp from each end of the template, we don't want the template to be so long that the sequenced reads are less likely to cover the exome region. But if the template is too short, say 150 bp (here and below, lengths exclude the adapters, indexes, etc.; only the genomic sequence counts), the two 100 bp reads would overlap, which requires extra effort to sort out after mapping.
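The overlap-versus-gap arithmetic can be made explicit with a tiny helper. It assumes, as in the post, that template length counts only genomic sequence and that both reads are 100 bp; `mate_layout` is a hypothetical name used only for illustration.

```python
def mate_layout(template_len, read_len=100):
    """Return the unsequenced gap between mates (positive) or their overlap (negative).

    Assumes template_len counts genomic bases only (no adapters/indexes).
    """
    return template_len - 2 * read_len


for t in (150, 200, 250, 300):
    gap = mate_layout(t)
    if gap < 0:
        print(f"{t} bp template: mates overlap by {-gap} bp")
    else:
        print(f"{t} bp template: {gap} bp unsequenced gap between mates")
```

With 2 x 100 bp reads, a 150 bp template gives a 50 bp overlap, 200 bp is the break-even point, and 300 bp leaves a 100 bp gap.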
Furthermore, for inferring relatively large insertions/deletions, we expect a certain gap between the two ends of the template, which requires that the template not be too short, e.g. not shorter than about 250 bp.
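One common read-pair idea behind this is to flag pairs whose mapped insert size strays too far from the library distribution. If the sample carries a deletion relative to the reference, the mates map farther apart than the true fragment length; an insertion makes them map closer together. The sketch below uses a simple z-score cutoff; the `k = 3` threshold and the library mean/SD are assumptions for illustration, and real structural-variant callers are considerably more elaborate.

```python
def flag_discordant(observed, library_mean, library_sd, k=3.0):
    """Label each mapped insert size as concordant or as a possible indel.

    A mapped span much longer than expected hints at a deletion in the sample
    relative to the reference; a much shorter span hints at an insertion.
    """
    calls = []
    for size in observed:
        z = (size - library_mean) / library_sd
        if z > k:
            calls.append((size, "possible deletion"))
        elif z < -k:
            calls.append((size, "possible insertion"))
        else:
            calls.append((size, "concordant"))
    return calls


# Hypothetical library: mean 300 bp, SD 20 bp, so only size changes beyond ~60 bp stand out.
for size, label in flag_discordant([298, 310, 420, 215], library_mean=300, library_sd=20):
    print(size, label)
```

Because the smallest detectable size change is roughly `k * library_sd`, a tighter length distribution also makes smaller indels visible this way.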
So, what should the length of the library templates be?
Hi, Irsan!
Thank you very much for your reply.
Yes, I mean "it can look like that a variant is supported in 2 independent reads while they actually come from the same fragment and this could result in bias". This is not hard to handle by writing an additional program, even if no current tool takes care of it. My point is that there should be such a step; otherwise it might cause errors.
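Such a step could be as simple as counting support per fragment rather than per read. The sketch assumes each observation at a position is a `(read_name, base)` pair and that both mates of a pair share the same read name, as they do in BAM files; `allele_support` is a hypothetical helper, and it does not resolve the case where two mates disagree at the same base.

```python
from collections import defaultdict


def allele_support(observations):
    """Count reads and distinct fragments supporting each allele at one position.

    Both mates of an overlapping pair share a read_name, so they appear twice
    in `observations` but should count as one independent fragment.
    """
    reads = defaultdict(int)
    fragments = defaultdict(set)
    for read_name, base in observations:
        reads[base] += 1
        fragments[base].add(read_name)
    return {base: (reads[base], len(fragments[base])) for base in reads}


# Hypothetical pileup: 'T' is seen in two reads, but both come from fragment frag1.
pileup = [("frag1", "T"), ("frag1", "T"), ("frag2", "C"), ("frag3", "C")]
print(allele_support(pileup))  # {'T': (2, 1), 'C': (2, 2)}
```

Here the alternative allele looks like 2 reads of support but is really only 1 independent fragment.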
Sorry, I don't quite understand what you mean by "The same will happen for base pairs that are not mutated right? I would say that if the DNA fragments (i.e. templates) are < 150 base pairs (maybe relevant for formalin-fixed paraffin embedded samples?)"
To summarize what I understand now: if we are not inferring large insertions or deletions, using paired-end reads in exome-seq for variant calling is indeed a waste. And even with paired-end sequencing, if there is not enough of a gap between the two ends, the pairs still cannot be used to infer insertions and deletions, so that is also a waste.
On the other hand, unless exome targets are captured much more easily when they sit in the middle of a template, a relatively long template, say 300 bp, would do little harm.
So, using 300 bp as the mode of the library template length is appropriate.