Question

What should be taken care of during the bioinformatic analysis of targeted resequencing data, compared with whole-genome resequencing?

0

7.4 years ago

Zhilong Jia ★ 2.2k

What should be taken care of in the bioinformatic analysis pipeline for targeted resequencing, compared with whole-genome resequencing? Especially given the differences between targeted and whole-genome resequencing data, what can be done to optimise the bioinformatics pipeline for targeted resequencing? For example, will using the --intervals parameter in GATK improve performance? Thank you.

sequencing targeted resequencing wgs • 2.4k views

updated 7.3 years ago by finswimmer 16k • written 7.4 years ago by Zhilong Jia ★ 2.2k

0

Hi, Zhilong,

Are you targeting the whole exome or a gene panel? Are you trying to identify common SNPs or rare variants?

Info on genome vs exome bioinformatic analysis can be found here

I worked a bit on the detection of rare variants and did some research that ended in a review; you can find it here

Hope this is a good starting point!

7.4 years ago by Fabio Marroni ★ 3.0k

0

Hello,

I would be interested in the other way round, as I have only done targeted resequencing so far. Why should there be any differences in the bioinformatics pipeline?

fin swimmer

7.3 years ago by finswimmer 16k

1

You are right, I forgot to mention that when I did targeted resequencing for rare variants I was using POOLED data. If you have individual sequence data, the pipeline will not differ. In fact, if possible you should align to the whole genome anyway, so that any off-target reads that were generated are not spuriously aligned to your target regions. Also, many people filter polymorphic positions based on coverage (i.e. positions with very high or very low coverage are discarded). This is fine, but with targeted resequencing you have to take into account that coverage can vary much more, and therefore allow larger deviations (the exact numbers really depend on your data). I am sorry for any confusion my previous answer may have caused!
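To illustrate the coverage point, here is a minimal sketch, assuming per-position depths are already available as (position, depth) pairs. The keep-window is a hypothetical "median divided/multiplied by a fold factor" rule, and `coverage_bounds`, `filter_by_coverage` and the `fold` values are illustrative names and numbers, not a standard tool's defaults. Targeted data would typically need a larger `fold` than WGS.

```python
from statistics import median


def coverage_bounds(depths, fold):
    """Return a keep-window [median/fold, median*fold] around the median depth.

    `fold` is a hypothetical tuning knob: with uneven targeted coverage
    you would usually pick a larger value than for WGS.
    """
    m = median(depths)
    return m / fold, m * fold


def filter_by_coverage(sites, fold):
    """sites: list of (position, depth). Keep positions whose depth is inside the window."""
    low, high = coverage_bounds([d for _, d in sites], fold)
    return [pos for pos, d in sites if low <= d <= high]


sites = [(1, 100), (2, 90), (3, 110), (4, 20), (5, 400)]
print(filter_by_coverage(sites, fold=2))  # narrow window: drops positions 4 and 5
print(filter_by_coverage(sites, fold=5))  # wider window: keeps all five positions
```

With a median depth of 100, `fold=2` keeps only depths between 50 and 200, while `fold=5` keeps everything between 20 and 500. This is the "allow larger deviations" idea in numbers.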

7.3 years ago by Fabio Marroni ★ 3.0k

score 0 · Answer 1 · 2017-09-03

0

7.3 years ago

finswimmer 16k

Hello,

For example, can the use of parameter --intervals in GATK will improve the performance?

Defining your regions of interest is one of the fundamentals of targeted resequencing. Of course, restricting variant calling to specific regions should be much faster than running it on the whole genome.
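To make "restricting to specific regions" concrete, here is a minimal sketch, assuming the targets come as a plain BED file (chrom, start, end; 0-based, half-open). It merges overlapping target intervals and tests whether a 0-based position falls inside them with a binary search. `load_bed` and `in_targets` are hypothetical helper names, not part of GATK.

```python
from bisect import bisect_right
from collections import defaultdict


def load_bed(lines):
    """Parse BED lines into {chrom: (starts, ends)} with overlapping intervals merged."""
    raw = defaultdict(list)
    for line in lines:
        fields = line.split()
        # skip blank lines, comments and UCSC track/browser header lines
        if not fields or fields[0].startswith(("#", "track", "browser")):
            continue
        raw[fields[0]].append((int(fields[1]), int(fields[2])))
    targets = {}
    for chrom, ivs in raw.items():
        ivs.sort()
        starts, ends = [ivs[0][0]], [ivs[0][1]]
        for s, e in ivs[1:]:
            if s <= ends[-1]:  # overlapping or touching: extend the current interval
                ends[-1] = max(ends[-1], e)
            else:
                starts.append(s)
                ends.append(e)
        targets[chrom] = (starts, ends)
    return targets


def in_targets(targets, chrom, pos0):
    """True if the 0-based position pos0 lies inside a merged target interval."""
    if chrom not in targets:
        return False
    starts, ends = targets[chrom]
    i = bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]


targets = load_bed(["chr1\t100\t200", "chr1\t150\t300", "chr2\t10\t20"])
print(in_targets(targets, "chr1", 250))  # inside the merged chr1 interval [100, 300)
print(in_targets(targets, "chr1", 300))  # BED end is exclusive, so this is outside
```

Merging first keeps the lookup a single binary search per position, which is the same reason interval-restricted calling is cheap: only a small sorted set of regions has to be visited.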

fin swimmer

7.3 years ago by finswimmer 16k

0

Without --intervals, what will be different apart from the running time? Thank you.

7.3 years ago by Zhilong Jia ★ 2.2k

1

You will get many more variants, which you then have to filter out later in your pipeline. In targeted resequencing you always have a bunch of off-target reads.
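For comparison, filtering afterwards amounts to something like the following minimal sketch. It assumes an uncompressed VCF read line by line and a BED target file, with an optional padding in bp (the value is up to you). `parse_bed`, `on_target` and `filter_vcf` are hypothetical names. In practice dedicated tools such as `bedtools intersect` or `bcftools view -R/-T` do this job. The sketch only shows the coordinate logic: VCF POS is 1-based, BED is 0-based half-open.

```python
from collections import defaultdict


def parse_bed(bed_lines, padding=0):
    """Return {chrom: [(start, end), ...]} from BED lines, widened by `padding` bp per side."""
    targets = defaultdict(list)
    for line in bed_lines:
        f = line.split()
        if f and not f[0].startswith(("#", "track", "browser")):
            targets[f[0]].append((max(0, int(f[1]) - padding), int(f[2]) + padding))
    return targets


def on_target(targets, chrom, pos1):
    """1-based VCF POS lies in BED [start, end) iff start < pos1 <= end (linear scan)."""
    return any(start < pos1 <= end for start, end in targets.get(chrom, ()))


def filter_vcf(vcf_lines, targets):
    """Yield header lines unchanged and only those records that fall on target."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if on_target(targets, chrom, int(pos)):
            yield line


vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT",
    "chr1\t100\t.\tA\tG",   # BED base 99: just before the target
    "chr1\t101\t.\tC\tT",   # first targeted base
    "chr1\t200\t.\tG\tA",   # last targeted base
    "chr2\t150\t.\tA\tT",   # off-target chromosome
]
for rec in filter_vcf(vcf, parse_bed(["chr1\t100\t200"])):
    print(rec)
```

Calling with `-L`/`--intervals` avoids producing those off-target records in the first place, so this step becomes unnecessary.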

7.3 years ago by finswimmer 16k

0

As @Fabio indicated, "Actually, if possible you should align to the whole genome anyway, so that you avoid spurious alignments to your regions of off-target reads that may have been generated." In which specific step should --intervals be used, then? Thank you.

7.3 years ago by Zhilong Jia ★ 2.2k

1

You should use --intervals when you do the variant calling.
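A minimal sketch of what that looks like, assuming GATK4 with its `gatk` wrapper script on the PATH (GATK3 used a different `java -jar GenomeAnalysisTK.jar -T HaplotypeCaller` syntax). Alignment stays genome-wide, and only the calling step gets `-L` (the short form of `--intervals`). `--interval-padding` optionally widens each target so that variants at target edges are not lost. The helper name `haplotypecaller_cmd`, the file names and the padding value are hypothetical. The function only builds the command and does not run it.

```python
import shlex


def haplotypecaller_cmd(ref, bam, out_vcf, targets_bed, padding=0):
    """Build (not run) a GATK4 HaplotypeCaller command restricted to the target intervals."""
    cmd = [
        "gatk", "HaplotypeCaller",
        "-R", ref,          # reference FASTA used for the genome-wide alignment
        "-I", bam,          # reads aligned to the whole genome
        "-O", out_vcf,
        "-L", targets_bed,  # restrict calling to the targeted regions
    ]
    if padding:
        cmd += ["--interval-padding", str(padding)]  # extend each interval by `padding` bp
    return cmd


cmd = haplotypecaller_cmd("ref.fasta", "sample.bam", "sample.vcf.gz", "targets.bed", padding=100)
print(shlex.join(cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

Building the argument list instead of a shell string avoids quoting problems with unusual file names. `shlex.join` (Python 3.8+) is only used here to print it readably.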

7.3 years ago by finswimmer 16k

0

What if I know that, due to imperfect custom amplicon panel design, some primer pairs are expected to bind to off-target genome regions? Should I exclude such ambiguous regions of interest from the intervals list?

7.3 years ago by lamteva.vera ▴ 220