Analysis of exome samples built with different exome kits
5.4 years ago

Question regarding germline variant calling from exome data:

I have a dataset of ~500 exome samples built using 6 different kits. The dataset was assembled over ~6-7 years, so as the kits evolved, the most recent samples were built with the most up-to-date kits and the oldest samples with the oldest kits.

As the targeted genomic regions differ between exome kits, which interval file should I use in the GATK best-practice pipeline (BWA -> MarkDup -> BQSR -> HaplotypeCaller -> GenomicsDBImport -> GenotypeGVCFs -> VariantRecalibrator)? My first idea is to use the union of all interval files (one per exome kit), but I'm wondering whether the VQSR step of the GATK pipeline will struggle, since not every sample covers the full "union" interval set.
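To make the "union" idea concrete, here is a minimal Python sketch that merges BED-style intervals from several kits into one non-overlapping set. The function names (`read_bed`, `union_intervals`) and the toy coordinates are made up for illustration; a dedicated tool such as bedtools would do the same job on real kit files.

```python
from collections import defaultdict

def read_bed(lines):
    """Parse BED-like lines into (chrom, start, end) tuples, skipping headers."""
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        yield chrom, int(start), int(end)

def union_intervals(*interval_sets):
    """Merge intervals from all kits into a sorted, non-overlapping union."""
    by_chrom = defaultdict(list)
    for intervals in interval_sets:
        for chrom, start, end in intervals:
            by_chrom[chrom].append((start, end))
    merged = []
    # Note: chromosomes are sorted lexicographically here (chr10 before chr2).
    for chrom in sorted(by_chrom):
        current_start, current_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if current_end is None or start > current_end:
                # Gap found: flush the interval built so far, start a new one.
                if current_end is not None:
                    merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
            else:
                # Overlapping or adjacent: extend the current interval.
                current_end = max(current_end, end)
        merged.append((chrom, current_start, current_end))
    return merged

# Toy intervals standing in for two kits (hypothetical coordinates).
kit_a = list(read_bed(["chr1\t100\t200", "chr1\t500\t600"]))
kit_b = list(read_bed(["chr1\t150\t250", "chr2\t10\t50"]))
union = union_intervals(kit_a, kit_b)
for chrom, start, end in union:
    print(f"{chrom}\t{start}\t{end}")
```

With real kit files, each one would be opened and passed through `read_bed`, and the printed lines written out as the union interval file.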

Another idea: for each kit, call the variants using the associated samples and interval file, then merge the VCFs after VQSR filtering.

Any advice? Thanks


Edit:

I opened a thread on the GATK forum, as this is really specific to that tool: https://gatkforums.broadinstitute.org/gatk/discussion/24168/strange-tranche-plot-after-gatk4-germline-snps-pipeline

In a nutshell, I managed to improve the Ti/Tv ratio by running each sample with its respective interval file (from the corresponding exome kit), then using the union of these intervals for the steps after HaplotypeCaller (joint genotyping and VQSR).
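For anyone wanting to reproduce this layout, the sketch below only builds the command lines (it does not run them): per-sample HaplotypeCaller with the kit-specific interval file, then GenomicsDBImport and GenotypeGVCFs with the union file. All paths, sample names, kit names and the helper functions are hypothetical placeholders, and the real commands will likely need extra options for your setup.

```python
# Hypothetical sample-to-kit mapping and file paths; replace with your own.
KIT_INTERVALS = {
    "kit_v1": "intervals/kit_v1.bed",
    "kit_v2": "intervals/kit_v2.bed",
}
SAMPLE_KIT = {"sampleA": "kit_v1", "sampleB": "kit_v2"}
UNION_INTERVALS = "intervals/union_all_kits.bed"
REFERENCE = "ref/genome.fasta"

def haplotypecaller_cmd(sample):
    """Per-sample gVCF calling restricted to that sample's own kit intervals."""
    kit_bed = KIT_INTERVALS[SAMPLE_KIT[sample]]
    return [
        "gatk", "HaplotypeCaller",
        "-R", REFERENCE,
        "-I", f"bam/{sample}.bam",
        "-O", f"gvcf/{sample}.g.vcf.gz",
        "-ERC", "GVCF",
        "-L", kit_bed,
    ]

def joint_genotyping_cmds(samples, workspace="genomicsdb_ws"):
    """Import and joint-genotype all gVCFs over the union of kit intervals."""
    import_cmd = [
        "gatk", "GenomicsDBImport",
        "--genomicsdb-workspace-path", workspace,
        "-L", UNION_INTERVALS,
    ]
    for sample in samples:
        import_cmd += ["-V", f"gvcf/{sample}.g.vcf.gz"]
    genotype_cmd = [
        "gatk", "GenotypeGVCFs",
        "-R", REFERENCE,
        "-V", f"gendb://{workspace}",
        "-O", "joint/cohort.vcf.gz",
        "-L", UNION_INTERVALS,
    ]
    return import_cmd, genotype_cmd

for sample in SAMPLE_KIT:
    print(" ".join(haplotypecaller_cmd(sample)))
for cmd in joint_genotyping_cmds(list(SAMPLE_KIT)):
    print(" ".join(cmd))
```

The point of the split is that `-L` changes between the two stages: kit-specific before the gVCFs exist, union afterwards.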

exome gatk interval • 1.1k views
5.4 years ago

My first inclination would be to make sure that all recommended preprocessing steps, from alignment up to gVCF generation, are performed with each sample's respective kit intervals. Once you genotype the gVCF files, try out the union or the intersection of all kits (be careful with the intersection, as some very old kits did not target a particularly large number of sites). Check the tranche files for each iteration.
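A quick way to see how much the intersection costs you is to compare the number of bases covered by the union and by the intersection before committing to either. The sketch below does this for a single chromosome with made-up coordinates; the kit names and helper functions are hypothetical, and real BED files would need the same logic applied per chromosome.

```python
def merge(intervals):
    """Collapse (start, end) pairs into sorted, non-overlapping intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

def intersect_two(a, b):
    """Intersect two sorted, non-overlapping interval lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            out.append((start, end))
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return out

def covered_bp(intervals):
    """Total number of bases covered by half-open (start, end) intervals."""
    return sum(end - start for start, end in intervals)

# Toy targets on one chromosome for three hypothetical kits.
kits = {
    "old_kit": [(100, 200), (500, 600)],
    "mid_kit": [(100, 250), (500, 650), (900, 1000)],
    "new_kit": [(50, 300), (480, 700), (900, 1100)],
}
per_kit = [merge(targets) for targets in kits.values()]
union = merge([iv for kit in per_kit for iv in kit])
intersection = per_kit[0]
for kit in per_kit[1:]:
    intersection = intersect_two(intersection, kit)

print("union bp:", covered_bp(union))
print("intersection bp:", covered_bp(intersection))
```

In this toy case the oldest kit bounds the intersection, which is the situation the warning above is about.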


Thanks Andrew. I'll try that and post my results as soon as it's finished.
