When genotyping whole exome sequencing (WES) data for a large human cohort with the ultimate goal of calling SNPs/indels, how important is it to use sample-specific design files when recalibrating BAMs? I ask because I am genotyping a cohort that used 3 capture kit versions: v1, v2, and v2 extended - where the extended version is larger than the original by 23,399 intervals - but the authors did not make it clear which samples used which capture kit.
Could I just use v2 extended for all the samples, the union of all 3 capture kits, or the intersection of all 3 capture kits? Most importantly, why? My understanding is that recalibrating with the design file prevents reads from being mapped to the incorrect genomic region, but how large is the effect of using an incorrect, but similar, design file?
This matters especially because, in theory, the additional intervals sequenced by the later versions would be lost anyway when filtering the joint-called files, given that only the intersection of all the capture kits would be used to filter and yield the final genotyped files.
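
To make the union and intersection options concrete, here is a minimal Python sketch of the two interval operations. It assumes each kit's design file has already been loaded as (chrom, start, end) tuples with BED-style half-open coordinates. The toy intervals and all function names are hypothetical and are not taken from the actual kits.

```python
from collections import defaultdict
from functools import reduce


def merge(intervals):
    """Merge overlapping or touching half-open intervals, per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))
    merged = []
    for chrom in sorted(by_chrom):
        cur_start, cur_end = None, None
        for start, end in sorted(by_chrom[chrom]):
            if cur_start is None:
                cur_start, cur_end = start, end
            elif start <= cur_end:
                # Overlaps or touches the current run: extend it.
                cur_end = max(cur_end, end)
            else:
                merged.append((chrom, cur_start, cur_end))
                cur_start, cur_end = start, end
        if cur_start is not None:
            merged.append((chrom, cur_start, cur_end))
    return merged


def union(*kits):
    """Every base covered by at least one kit."""
    return merge([iv for kit in kits for iv in kit])


def intersect_two(a, b):
    """Bases covered by both interval sets (two-pointer sweep)."""
    a, b = merge(a), merge(b)
    out = []
    i = j = 0
    while i < len(a) and j < len(b):
        chrom_a, start_a, end_a = a[i]
        chrom_b, start_b, end_b = b[j]
        if chrom_a != chrom_b:
            # Advance whichever list is on the earlier chromosome.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        start, end = max(start_a, start_b), min(end_a, end_b)
        if start < end:
            out.append((chrom_a, start, end))
        # Advance the interval that ends first.
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return out


def intersection(*kits):
    """Bases covered by every kit."""
    return reduce(intersect_two, kits)


# Hypothetical toy targets standing in for the three design files.
v1 = [("chr1", 100, 200), ("chr1", 300, 400)]
v2 = [("chr1", 150, 250), ("chr1", 300, 400)]
v2_ext = v2 + [("chr2", 50, 80)]  # "extended" adds extra intervals

all_kits_union = union(v1, v2, v2_ext)
all_kits_intersection = intersection(v1, v2, v2_ext)
print(all_kits_union)
print(all_kits_intersection)
```

In this toy case the union keeps the extra chr2 target from the extended kit, while the intersection drops it along with any bases not shared by all three versions. For real design files, an established tool such as bedtools, or GATK's `--interval-set-rule` option, is likely a better fit than hand-rolled code.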