Dear all,
We have a large exome sequencing cohort (>8000 samples) that we recently processed. At the end of the pipeline, a few genes show some odd results, so I would like to regenerate some of the final VCF files (all samples combined). However, redoing this for all samples would take a very long time and require a lot of computational power. Would any problems or biases be introduced by extracting just the genes of interest (around 50) from each gVCF and then continuing with those subsetted gVCF files to speed everything up? Or do the GATK steps require the whole gVCF to be present? Any other recommendations for intermediate steps are welcome :)
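
To make the idea concrete, here is a rough sketch of how the regions of interest might be prepared before subsetting. It pads each gene by a fixed margin and merges overlapping regions into BED-style intervals. The gene names, coordinates, and the 1000 bp padding are all made-up placeholders, not values from our cohort, and the helper names are hypothetical. I assume the resulting BED file could then be given to the GATK steps as an interval file (e.g. via `-L`), but whether that is safe is exactly what I am asking.

```python
from typing import Dict, List, Tuple

# (chrom, start, end), 0-based half-open coordinates as used in BED files
Interval = Tuple[str, int, int]


def padded_merged_intervals(genes: Dict[str, Interval], pad: int = 1000) -> List[Interval]:
    """Pad every gene interval by `pad` bases and merge overlapping ones per chromosome."""
    padded = sorted(
        (chrom, max(0, start - pad), end + pad)
        for chrom, start, end in genes.values()
    )
    merged: List[Interval] = []
    for chrom, start, end in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval: extend it
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def to_bed(intervals: List[Interval]) -> str:
    """Render intervals as tab-separated BED lines."""
    return "".join(f"{chrom}\t{start}\t{end}\n" for chrom, start, end in intervals)


# Hypothetical placeholder genes; real coordinates would come from an annotation file
example_genes = {
    "GENE_A": ("chr1", 10_000, 12_000),
    "GENE_B": ("chr1", 12_500, 15_000),
    "GENE_C": ("chr2", 500, 900),
}

if __name__ == "__main__":
    print(to_bed(padded_merged_intervals(example_genes)), end="")
```

The padding is there only so that variants near gene boundaries are not cut off; the right margin is an open question.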