6.4 years ago
QVINTVS_FABIVS_MAXIMVS
Say I have SNP and INDEL calls for 1000 individuals. These 1000 samples were joint-called and recalibrated with GATK in 10 batches.
As a result I have 10 VCF files with SNP and INDEL calls that I would like to merge. I only have access to the VCF files, so re-calling from the BAMs is not an option.
I'm familiar with bcftools, but I'm unclear on the best way forward.
Should I split multiallelic entries into biallelic before merging?
If I'm interested in rare variants, should I omit multiallelic variants?
Should I left-align before merging, after merging, or both?
Thank you for any advice.
I think your best bet would be GATK's CombineVariants tool, as I have had issues merging VCFs with vcftools/bcftools in the past. Possibly this: https://software.broadinstitute.org/gatk/documentation/tooldocs/3.8-0/org_broadinstitute_gatk_tools_walkers_variantutils_CombineVariants.php
Yes, I believe GATK's CombineVariants is the answer.
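For anyone who wants to try this, here is a minimal sketch of how the CombineVariants call might be assembled for the 10 batch files. It assumes GATK 3.x (the tool documented at the link above), and it assumes no sample name appears in more than one batch. The file names (`batch01.vcf`, `reference.fasta`, `merged.vcf`), the jar path, and the helper `combine_variants_cmd` are all hypothetical placeholders, not anything from the original post. It only builds and prints the command; it does not answer the questions about splitting multiallelics or left-aligning.

```python
import shlex


def combine_variants_cmd(vcfs, reference, output, gatk_jar="GenomeAnalysisTK.jar"):
    """Build a GATK 3.x CombineVariants command as an argument list.

    Hypothetical helper for illustration; assumes sample names are
    unique across the input VCFs.
    """
    if len(vcfs) < 2:
        raise ValueError("need at least two VCF files to merge")
    cmd = ["java", "-jar", gatk_jar, "-T", "CombineVariants", "-R", reference]
    # GATK 3.x takes one --variant argument per input file.
    for vcf in vcfs:
        cmd += ["--variant", vcf]
    cmd += ["-o", output]
    return cmd


# Placeholder names for the 10 per-batch VCFs (assumed, not from the post).
batches = [f"batch{i:02d}.vcf" for i in range(1, 11)]
cmd = combine_variants_cmd(batches, "reference.fasta", "merged.vcf")

# Print a shell-quoted version for inspection before running anything.
print(shlex.join(cmd))
```

If the printed command looks right, the same list could be handed to `subprocess.run(cmd, check=True)`. Building it as a list rather than a single string avoids quoting problems with unusual file names.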