11 months ago
Michal Frenkel
I have approximately 100 VCF files, all of which are also available in the .g.vcf format. What is the most effective approach to merge them into a single VCF file for subsequent use in GWAS, with a primary focus on retaining multiallelic variants?
I attempted to use GLnexus with the .g.vcf files; however, the output did not include the multiallelic variants. I also tried normalizing the VCF files and then attempted to merge them using GLnexus, but encountered an error during the merge process.
These are GVCFs, which are different from VCFs. GVCFs contain the information that will ultimately reside in VCFs, but you need to process GVCFs further to get the VCFs.
Did you get the individual GVCFs with GATK? If so, combine them first (with CombineGVCFs or GenomicsDBImport), then run GenotypeGVCFs on the combined input to get the "merged" VCF.
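A minimal sketch of that GATK route, written in Python so the command lines can be generated for all ~100 samples. It assumes GATK4 is installed and callable as `gatk`; the function names, the reference path, and the output file names are hypothetical placeholders, not anything from the original post. It builds a CombineGVCFs call followed by a GenotypeGVCFs call on the combined file. For a cohort this size GenomicsDBImport may be the better consolidation step, but that is not shown here.

```python
import subprocess
from typing import List, Sequence


def build_joint_genotyping_commands(
    gvcfs: Sequence[str],
    reference: str,
    combined_gvcf: str = "cohort.g.vcf.gz",  # hypothetical file name
    output_vcf: str = "cohort.vcf.gz",       # hypothetical file name
) -> List[List[str]]:
    """Return two GATK4 command lines: CombineGVCFs, then GenotypeGVCFs."""
    if not gvcfs:
        raise ValueError("at least one GVCF is required")

    # Step 1: merge the per-sample GVCFs into one multi-sample GVCF.
    combine = ["gatk", "CombineGVCFs", "-R", reference]
    for gvcf in gvcfs:
        combine += ["--variant", gvcf]
    combine += ["-O", combined_gvcf]

    # Step 2: joint-genotype the combined GVCF into a regular VCF.
    genotype = [
        "gatk", "GenotypeGVCFs",
        "-R", reference,
        "-V", combined_gvcf,
        "-O", output_vcf,
    ]
    return [combine, genotype]


def run_commands(commands: Sequence[Sequence[str]]) -> None:
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(list(cmd), check=True)
```

To use it, collect the GVCF paths (for example with `sorted(glob.glob("gvcfs/*.g.vcf.gz"))`), pass them to `build_joint_genotyping_commands` together with the reference FASTA used for calling, and hand the result to `run_commands`.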
Yes, I created the individual GVCFs with GATK (HaplotypeCaller). Will the merged VCF produced by GenotypeGVCFs include the multiallelic variants?
Yes, the final VCF should contain all variants you see in the GVCFs unless there is a valid reason to exclude them.
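One way to check this on your own output is to count records whose ALT column lists more than one allele. The sketch below is only an illustration in plain Python with no VCF library. The function names are hypothetical, and it assumes a standard tab-delimited VCF where ALT is the fifth column. The symbolic `<NON_REF>` allele is ignored so the same check does not over-count on GVCF-style records. The spanning-deletion allele `*` is counted like any other allele.

```python
import gzip
from typing import IO, Iterable, Tuple


def open_vcf(path: str) -> IO[str]:
    """Open a plain or gzip/bgzip-compressed VCF as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def count_multiallelic(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (total_records, multiallelic_records) for VCF text lines."""
    total = 0
    multi = 0
    for line in lines:
        # Skip blank lines and header lines (## meta lines and #CHROM).
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a well-formed record
        total += 1
        # ALT is column 5; alleles are comma-separated.
        alts = [a for a in fields[4].split(",") if a not in (".", "<NON_REF>")]
        if len(alts) > 1:
            multi += 1
    return total, multi
```

For example, `with open_vcf("cohort.vcf.gz") as fh: print(count_multiallelic(fh))`, where `cohort.vcf.gz` stands in for your merged file. A nonzero second number means multiallelic sites were retained.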