I have two already-phased VCF files (one patient and one control) and would like to merge them.
I tried vcf-phased-join, but it requires the files to have the same columns, i.e. the same number of individuals, which seems odd. Then I tried GATK -T CombineVariants, and it works!
But my questions are:
Is it OK to simply combine/merge two PHASED VCFs? (The optimal way, in my mind, would be to combine the patient and control BAMs, call variants jointly, and phase all SNPs together using GATK ReadBackedPhasing, but processing these BAM files would be too painful. The controls here are actually 1000 Genomes data.) After merging, many genotype fields will be missing; is this OK for downstream PLINK analysis?
How does PLINK handle missing genotypes, and how does it handle unphased genotypes?
Should I use only SNPs for PLINK, or is it OK to include indels as well?
I'm a PLINK beginner, so I'm confused. Many thanks!
See these posts: "How can I merge a large amount of VCF files?" and "Combining data of multiple VCFs into one".
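On the missing-genotype part of the question, here is a rough Python sketch of why a merge of two files with different samples ends up with empty genotype fields. It is a toy model only, not what CombineVariants does internally: the function names (merge_genotypes, summarize), the dictionary layout, and the example sites are all made up for illustration. The point is that a site present in only one file has to be padded with "./." for the other file's samples, because the merge cannot tell whether that sample was homozygous reference or simply not called there.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int, str, str]   # (chrom, pos, ref, alt)
Genotypes = Dict[Site, List[str]]  # one GT string per sample at each site


def merge_genotypes(a: Genotypes, n_a: int, b: Genotypes, n_b: int) -> Genotypes:
    """Take the union of sites from two call sets (hypothetical helper).

    Samples from a file that lacks a site are padded with './.'.
    """
    merged: Genotypes = {}
    for site in sorted(set(a) | set(b)):
        left = a.get(site, ["./."] * n_a)
        right = b.get(site, ["./."] * n_b)
        merged[site] = left + right
    return merged


def summarize(merged: Genotypes) -> Dict[str, int]:
    """Count missing, phased ('|') and unphased ('/') genotype calls."""
    counts = {"missing": 0, "phased": 0, "unphased": 0}
    for gts in merged.values():
        for gt in gts:
            if "." in gt:
                counts["missing"] += 1
            elif "|" in gt:
                counts["phased"] += 1
            else:
                counts["unphased"] += 1
    return counts


# Invented example data: one patient sample, one control sample.
patient = {("1", 100, "A", "G"): ["0|1"], ("1", 200, "C", "T"): ["1|1"]}
control = {("1", 100, "A", "G"): ["0|0"], ("1", 300, "G", "A"): ["1|0"]}

merged = merge_genotypes(patient, 1, control, 1)
for site, gts in merged.items():
    print(site, gts)
print(summarize(merged))
```

Running something like summarize on the real merged file (after parsing the GT fields) would show how much of the data is missing before it goes into PLINK, which may help decide whether to filter sites by call rate first.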