I have gVCFs for ~2.5k samples from study W. I also have a jointly called VCF of ~8k samples from 3 different studies: X, Y and Z.
I want to run a joint PCA on common variants across all samples from studies W, X, Y and Z without joint calling. To do this, I selected all variants in the jointly called X/Y/Z VCF that have gnomAD AF > 5% and MAF > 10% across these 3 studies.
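A minimal sketch of that site filter in Python. The Site structure and its fields are hypothetical, and it assumes the gnomAD AF is already attached to each site (e.g. from an INFO annotation), that sites are biallelic, and that MAF is computed over called alleles only.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Site:
    """One biallelic site from the jointly called X/Y/Z VCF (hypothetical structure)."""
    chrom: str
    pos: int
    ref: str
    alt: str
    gnomad_af: float       # assumed to come from an INFO annotation
    genotypes: list[str]   # one GT string per sample, e.g. "0/1", "1|1", "./."


def cohort_maf(genotypes: list[str]) -> float:
    """Minor allele frequency over called alleles; missing alleles ('.') are skipped."""
    alt = total = 0
    for gt in genotypes:
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue
            total += 1
            if allele != "0":
                alt += 1
    if total == 0:
        return 0.0
    af = alt / total
    return min(af, 1 - af)


def select_common_sites(sites, min_gnomad_af=0.05, min_maf=0.10):
    """Keep sites with gnomAD AF above min_gnomad_af and cohort MAF above min_maf."""
    return [
        s for s in sites
        if s.gnomad_af > min_gnomad_af and cohort_maf(s.genotypes) > min_maf
    ]
```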
I then extracted those variants from the gVCF of each sample in study W and combined them into a single file.
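One thing worth checking in the extraction step: in a gVCF, a sample that is hom-ref at a selected site typically has no record starting at that position; the site sits inside a reference block whose record carries END= in INFO. Looking sites up by exact position alone would then report them as absent. The sketch below assumes a hypothetical pre-parsed (chrom, pos, info, gt) tuple per gVCF line for one sample; it returns the GT of the record that starts at or covers the site, and treats uncovered sites as missing rather than hom-ref. It ignores deletions that start upstream and span the site.

```python
def info_end(info: str):
    """Return the END= value from a VCF INFO string, or None if there is none."""
    for field in info.split(";"):
        if field.startswith("END="):
            return int(field[len("END="):])
    return None


def genotype_at(records, chrom, pos):
    """GT of one sample at chrom:pos, given that sample's gVCF records.

    records: iterable of (chrom, pos, info, gt) tuples, a hypothetical
    pre-parsed form of one sample's gVCF lines.
    """
    for r_chrom, r_pos, info, gt in records:
        if r_chrom != chrom:
            continue
        if r_pos == pos:
            return gt          # a record starts exactly at the site
        end = info_end(info)
        if end is not None and r_pos < pos <= end:
            return gt          # site lies inside a reference block: use the block's GT
    return "./."               # not covered at all: missing, not hom-ref
```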
Next, I am going to combine the VCF of studies X, Y and Z with the merged gVCF of study W.
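When the two call sets are combined, it may be safer to match alleles by their strings rather than by GT index, because each W record carries its own ALT list (including <NON_REF>) whose order need not match the X/Y/Z site. A minimal sketch, assuming a hypothetical function that receives the selected site's REF/ALT plus one W sample's REF, comma-split ALT column and GT at the same position, and returns the ALT dosage (0, 1 or 2) used as PCA input, or None when the call cannot be mapped:

```python
def dosage(site_ref, site_alt, rec_ref, rec_alts, gt):
    """ALT-allele dosage of one W genotype, recoded against an X/Y/Z site.

    Alleles are compared as strings, so a different ALT order, a different
    ALT allele, or <NON_REF> is never silently counted as the site ALT.
    """
    if rec_ref != site_ref:
        return None                  # different REF string, e.g. an indel represented differently
    alleles = [rec_ref] + list(rec_alts)
    count = 0
    for idx in gt.replace("|", "/").split("/"):
        if idx == ".":
            return None              # missing call
        allele = alleles[int(idx)]
        if allele == site_alt:
            count += 1
        elif allele != site_ref:
            return None              # allele the site does not have (incl. <NON_REF>)
    return count
```

Sites where REF differs between the two sets (typically indels) would need both sets normalised against the same reference before this comparison is meaningful.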
One possible hitch is that every ALT column in the gVCFs contains <NON_REF>, either alone or after an allele (e.g. "G,<NON_REF>").
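If a plain VCF record is needed, the <NON_REF> allele can be dropped from ALT as long as the GT indices are remapped along with it. A minimal sketch for one sample's ALT column and GT, assuming GATK's "<NON_REF>" token (other callers may use a different symbolic allele, such as "<*>" in bcftools gVCFs). It only rewrites GT; Number=R and Number=G fields such as AD and PL would need the same allele subsetting.

```python
NON_REF = "<NON_REF>"  # assumed GATK token; adjust for other callers


def strip_non_ref(alt_field: str, gt: str):
    """Drop <NON_REF> from a gVCF ALT column and remap the GT indices to match.

    Returns (new_alt_field, new_gt). A GT allele that pointed at <NON_REF>
    becomes '.', and an ALT column that held only <NON_REF> becomes '.'.
    """
    alts = alt_field.split(",")
    kept = [a for a in alts if a != NON_REF]
    # Old allele index -> new index; REF (0) is unchanged, <NON_REF> has no entry.
    mapping = {"0": "0"}
    for old, allele in enumerate(alts, start=1):
        if allele != NON_REF:
            mapping[str(old)] = str(kept.index(allele) + 1)
    sep = "|" if "|" in gt else "/"
    new_gt = sep.join(mapping.get(i, ".") for i in gt.split(sep))
    return (",".join(kept) if kept else "."), new_gt
```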
What should I consider when merging a gVCF with a VCF?