I have two VCF files from different projects. One has 200 animals, the other has 4000 animals. Before merging them into one file, I realized that both VCF files have some missing genotypes, shown as ./.
bcftools merge has an option to set the missing genotypes to 0/0, but I am not sure whether I should use it. In the next step I will add annotation and split the merged file into separate VCF files for each population. Finally, I will look at the allele frequencies of the populations. Does 0/0 affect the AF calculation?
Thanks for the suggestions in advance!
Thanks for the response!
I used this to merge my files, setting the missing genotypes to 0/0:
bcftools merge -m none -0 file1.vcf.gz file2.vcf.gz > mergedfile.vcf
Both VCF files also contain unshared variants. I guess these will also produce missing genotypes in the merged file. In that case, what should I use?
If the two files contain unshared variants, then don't set the missing genotypes to "0/0". Imagine a SNP that was genotyped in one set but not the other: when you merge with -0, every animal in the other set is automatically filled in as "0/0". That is just guessing, and half of them might be wrong! Best to leave them as missing (drop the -0 option) if you are calculating allele frequencies.
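
To make the effect on AF concrete, here is a small worked sketch in Python. It assumes AF is computed as the ALT allele count divided by the number of called alleles (AC/AN), with missing alleles left out of the denominator. The function name alt_allele_freq and all genotype counts are made up for illustration; they are not taken from the files in the question.

```python
def alt_allele_freq(genotypes):
    """ALT allele frequency over called alleles; None if nothing was called."""
    alt = 0
    called = 0
    for gt in genotypes:
        # Treat phased (|) and unphased (/) genotypes the same way.
        for allele in gt.replace("|", "/").split("/"):
            if allele == ".":
                continue  # missing allele: excluded from the denominator
            called += 1
            if allele != "0":
                alt += 1
    return alt / called if called else None


# Hypothetical SNP genotyped only in the 200-animal set.
small_set = ["0/1"] * 80 + ["1/1"] * 20 + ["0/0"] * 100  # 120 ALT of 400 alleles
large_missing = ["./."] * 4000  # merged without -0: left as missing
large_as_ref = ["0/0"] * 4000   # merged with -0: assumed homozygous reference

af_missing = alt_allele_freq(small_set + large_missing)
af_as_ref = alt_allele_freq(small_set + large_as_ref)

print("AF, missing kept as ./.:", af_missing)
print("AF, missing set to 0/0:", af_as_ref)

# After splitting by population, the large set on its own looks like this:
print("Large set alone, ./.:", alt_allele_freq(large_missing))
print("Large set alone, 0/0:", alt_allele_freq(large_as_ref))
```

With missing genotypes kept, the SNP keeps the frequency observed in the animals that were actually genotyped, and the population that was never genotyped has no AF at all. With 0/0 filled in, the pooled frequency is diluted by 8000 assumed reference alleles, and the ungenotyped population reports an AF of zero that no data supports.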