Hi, I need guidance on creating a custom reference database from Complete Genomics vcfBeta-GS0000*-ASM.vcf.bz2 files, for use as a reference panel for phasing and imputation.
I have closely followed the post "Custom Reference panel creation for data imputation from .vcf files" up to the last step, merging, which is where I am encountering an error.
The command I'm entering prior to my error is:
bcftools merge -f PASS -Ov -m none -l temp.10.bcf.index.txt -o temp.10.merge.vcf
Specifically, the error that I am encountering after entering the above command is:
[W::bcf_hdr_check_sanity] GL should be declared as Number=G
The VCF files from which the 10 BCF files (listed in temp.10.bcf.index.txt) were derived are version 4.1. The GL header lines that the message refers to are as follows:
##FORMAT=<ID=GL,Number=.,Type=Integer,Description="Genotype Likelihood">
##FORMAT=<ID=CGA_CEGL,Number=.,Type=Integer,Description="Calibrated Genotype Likelihood, Equal Allele Fraction Assumption">
The VCF 4.1 specification (VCFv4.1.pdf) indicates that the Number for GL (genotype likelihood) should be an actual number. The Complete Genomics files that I have do not contain a number there; instead, they contain a period ".".
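For illustration, here is a minimal sketch of how the GL declaration could be rewritten to match what the message asks for (Number=G). It is only a sketch under assumptions: the function name fix_gl_header is hypothetical, it edits header text only, and it assumes the GL values in these files really do contain one entry per possible genotype, which has not been checked here. Note also that the [W::...] prefix marks the message as a warning.

```shell
# Rewrite only the FORMAT/GL declaration so that Number=. becomes Number=G.
# Reads header text on stdin and writes the edited header to stdout.
# fix_gl_header is a hypothetical helper name, not part of bcftools.
fix_gl_header() {
    sed '/^##FORMAT=<ID=GL,/s/Number=\.,/Number=G,/'
}

# Demo on the two header lines quoted above (no bcftools needed for this part).
# Only the GL line is changed; the CGA_CEGL line passes through untouched.
fix_gl_header <<'EOF'
##FORMAT=<ID=GL,Number=.,Type=Integer,Description="Genotype Likelihood">
##FORMAT=<ID=CGA_CEGL,Number=.,Type=Integer,Description="Calibrated Genotype Likelihood, Equal Allele Fraction Assumption">
EOF
```

With real files, the same filter could sit between bcftools view -h (which prints only the header) and bcftools reheader -h (which replaces the header from a file). Whether that is appropriate for these files is an open question, since Number=G is only valid if the per-sample GL values match the genotype count.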
I ran vcf-validator to see whether it would yield any additional information, but it didn't; it just ran for about 10 minutes and then returned to the command prompt.
Any help/ideas/comments/clarifications are welcome.