I'm working on a project that involves merging four VCF files. The first three datasets ran smoothly through vcftools' vcf-merge, but merging the final dataset produces the following error:
Uh: 4 vs 1
at Vcf.pm line 169
Vcf::throw('Vcf4_0=HASH(0xa2d9870)', 'Uh: 4 vs 1\x{a}') called at Vcf.pm line 1492
VcfReader::format_haplotype('Vcf4_0=HASH(0xa2d9870)', 'ARRAY(0xa3e65c0)', 'ARRAY(0xa412cb0)') called at /path/to/vcftools_0.1.6/perl/vcf-merge line 426
main::merge_vcf_files('HASH(0x9f54170)') called at /path/to/vcftools_0.1.6/perl/vcf-merge line 12
All files have been compressed with bgzip, indexed with tabix, and should be in proper VCF4 format.
Where are your VCFs coming from? Can you post a few lines from each VCF? My guess is that your VCFs are wonky, through no fault of yours. I can probably hack Vcf.pm to fix your problem.
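In the meantime, here is a rough Python sketch for spotting data lines that look structurally off. It is only a guess at what "wonky" might mean: the function names (check_vcf_lines, check_vcf_file) are made up for this post, and the two checks (column count versus the #CHROM header, and sample values versus FORMAT keys) are my assumptions, not a reading of what Vcf.pm actually tests when it throws "Uh: 4 vs 1".

```python
import gzip


def check_vcf_lines(lines):
    """Yield (line_number, message) for data lines that look inconsistent.

    Hypothetical helper: the checks below are assumptions about common
    VCF problems, not a reimplementation of anything in vcftools.
    """
    n_header_cols = None
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        if line.startswith("#CHROM"):
            n_header_cols = len(line.split("\t"))
            continue
        fields = line.split("\t")
        # Check 1: every data line should have as many columns as the header.
        if n_header_cols is not None and len(fields) != n_header_cols:
            yield lineno, f"{len(fields)} columns, header has {n_header_cols}"
            continue
        # Check 2: a sample may drop trailing values, but it should not
        # carry more colon-separated values than FORMAT has keys.
        if len(fields) > 9:
            n_keys = len(fields[8].split(":"))
            for col, sample in enumerate(fields[9:], start=10):
                n_vals = len(sample.split(":"))
                if n_vals > n_keys:
                    yield lineno, f"column {col}: {n_vals} values for {n_keys} FORMAT keys"


def check_vcf_file(path):
    """Run the checks on a plain or bgzip/gzip-compressed VCF file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(check_vcf_lines(handle))
```

Calling check_vcf_file on each of the four inputs and comparing the results might show whether the 1000 Genomes file differs structurally from the other three. An empty list only means these two checks passed, not that the file is valid.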
The first three VCF files (the ones that merged successfully) are from the UK10K project; the last one is from the 1000 Genomes Project.
A few lines from the UK10K datasets (only the first few patients are shown):
A few lines from the 1000 Genomes dataset:
After looking at the vcftools subroutine _format_line_hash, I still cannot tell you what is wrong. The code spans several pages and is undocumented. Grrr, bad coding.
I suggest you ask this question on the vcftools help mailing list (vcftools-help (at) lists.sourceforge.net), where the developer of the Perl module is more likely to see it.