Dear community members,
I have a problem: I need to create a multi-sample VCF from thousands of VCF files. The files were created with FreeBayes, and the approaches I normally use do not work. For example, when I try to use
bcftools merge
it tells me:
Only fixed-length vectors are supported with -i sum:DP
I am not very familiar with the VCF format, and this error message is completely cryptic to me. Do you have any idea why it happens? An example line from my VCF files looks like this:
chr1 911595 . A G 2911 . MQM=60 GT:DP:AO 1/1:93:93
I suspected multi-allelic sites, but in theory bcftools should be able to handle them...
Any advice on how to create a multi-sample VCF is appreciated! (I have used bcftools merge several times with GATK output and it worked, but now I am stuck...)
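One plausible explanation, not confirmed anywhere in this thread: bcftools merge applies default INFO merge rules (DP:sum and DP4:sum) whenever those tags exist, and the error suggests the INFO/DP definition in the FreeBayes header is not one it can sum, for example if DP is declared with a variable `Number` such as `.`. A quick way to check is to print the DP header definitions of each input. The helper name `show_dp_headers` is hypothetical; `bcftools view -h file.vcf.gz` prints the full header if bcftools is at hand.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the INFO/DP and FORMAT/DP header definitions of
# each VCF given as an argument. Works on plain .vcf and bgzipped .vcf.gz.
show_dp_headers() {
  local f
  for f in "$@"; do
    printf '== %s\n' "$f"
    # gzip -cdf decompresses gzip/bgzip input and copies plain text unchanged;
    # awk prints matching meta lines and stops at the #CHROM header line.
    gzip -cdf -- "$f" | awk '/^#CHROM/ { exit } /^##(INFO|FORMAT)=<ID=DP,/'
  done
}
```

For example, `show_dp_headers sample1.vcf.gz sample2.vcf.gz` shows whether INFO/DP has `Number=1` or something variable.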
Googling did not help.
Command line used:
/mnt/share/opt/bcftools-1.9/bcftools merge sample1.vcf.gz sample2.vcf.gz --merge none > merged.cases.vcf
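The thread does not record which option was suggested in reply. A minimal sketch, assuming the default INFO/DP rule is the culprit: `--info-rules -` (short form `-i -`) disables the default `DP:sum,DP4:sum` rules, and `--file-list` reads the input names from a file instead of putting thousands of them on the command line. In bcftools 1.9 the inputs must be bgzipped and indexed. The function `merge_vcfs` and the list file name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge the VCFs listed in $1 (one path per line) into $2.
merge_vcfs() {
  local list=$1 out=$2 f
  # bcftools 1.9 merge needs bgzipped, indexed inputs: index any file lacking one
  while IFS= read -r f; do
    if [ ! -e "$f.csi" ] && [ ! -e "$f.tbi" ]; then
      bcftools index "$f" || return 1
    fi
  done < "$list"
  # --info-rules - turns off the default DP:sum,DP4:sum INFO merge rules
  bcftools merge --merge none --info-rules - --file-list "$list" -Oz -o "$out" || return 1
  bcftools index "$out"
}
```

Usage: `printf '%s\n' sample*.vcf.gz > vcf_list.txt && merge_vcfs vcf_list.txt merged.cases.vcf.gz`. Since bcftools merge keeps all inputs open at once, thousands of files may exceed the open-file limit (see `ulimit -n`); merging in batches and then merging the batch outputs is one possible workaround.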
Thanks, I will try! I honestly checked the manual, but somehow I did not manage to find this...
Well, now it complains about the header:
But that is a separate question; for the original one, the answer worked!
It looks like the header contains dash characters, and bcftools does not like them. I will try vcf-merge from VCFtools instead; it is just not worth rewriting all the VCFs because of this...
Sorry,
I just wanted to add something in case somebody else runs into the same problem:
bcftools complains, but it does the job. Wow, I am impressed: I now have a multi-sample VCF despite the many error messages.
Well, if the SAMPLE line in the VCF looks exactly like the one you pasted above, the line is invalid. It should be something like
See the VCF specification (v4.3), section 1.4.8 "Sample field format". I'd recommend taking a closer look at the merged VCF, just to make sure you'll be able to trace the individual samples back after merging.
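A quick sanity check, sketched under the assumption that each input contributes its own sample column: list the sample names in the merged file and compare their count with the number of inputs. `bcftools query -l merged.cases.vcf.gz` lists the samples directly; the hypothetical helper `list_samples` below does the same without bcftools by reading the `#CHROM` header line.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sample names (columns 10 onward of the
# #CHROM header line) of a plain or bgzipped VCF, one per line.
list_samples() {
  # gzip -cdf decompresses gzip/bgzip input and copies plain text unchanged
  gzip -cdf -- "$1" | awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}
```

For example, `list_samples merged.cases.vcf.gz | wc -l` should match `wc -l < vcf_list.txt` if every input file held exactly one sample.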
Thanks a lot, will do! For some reason we still follow 4.2, but I guess the difference is not big. We use our own processing system (we don't really use VCFs at all), which is fine for clinical work, but for research it is such a pain...
No, the differences between 4.2 and 4.3 are mostly in how the spec is worded; 4.3 is far more explicit.
Oh, I know that feeling... :-D