Question

bcftools merge fails: Only fixed-length vectors are supported

4.6 years ago

German.M.Demidov ★ 2.9k

Dear community members,

I have a problem: I need to create a multi-sample VCF from thousands of VCF files. They were created with FreeBayes, and for some reason the techniques I usually rely on do not work. For example, when I try to use

bcftools merge

it tells me:

Only fixed-length vectors are supported with -i sum:DP

I am not that proficient with the VCF format, and this error message is totally cryptic to me. Do you have an idea why it may happen? An example line from my VCF files looks like:

chr1    911595  .       A       G       2911    .       MQM=60  GT:DP:AO        1/1:93:93

I thought it might be caused by multi-allelic sites, but bcftools should in theory be able to deal with them...

Any advice on how to create a multi-sample VCF is appreciated! (I used bcftools merge several times with GATK output and it worked, but now I am stuck...)

Googling did not help.

Command line used:

/mnt/share/opt/bcftools-1.9/bcftools merge sample1.vcf.gz  sample2.vcf.gz --merge none > merged.cases.vcf

bcftools multi-sample VCF • 1.4k views

ADD COMMENT • link updated 4.6 years ago by Carambakaracho ★ 3.3k • written 4.6 years ago by German.M.Demidov ★ 2.9k

score 2 · Accepted Answer · 2020-04-09

4.6 years ago

Carambakaracho ★ 3.3k

As you don't have DP values in the INFO column, switch off the default behaviour of summing up the DP values in the INFO column:

bcftools merge -i -

This is untested but that's how I understand the help:

bcftools merge --help
   -i, --info-rules <tag:method,..>   rules for merging INFO fields (method is one of sum,avg,min,max,join) or "-" to turn off the default [DP:sum,DP4:sum]
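Applied to the command from the question, that could look like the sketch below. The wrapper name merge_without_info_rules is made up for illustration; the flags and file names are the ones already shown in this thread. Like the suggestion itself, it is untested against the original data, and bcftools merge expects bgzip-compressed, indexed inputs.

```shell
#!/bin/sh
# Sketch only: the asker's merge command with the default INFO rules switched off.
# merge_without_info_rules is a hypothetical helper name, not part of bcftools.
merge_without_info_rules() {
    merged_out="$1"   # first argument: output VCF path
    shift             # remaining arguments: input VCF files
    # -i -         : turn off the default INFO rules (DP:sum,DP4:sum per the help text)
    # --merge none : kept unchanged from the original command line
    bcftools merge -i - --merge none "$@" > "$merged_out"
}

# Usage with the file names from the question (requires bcftools on PATH):
# merge_without_info_rules merged.cases.vcf sample1.vcf.gz sample2.vcf.gz
```

With thousands of inputs, passing them all as arguments may hit the shell's argument-length limit, so treat this as a starting point rather than a finished pipeline.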

ADD COMMENT • link 4.6 years ago by Carambakaracho ★ 3.3k

Thanks, I will try! I honestly checked the manual, but somehow my logic was not efficient enough to find this...

ADD REPLY • link 4.6 years ago by German.M.Demidov ★ 2.9k

Well, now it complains about the header:

Could not parse the header line: "##SAMPLE=<ID>,Gender=F,IsTumor=No...etc etc"

But that is a separate question; for the original one, the answer worked!

ADD REPLY • link 4.6 years ago by German.M.Demidov ★ 2.9k

It looks like the header has dash characters and bcftools does not like them. I will try vcftools vcf-merge instead; it is just not worth rewriting all the VCFs because of this...

ADD REPLY • link 4.6 years ago by German.M.Demidov ★ 2.9k

Sorry, I just have to add something here in case somebody else faces the same problem:

bcftools complains, but does the job. Wow, I am impressed: I have a multi-sample VCF despite multiple error messages.

ADD REPLY • link 4.6 years ago by German.M.Demidov ★ 2.9k

Well, if the SAMPLE line is in the VCF exactly as you pasted it above, the line is invalid. It should be something like

##SAMPLE=<ID=Patient_XYZ,Gender=F,IsTumor=No>

See the VCF specification (v4.3), section 1.4.8, Sample field format. I'd recommend taking a closer look at the merged VCF, just to make sure you'll be able to trace the individual samples back after merging.
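Both checks can be roughed out with plain awk, as in the sketch below. The function names check_sample_lines and list_samples are hypothetical, and the pattern only tests that a ##SAMPLE line starts with an ID=value pair and ends with ">"; it is not a full validation against the specification.

```shell
#!/bin/sh
# Sketch only: both helpers read uncompressed VCF text on stdin.

# Print "line number: line" for every ##SAMPLE header line that does not
# look like ##SAMPLE=<ID=value,...>. Stops at the first non-header line.
check_sample_lines() {
    awk '/^##SAMPLE=/ && !/^##SAMPLE=<ID=[^,>]+.*>$/ { print NR": "$0 }
         !/^#/ { exit }'
}

# Print the sample names, i.e. columns 10 and onwards of the #CHROM line.
list_samples() {
    awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }'
}

# Usage (bgzipped files can be decompressed with gzip -dc):
# gzip -dc sample1.vcf.gz | check_sample_lines
# list_samples < merged.cases.vcf
```

If bcftools is at hand, bcftools query -l merged.cases.vcf should give the same sample list as list_samples.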

ADD REPLY • link 4.6 years ago by Carambakaracho ★ 3.3k

Thanks a lot! Will do! Somehow we still follow 4.2, but I guess the difference is not big. We use our own processing system (we don't even really use VCFs) which is fine for clinics, but for research it is such a pain...

ADD REPLY • link 4.6 years ago by German.M.Demidov ★ 2.9k

No, the differences between 4.2 and 4.3 are mostly in the wording; the 4.3 specs are just far more explicit.

We use our own processing system (we don't even really use VCFs) which is fine for clinics, but for research it is such a pain...

Oh, I know that feeling... :-D

ADD REPLY • link 4.6 years ago by Carambakaracho ★ 3.3k