Cannot process a custom VCF file using bcftools
8 weeks ago
Qiong ▴ 10

I created the VCF file below, but bcftools cannot parse its header. Can anyone tell me what is wrong with the file? Thanks!

bcftools query -f '%CHROM %POS [%DS1]\n' haha.vcf >haha.DS1.out
[E::bcf_hdr_parse_sample_line] Could not parse the "#CHROM.." line, either the fields are incorrect or spaces are present instead of tabs:
        #CHROM  POS     FORMAT  A-BCT-WA403-CT  A-BCT-DF87-WA

Failed to read from haha.vcf: could not parse header

Below is haha.vcf

##fileformat=VCFv4.2
##filedate=20240926
##FORMAT=<ID=DS1,Number=1,Type=Float,Description="EUR dosage">
##FORMAT=<ID=DS2,Number=1,Type=Float,Description="EAS dosage">
##FORMAT=<ID=DS3,Number=1,Type=Float,Description="AFR dosage">
##FORMAT=<ID=DS4,Number=1,Type=Float,Description="SAS dosage">
##FORMAT=<ID=DS5,Number=1,Type=Float,Description="AMR dosage">
##ANCESTRY=<EUR=0,EAS=1,AFR=2,SAS=3,AMR=4>
#CHROM  POS FORMAT  A-BCT-WA403-CT  A-BCT-DF87-WA
chr22   10550966    DS1:DS2:DS3:DS4:DS5 0.95:0:0:0:0.05 0.99:0:0:0:0.01
chr22   10586957    DS1:DS2:DS3:DS4:DS5 0.90:0:0:0:0.01 0.93:0:0:0:0.07
chr22   10550966    DS1:DS2:DS3:DS4:DS5 0.97:0:0:0:0.03 0.99:0:0:0:0.01

The following is the output of R's readLines("haha.vcf"). It shows that tabs were used to separate the fields.

a <-readLines("haha.vcf")
a

 [1] "##fileformat=VCFv4.2"
 [2] "##filedate=20240926"                                                   
 [3] "##FORMAT=<ID=DS1,Number=1,Type=Float,Description=\"EUR dosage\">"      
 [4] "##FORMAT=<ID=DS2,Number=1,Type=Float,Description=\"EAS dosage\">"      
 [5] "##FORMAT=<ID=DS3,Number=1,Type=Float,Description=\"AFR dosage\">"      
 [6] "##FORMAT=<ID=DS4,Number=1,Type=Float,Description=\"SAS dosage\">"      
 [7] "##FORMAT=<ID=DS5,Number=1,Type=Float,Description=\"AMR dosage\">"      
 [8] "##ANCESTRY=<EUR=0,EAS=1,AFR=2,SAS=3,AMR=4>"                            
 [9] "#CHROM\tPOS\tFORMAT\tA-BCT-WA403-CT\tA-BCT-DF87-WA"                    
[10] "chr22\t10550966\tDS1:DS2:DS3:DS4:DS5\t0.95:0:0:0:0.05\t0.99:0:0:0:0.01"
[11] "chr22\t10586957\tDS1:DS2:DS3:DS4:DS5\t0.90:0:0:0:0.01\t0.93:0:0:0:0.07"
[12] "chr22\t10550966\tDS1:DS2:DS3:DS4:DS5\t0.97:0:0:0:0.03\t0.99:0:0:0:0.01"
R bcftools VCF • 560 views
8 weeks ago
mazegriff ▴ 100

Hi Qiong,

Your #CHROM line has only five tab-delimited columns (CHROM, POS, FORMAT and two sample names), so bcftools cannot find the fixed columns that must come before FORMAT. Here is how the Broad Institute's GATK documentation describes the VCF format:

These first 7 fields are required by the VCF format and must be present, although they can be empty (in practice, there has to be a dot, i.e. ".", to serve as a placeholder).

Those fields are:

CHROM POS ID REF ALT QUAL FILTER

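The VCF specification also requires INFO as the eighth fixed column; FORMAT and the sample columns follow it when per-sample data are present. If your file has all of these columns, tab-separated and in this order, tools like bcftools should parse it without errors.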
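
As a rough way to see which columns are missing, a small awk check along these lines could be used. This is only a sketch: check_vcf_header is a hypothetical helper name, not part of bcftools, and it looks only at the first eight names on the #CHROM line.

```shell
# Hypothetical helper: compare the first eight names on the #CHROM line
# with the fixed columns the VCF specification expects, in order.
check_vcf_header() {
    awk -F'\t' '
        BEGIN { n = split("#CHROM POS ID REF ALT QUAL FILTER INFO", want, " ") }
        /^#CHROM/ {
            bad = 0
            for (i = 1; i <= n; i++) {
                if ($i != want[i]) {
                    # Fields past the end of the line read as empty strings.
                    printf "column %d: expected %s, found %s\n", i, want[i], ($i == "" ? "(nothing)" : $i)
                    bad = 1
                }
            }
            if (!bad) print "header OK"
            exit
        }
    ' "$1"
}
```

Run as check_vcf_header haha.vcf on the file from the question, it would report mismatches from column 3 onward, because FORMAT sits where ID is expected.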

Hope this is helpful.

Maze


Thanks. It worked after I added these additional columns!
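
For anyone landing here with the same layout, adding the columns might look like the sketch below. It assumes the input has exactly CHROM, POS, FORMAT and then the sample columns, as in haha.vcf. The name pad_vcf_columns is hypothetical, and the placeholder values (N for REF, a dot everywhere else) are assumptions that carry no real allele information; they are not necessarily what was used in this thread.

```shell
# Hypothetical helper: pad a "CHROM POS FORMAT samples..." file out to the
# full VCF column layout. Placeholder values are assumptions:
# ID ".", REF "N", ALT ".", QUAL ".", FILTER ".", INFO ".".
pad_vcf_columns() {
    awk 'BEGIN { FS = OFS = "\t" }
        # Meta-information lines and empty lines pass through unchanged.
        /^##/ || NF == 0 { print; next }
        {
            # The header line gets column names, data lines get placeholders.
            fixed = ($1 == "#CHROM") ? "ID\tREF\tALT\tQUAL\tFILTER\tINFO" : ".\tN\t.\t.\t.\t."
            # Everything from the third input column on (FORMAT, samples) is kept as is.
            rest = $3
            for (i = 4; i <= NF; i++) rest = rest OFS $i
            print $1, $2, fixed, rest
        }
    ' "$1"
}
```

Usage would be something like pad_vcf_columns haha.vcf > haha.fixed.vcf, after which the original bcftools query can be pointed at haha.fixed.vcf.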


Don't forget to follow up on your threads. If an answer was helpful, you should upvote it; if the answer resolved your question, you should mark it as accepted. You can accept more than one answer if they all work. If an answer was not really helpful or did not work, provide detailed feedback so others know not to use that answer.


Nice. Thank you for letting me know.
