bcftools view VCF parse error
4.4 years ago
gnomee ▴ 50

Hi all,

I am experiencing the following issue when I try to view my VCF file with bcftools:

[W::bcf_hdr_check_sanity] GL should be declared as Number=G
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=IS,Number=2,Type=Float,Description="Maximum number of reads supporting an indel and fraction of indel reads">
##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes for each ALT allele, in the same order as listed">
##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
##INFO=<ID=QBD,Number=1,Type=Float,Description="Quality by Depth: QUAL/#reads">
##INFO=<ID=RPB,Number=1,Type=Float,Description="Read Position Bias">
##INFO=<ID=MDV,Number=1,Type=Integer,Description="Maximum number of high-quality nonRef reads in samples">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias (v2) for filtering splice-site artefacts in RNA-seq data. Note: this version may be broken.">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=DV,Number=1,Type=Integer,Description="# high-quality non-reference bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
##bcftools_viewCommand=view myvcf.vcf.gz; Date=Wed Jul  1 08:30:34 2020
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  merged.mdup.bam   SEQUENCE_CONTEXT        INFO_control(VAF=variant_allele_fraction;TSR=total_variant_supporting_reads_incl_lowqual)       ANNOTATION_control    DBSNP   1K_GENOMES      ANNOVAR_FUNCTION        GENE    EXONIC_CLASSIFICATION   ANNOVAR_TRANSCRIPTS     SEGDUP  CYTOBAND        REPEAT_MASKER   DAC_BLACKLIST   DUKE_EXCLUDED   HISEQDEPTH    SELFCHAIN       MAPABILITY      SIMPLE_TANDEMREPEATS    CONFIDENCE      RECLASSIFICATION        PENALTIES       seqBiasPresent_1        seqingBiasPresent_1     seqBiasPresent_2     seqingBiasPresent_2
[W::vcf_parse] FILTER 'RE' is not defined in the header
[W::vcf_parse] FILTER 'TAC' is not defined in the header
[W::vcf_parse] FILTER 'HSDEPTH' is not defined in the header
[E::vcf_parse_format] Invalid character 'A' in 'GT' FORMAT field at 1:10109
Error: VCF parse error

My command was:

bcftools view myvcf.vcf.gz

Can anybody tell me why this is the case? As far as I understand, the GT format field should contain the genotype as a string, so shouldn't 'A' be a valid character for that field?

Thanks in advance.

vcf bcftools • 4.5k views
4.4 years ago

Can anybody tell me why this is the case?

[W::vcf_parse] FILTER 'RE' is not defined in the header

A line ##FILTER=<ID=RE,... is missing from the header.

[W::vcf_parse] FILTER 'HSDEPTH' is not defined in the header

A line ##FILTER=<ID=HSDEPTH,... is missing from the header.

[W::vcf_parse] FILTER 'TAC' is not defined in the header

A line ##FILTER=<ID=TAC,... is missing from the header.
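
To see every FILTER value that lacks a header declaration in one pass, something like the following Python sketch could help. It is not part of bcftools; the function names and the example records are made up for illustration (only the filter names RE, TAC and HSDEPTH come from the warnings above), and it assumes a tab-separated file with FILTER in the seventh column.

```python
import re


def undeclared_filters(vcf_lines):
    """Return FILTER IDs used in records but not declared in the header."""
    declared = {"PASS"}  # PASS is reserved by the VCF spec
    used = set()
    for line in vcf_lines:
        line = line.rstrip("\n")
        if line.startswith("##FILTER="):
            match = re.search(r"ID=([^,>]+)", line)
            if match:
                declared.add(match.group(1))
        elif line and not line.startswith("#"):
            fields = line.split("\t")
            # Column 7 is FILTER; "." means no filter information.
            if len(fields) > 6 and fields[6] != ".":
                used.update(fields[6].split(";"))
    return sorted(used - declared)


def filter_header_stub(filter_id):
    """Build a placeholder header line; the description must be filled in by hand."""
    return f'##FILTER=<ID={filter_id},Description="TODO: describe {filter_id}">'


# Hypothetical records, not taken from the original file.
example = [
    '##FILTER=<ID=PASS,Description="All filters passed">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t10109\t.\tA\tT\t50\tRE;TAC\t.",
    "1\t10200\t.\tG\tC\t60\tHSDEPTH\t.",
    "1\t10300\t.\tC\tG\t70\tPASS\t.",
]

for filter_id in undeclared_filters(example):
    print(filter_header_stub(filter_id))
```

The printed stubs only have placeholder descriptions; the real meaning of each filter has to come from whatever pipeline produced the file.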

[E::vcf_parse_format] Invalid character 'A' in 'GT' FORMAT field at 1:10109

The VCF spec says a genotype should be declared with the indexes of the alleles, not the allele strings.


GT (String): Genotype, encoded as allele values separated by either of / or |. The allele values are 0 for the reference allele (what is in the REF field), 1 for the first allele listed in ALT, 2 for the second allele list in ALT and so on.
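
As a rough illustration of that encoding, here is a minimal Python sketch that maps an allele-string genotype to allele indexes. It assumes the sample column really holds genotypes written as bases (e.g. A/T), which the thread does not confirm; gt_to_indices is a made-up helper name, not a bcftools or htslib function.

```python
import re


def gt_to_indices(gt, ref, alt):
    """Convert a genotype written with allele strings into VCF allele indexes.

    ref is the REF column, alt is the ALT column (comma-separated alleles).
    Raises ValueError if an allele appears in neither REF nor ALT.
    """
    alleles = [ref] + alt.split(",")
    # The capturing group keeps the / and | separators, so phasing is preserved.
    parts = re.split(r"([/|])", gt)
    converted = []
    for part in parts:
        if part in ("/", "|", ".") or part.isdigit():
            converted.append(part)  # separator, missing allele, or already an index
        else:
            converted.append(str(alleles.index(part)))
    return "".join(converted)


print(gt_to_indices("A/T", "A", "T"))    # 0/1
print(gt_to_indices("T|T", "A", "T"))    # 1|1
print(gt_to_indices("A/G", "A", "T,G"))  # 0/2
```

If the column turns out to hold something other than genotypes, the cleaner fix is probably to correct the FORMAT column and header rather than to convert values.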


Ah that makes perfect sense! Thank you!

