Correct VCF header
1
0
Entering edit mode
3 months ago
ja569116 • 0

Hi,

I have genotyped samples from bisulfite sequencing data. When I tried filtering sites and getting only biallelic SNPs, I got an error about the header, it was built wrong. there is an extra space or extra tab between ALT and QUAL.

##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=DPW,Number=1,Type=Integer,Description="Read Depth of Wastson Strand">
##FORMAT=<ID=DPC,Number=1,Type=Integer,Description="Read Depth of Crick Strand">
#CHROM  POS ID  REF ***ALT      QUAL*** FILTER  INFO    FORMAT  V02055.bsg
NW_022882922.1  4990    .   G   A   46  PASS    NS=1,DP=12,DPW=12,DPC=0,AF=0.500    GT:GQ:DP:DPW:DPC    0/1:46:12:12:0

As you can see, this wrong header causes the file to be either incompatible or unreadable by other tools like bcftools For example:

bcftools view -m2 -M2 -v snps V02055.bsg.vcf.gz -o V02055.bsg.biallel.vcf.gz

[W::vcf_parse] Contig 'NW_022882922.1' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::vcf_parse_format] Number of columns at NW_022882922.1:4990 does not match the number of samples (1 vs 2)
Error: VCF parse error

or even VCFtools - since the latest only uses VCFv4.2 and the output is VCFv4.4.

vcf-stats V02055.bsg.vcf.gz 

The version "4.4" not supported, assuming VCFv4.2
Empty fields in the header line, the column 6 is empty, removing.

vcftools --minDP 5 --maxDP 150 --recode --recode-INFO-all --gzvcf V02055.bsg.vcf.gz --out V02055.bsg.biallel.vcf.gz

VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009

Parameters as interpreted:
    --gzvcf V02055.bsg.vcf.gz
    --recode-INFO-all
    --maxDP 150
    --minDP 5
    --out V02055.bsg.biallel.vcf.gz
    --recode

Using zlib version: 1.2.13
Error: VCF version must be v4.0, v4.1 or v4.2:
You are using version VCFv4.4

Is is possible to fix this error? I was suggested to use annotate but it seems to need a header, and I don't know how to fix this when the problem is that extra space/tab.

Might it be possible to convert from VCFv4.4 to VCFv4.2?

Thanks.

VCF • 379 views
ADD COMMENT
0
Entering edit mode
3 months ago
tomas4482 ▴ 430

You can check the specification of different vcfs here: SAM/BAM and related specifications

Your vcf is somehow corrupted. The header is incomplete. You need to provide sufficient headers to define your contigs and other necessary information. If you are using RefSeq database for annotation, you need to define the contigs from their annotation gtf/gff files. In ensembl, it is 1,2,3,4,5.... In gencode, it is chr1, chr2, chr3....

I don't know if ensembl-VEP will automatically add the headers for you. You could have a try by passing the vcf to VEP and see what's going on.

As an example, I used to manually modify/create vcfs and pass it to vcftools etc. Here is the least content of vcf header I used.

##fileformat=VCFv4.2
##FILTER=<ID=PASS,Description="All filters passed">
##INFO=<ID=ReverseComplementedAlleles,Number=0,Type=Flag,Description="The REF and the ALT alleles have been reverse complemented in liftover since the mapping from the previous reference to the current one was on the negative strand.">
##INFO=<ID=SwappedAlleles,Number=0,Type=Flag,Description="The REF and the ALT alleles have been swapped in liftover due to changes in the reference. It is possible that not all INFO annotations reflect this swap, and in the genotypes, only the GT, PL, and AD fields have been modified. You should check the TAGS_TO_REVERSE parameter that was used during the LiftOver to be sure.">
##contig=<ID=1,length=248956422>
##contig=<ID=10,length=133797422>
##contig=<ID=11,length=135086622>
##contig=<ID=12,length=133275309>
##contig=<ID=13,length=114364328>
##contig=<ID=14,length=107043718>
##contig=<ID=15,length=101991189>
##contig=<ID=16,length=90338345>
##contig=<ID=17,length=83257441>
##contig=<ID=18,length=80373285>
##contig=<ID=19,length=58617616>
##contig=<ID=2,length=242193529>
##contig=<ID=20,length=64444167>
##contig=<ID=21,length=46709983>
##contig=<ID=22,length=50818468>
##contig=<ID=3,length=198295559>
##contig=<ID=4,length=190214555>
##contig=<ID=5,length=181538259>
##contig=<ID=6,length=170805979>
##contig=<ID=7,length=159345973>
##contig=<ID=8,length=145138636>
##contig=<ID=9,length=138394717>
##contig=<ID=MT,length=16569>
##contig=<ID=X,length=156040895>
##contig=<ID=Y,length=57227415>
#CHROM  POS ID  REF ALT QUAL    FILTER  INFO
ADD COMMENT

Login before adding your answer.

Traffic: 1876 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6