Hi,
I have genotyped samples from bisulfite sequencing data. When I tried to filter sites and keep only biallelic SNPs, I got an error about the header, which appears to be malformed: there is an extra space or tab between ALT and QUAL.
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth">
##FORMAT=<ID=DPW,Number=1,Type=Integer,Description="Read Depth of Wastson Strand">
##FORMAT=<ID=DPC,Number=1,Type=Integer,Description="Read Depth of Crick Strand">
#CHROM POS ID REF ***ALT QUAL*** FILTER INFO FORMAT V02055.bsg
NW_022882922.1 4990 . G A 46 PASS NS=1,DP=12,DPW=12,DPC=0,AF=0.500 GT:GQ:DP:DPW:DPC 0/1:46:12:12:0
As you can see, this malformed header makes the file incompatible with, or unreadable by, other tools such as bcftools. For example:
bcftools view -m2 -M2 -v snps V02055.bsg.vcf.gz -o V02055.bsg.biallel.vcf.gz
[W::vcf_parse] Contig 'NW_022882922.1' is not defined in the header. (Quick workaround: index the file with tabix.)
[E::vcf_parse_format] Number of columns at NW_022882922.1:4990 does not match the number of samples (1 vs 2)
Error: VCF parse error
or even VCFtools, since the latter only supports up to VCFv4.2 and the output is VCFv4.4:
vcf-stats V02055.bsg.vcf.gz
The version "4.4" not supported, assuming VCFv4.2
Empty fields in the header line, the column 6 is empty, removing.
vcftools --minDP 5 --maxDP 150 --recode --recode-INFO-all --gzvcf V02055.bsg.vcf.gz --out V02055.bsg.biallel.vcf.gz
VCFtools - 0.1.16
(C) Adam Auton and Anthony Marcketta 2009
Parameters as interpreted:
--gzvcf V02055.bsg.vcf.gz
--recode-INFO-all
--maxDP 150
--minDP 5
--out V02055.bsg.biallel.vcf.gz
--recode
Using zlib version: 1.2.13
Error: VCF version must be v4.0, v4.1 or v4.2:
You are using version VCFv4.4
Is it possible to fix this error? It was suggested that I use annotate, but it seems to need a header, and I don't know how to fix this when the problem is that extra space/tab.
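One way this might be approached, as a minimal sketch rather than a verified fix: vcf-stats reports that column 6 of the header line is empty, which suggests two consecutive tabs between ALT and QUAL. Assuming that is the only defect, the #CHROM line could be inspected and repaired in a stream, leaving every other line untouched. The function names and the output file name below are hypothetical, and the sketch assumes awk, gzip, bgzip and tabix are available.

```shell
#!/bin/sh
# Hypothetical helpers; names and output file names are illustrative only.

# Print each tab-separated field of the #CHROM line with its index,
# so that an empty field shows up as "[]".
list_header_fields() {
    awk -F '\t' '/^#CHROM/ { for (i = 1; i <= NF; i++) printf "%d\t[%s]\n", i, $i; exit }'
}

# Collapse runs of consecutive tabs on the #CHROM line only.
# All other lines (meta lines and records) pass through unchanged.
fix_chrom_line() {
    awk '/^#CHROM/ { gsub(/\t+/, "\t") } { print }'
}

# Example usage (assumption: the extra field is an empty tab-delimited column):
#   gzip -dc V02055.bsg.vcf.gz | list_header_fields
#   gzip -dc V02055.bsg.vcf.gz | fix_chrom_line | bgzip -c > V02055.bsg.fixed.vcf.gz
#   tabix -p vcf V02055.bsg.fixed.vcf.gz
```

If the stray character turned out to be a space instead of a tab, the pattern in fix_chrom_line would need adjusting, so checking the output of list_header_fields first seems sensible.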
Might it be possible to convert from VCFv4.4 to VCFv4.2?
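As a rough sketch of what such a "conversion" could look like: the version is declared on the first line of the file (##fileformat=...), so one workaround might be to rewrite only that declaration. This changes the label and nothing else, so it assumes the file does not rely on anything specific to VCFv4.3 or VCFv4.4; whether that holds for this file is not something the output above confirms. The function name and output file name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper; it only relabels the file and converts no content.

# Rewrite a VCFv4.3/VCFv4.4 declaration on line 1 to VCFv4.2.
# Lines other than the first are passed through unchanged.
downgrade_fileformat() {
    sed '1s/^##fileformat=VCFv4\.[34]$/##fileformat=VCFv4.2/'
}

# Example usage (assumes gzip and bgzip are installed):
#   gzip -dc V02055.bsg.vcf.gz | downgrade_fileformat | bgzip -c > V02055.bsg.v42.vcf.gz
```

Since the header line problem is separate from the version declaration, relabelling alone would probably not resolve the bcftools column-count error.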
Thanks.