3.2 years ago
kpatil
Hi all,
I have a VCF file that was extracted from UK Biobank (UKB) data using qctool (v2.0.6-Ubuntu16.04-x86_64) and stores genotype data in the GP (genotype probability) FORMAT field. It contains a set of SNPs from a single chromosome.
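For context on the GP field: the header shown below declares it with `Number=G`, which in the VCF specification means one value per possible genotype. For a diploid sample at a site with $A$ alleles (REF plus ALTs), the number of values is

$$N_G = \binom{A+1}{2} = \frac{A(A+1)}{2}$$

so a biallelic SNP ($A = 2$) carries $N_G = 3$ probabilities, ordered 0/0, 0/1, 1/1. That matches triplets such as `1,0,0` in the snippet.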
❱ wc -l chromosome1.vcf
260 chromosome1.vcf
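Note that `wc -l` counts every line, header included. The snippet further down shows the first SNP on line 4, so the header is 3 lines and 260 lines should correspond to 257 variant records (assuming no blank lines). A minimal sketch to count only the records; `count_vcf_records` is a hypothetical helper name, not part of qctool:

```shell
# Hypothetical helper: count VCF data lines, i.e. every line not starting
# with '#' (this skips both the '##' meta lines and the '#CHROM' header).
count_vcf_records() {
  grep -vc '^#' "$1"
}
```

Usage: `count_vcf_records chromosome1.vcf`.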
I then tried to convert this file to BGEN, again using qctool, but it reports "0 snps" and the output file is empty.
What am I doing wrong?
qctool -g chromosome1.vcf -vcf-genotype-field GP -og chromosome1.bgen -ofiletype bgen_v1.2 -bgen-bits 8
> Welcome to qctool (version: 2.0.6, revision 18b8f17)
>
> (C) 2009-2017 University of Oxford
>
> Opening genotype files :
> [******************************] (1/1,1.3s,0.8/s)
> ========================================================================
>
> Input SAMPLE file(s):
> Output SAMPLE file: "(n/a)".
> Sample exclusion output file: "(n/a)".
>
> Input GEN file(s):
> (not computed) "chromosome1.vcf"
> (total 1 sources, number of snps not computed).
> Number of samples: 487409
> Output GEN file(s): "chromosome1.bgen"
> Output SNP position file(s): (n/a)
> Sample filter: .
> # of samples in input files: 487409.
> # of samples after filtering: 487409 (0 filtered out).
>
> ========================================================================
>
> filetype hint = "bgen_v1.2", guess is "bgen_v1.2".
> Processing SNPs: (0/?,0.3s,0.0/s)
> ========================================================================
>
> Number of SNPs:
> -- in input file(s): (not computed).
> -- in output file(s): 0
>
> Number of samples in input file(s): 487409.
>
> Output GEN files: (0 snps)
> "chromosome1.bgen"
> (total 0 snps).
> ========================================================================
>
> !! Error (genfile::MalformedInputError): Source "chromosome1.vcf" is malformed on line 4..
The error saying the VCF file is malformed on line 4 is unexpected to me, since this file was itself created by qctool and has not been modified since.
Here is a snippet from the VCF file. The last line shown (line 4) is the one reported as malformed; it is also the first line with SNP data:
❱ head -n4 chromosome1.vcf | cut -f1-15
##fileformat=VCFv4.2
##FORMAT=<ID=GP,Type=Float,Number=G,Description="Genotype call probabilities">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT 4303212 3351913 2982758 3576795 4579377 5488446
01 2077409 rs385039,1:2077409_A_G A G . . . GP 1,0,0 1,0,0 0,0,1 1,0,0 1,0,0 0,1,0
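One way to narrow down a "malformed" report is to check that each data line is tab-delimited and has as many fields as the `#CHROM` header (the snippet above is as pasted, so tabs may have turned into spaces along the way). Below is a minimal sketch, assuming GP is the only FORMAT field and all variants are biallelic, so every sample entry should hold exactly 3 comma-separated values. `check_vcf_fields` is a hypothetical name. With ~487,000 samples per line, GNU awk is a safe choice, since some older awk builds cap the number of fields per record.

```shell
# Hypothetical helper: report data lines whose tab-separated field count
# differs from the #CHROM header, and sample entries that do not hold
# exactly 3 comma-separated values (assumes FORMAT is GP only, biallelic).
check_vcf_fields() {
  awk '
    BEGIN { FS = "\t" }                    # VCF columns are tab-delimited
    /^##/ { next }                         # skip meta-information lines
    /^#CHROM/ { n = NF; next }             # expected number of columns
    NF != n {
      printf "line %d: %d tab-separated fields, expected %d\n", NR, NF, n
      next
    }
    {
      for (i = 10; i <= NF; i++) {         # sample columns start at field 10
        k = split($i, gp, ",")
        if (k != 3) {
          printf "line %d, column %d: GP has %d values, expected 3\n", NR, i, k
          break                            # report only the first bad entry per line
        }
      }
    }
  ' "$1"
}
```

Run it as `check_vcf_fields chromosome1.vcf`. No output means every data line passed both checks; that does not prove the file is valid VCF, only that these two structural properties hold.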