Assuming you want dosage information in your VCF, you need to replace "--export vcf" with something like "--export vcf vcf-dosage=DS". You may also want to add the 'bgz' modifier to request bgzipping of the VCF file.
Quick note about converting UK Biobank BGEN to VCF: I first tried this with QCTOOL, and after 15 days only ~1.1 million lines per chromosome had been written. At that pace it would take ~15 weeks to convert chromosome 1 to VCF.gz using QCTOOL.
I then saw this post and installed a plink2 module on my HPC and was able to convert chr21 UKBB BGEN to VCF (not bgz) in 98 minutes! This means that plink2 seems to convert BGEN to VCF many, many times faster than QCTOOL.
As Chris points out, just the chr21 (the smallest autosome) VCF file from UKBB was 2.4TB, so you should definitely consider using the bgz modifier to reduce file size.
Not sure if you will read this, but I used your code to convert my data from BGEN to VCF. Unfortunately, I get this error:
(imlabtools) [s1997351@node2f24(eddie) Bgen_vcf_test]$ plink2 --bgen crcsurvival_chr1.bgen --sample crcsurvival_chr1.sample --export vcf vcf-dosage=DS
PLINK v2.00a2.3LM 64-bit Intel (24 Jan 2020) www.cog-genomics.org/plink/2.0/
(C) 2005-2020 Shaun Purcell, Christopher Chang GNU General Public License v3
Logging to plink2.log.
Options in effect:
--bgen crcsurvival_chr1.bgen
--export vcf vcf-dosage=DS
--sample crcsurvival_chr1.sample
Start time: Wed Apr 6 08:34:05 2022
Warning: No --bgen REF/ALT mode specified ('ref-first', 'ref-last', or
'ref-unknown'). This will be required as of alpha 3.
385228 MiB RAM detected; reserving 192614 MiB for main workspace.
Allocated 14461 MiB successfully, after larger attempt(s) failed.
Using up to 32 threads (change this with --threads).
--bgen: 640921 variants detected, format v1.2.
Error: Invalid categorical phenotype '0' on line 3, column 5 of .sample file
(positive integer < 2^31 or --missing-code value expected).
End time: Wed Apr 6 08:34:05 2022
Do you, or anyone else, have any idea what I need to do differently?
Thanks a lot!
written 2.6 years ago by Bine (updated 19 months ago by GenoMax)
This is a newer type of .sample file; you need to update to a plink2 build from August 2020 or later to import it.
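To see whether an installed build meets that cutoff, one option is to print the plink2 banner and read the build date in parentheses. The log above shows a build dated 24 Jan 2020, which predates August 2020. This is only a minimal sketch: the `version_line` variable and the fallback message are illustrative, and the guard is there so the snippet does nothing harmful on a machine without plink2.

```shell
# Print the installed plink2 banner so the build date can be checked by eye.
# Guarded so the snippet is harmless when plink2 is not on PATH.
if command -v plink2 >/dev/null 2>&1; then
    version_line=$(plink2 --version)
else
    version_line="plink2 not found on PATH"
fi
echo "$version_line"
```

If the date shown is earlier than August 2020, updating plink2 (or loading a newer module on the cluster) should be the first step before rerunning the conversion.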
You need to add a REF/ALT mode to your command to tell plink2 which allele is REF. It is usually the first one, so add 'ref-first' to your command line: 'plink2 --bgen ukb_imp_chr21_v3.bgen ref-first --sample ukb_imp_chr21_v3_s.sample --export vcf vcf-dosage=DS'. If you do not know which allele is REF, use 'ref-unknown' instead; note that plink2 does not work out the reference allele for you in that mode, it only treats REF as provisional.
Hi Richard,
Did you figure out what was the problem? I am having the same issue.
Thanks!
Yes, I used the comment below by chrchang253 and it worked.
Further question for Chris here: for the UK Biobank BGEN data, what's the proper REF/ALT mode?
Warning: No --bgen REF/ALT mode specified ('ref-first', 'ref-last', or 'ref-unknown'). This will be required as of alpha 3.
The alpha 3 error message explicitly notes that UK Biobank BGENs use 'ref-first' encoding.
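Pulling the suggestions in this thread together ('vcf-dosage=DS', the 'bgz' modifier, and 'ref-first' for UK Biobank BGENs), a conversion command might look like the sketch below. The input file names are the ones quoted earlier in the thread; the output prefix is a hypothetical choice, and the snippet only prints the command (a dry run) rather than executing it.

```shell
# Input names follow the UK Biobank example quoted in this thread; substitute your own.
bgen=ukb_imp_chr21_v3.bgen
sample=ukb_imp_chr21_v3_s.sample
out=ukb_imp_chr21_v3    # hypothetical output prefix, not from the thread

# ref-first      REF/ALT mode (the thread reports this is correct for UK Biobank BGENs)
# bgz            request a bgzipped VCF instead of plain text
# vcf-dosage=DS  write dosages to the DS field
cmd="plink2 --bgen $bgen ref-first --sample $sample --export vcf bgz vcf-dosage=DS --out $out"

# Dry run: show the command that would be executed.
echo "$cmd"

# To run it for real, replace the echo with an unquoted expansion:
#   $cmd
# (this relies on the file names containing no spaces)
```

Given the 2.4TB uncompressed chr21 output reported above, it is probably worth checking free disk space before launching the real run, even with 'bgz'.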