The design of BGEN means that when you subset samples the data has to
be recompressed, and this is essentially what makes it slow. (By
contrast you can subset SNPs very quickly without recompression using
bgenix https://code.enkre.net/bgen.) It is therefore definitely worth
considering not subsetting samples but using a sample
inclusion/exclusion list instead, if your analysis software supports
that.
If you have to subset and want to use QCTOOL, some things to try are:
- I typically use the options -bgen-compression zstd -bgen-bits 8 now
  (cf. https://doi.org/10.1101/308296). This is faster, but first check
  that your downstream software supports zstd compression.
- Use a map/reduce type pipeline (i.e. chunk the data for re-encoding).
  This can be implemented using bgenix and cat-bgen.
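A minimal sketch of the map/reduce idea, written as a Python helper that only prints the shell commands rather than running them. The chunk size, the file names (chunk_0.bgen, keep.txt, and so on) and the chromosome label "22" are placeholders I made up. The bgenix, qctool and cat-bgen options shown (-incl-range, -incl-samples, -og) are what I would expect to use, but please check them against each tool's -help output for your installed version. bgenix also needs an index (.bgi) for the input file.

```python
import shlex


def chunk_ranges(chrom, start, end, size):
    """Split [start, end] into consecutive windows in 'chrom:start-end' form."""
    ranges = []
    pos = start
    while pos <= end:
        stop = min(pos + size - 1, end)
        ranges.append(f"{chrom}:{pos}-{stop}")
        pos = stop + 1
    return ranges


def build_pipeline(bgen, sample, keep, out, ranges):
    """Return shell command strings: two 'map' commands per chunk, then one 'reduce'."""
    q = shlex.quote
    commands, parts = [], []
    for i, rng in enumerate(ranges):
        raw = f"chunk_{i}.bgen"         # hypothetical intermediate file name
        sub = f"chunk_{i}.subset.bgen"  # hypothetical intermediate file name
        # Map, step 1: pull out one range of SNPs (no recompression needed).
        commands.append(f"bgenix -g {q(bgen)} -incl-range {q(rng)} > {q(raw)}")
        # Map, step 2: re-encode the chunk, keeping only the listed samples.
        commands.append(
            f"qctool -g {q(raw)} -s {q(sample)} -incl-samples {q(keep)} "
            f"-og {q(sub)} -bgen-compression zstd -bgen-bits 8"
        )
        parts.append(sub)
    # Reduce: concatenate the re-encoded chunks in order.
    commands.append(
        "cat-bgen -g " + " ".join(q(p) for p in parts) + f" -og {q(out)}"
    )
    return commands


if __name__ == "__main__":
    # Placeholder coordinates and chunk size; adjust to your data.
    windows = chunk_ranges("22", 1, 51_000_000, 5_000_000)
    for cmd in build_pipeline("in.bgen", "in.sample", "keep.txt", "out.bgen", windows):
        print(cmd)
```

The two map commands for each chunk are independent of the other chunks, so they can be submitted as separate cluster jobs. Only the final cat-bgen step has to wait for all of them.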
I assume you’re working with the latest version?
Have you tried first stripping the SNPs, then stripping the samples?
Ever since working with the imputed data from the UKB I was never
successful in chopping pieces of bgen files using qctool in a timely
manner. If my memory serves me well, qctool manages to strip SNPs
fairly quickly, but is super slow to remove samples. Try PLINK 1.9 or
2.0.
As for “The computational facility I'm using should not limit the
speed of an operation.”: depending on how disk data moves in and out of
slave nodes and how busy the cluster is, I have seen processes become
I/O starved on large clusters.
Also, consider that most tools that perform association testing can
use SNP and sample lists (BOLT, SAIGE, regenie, SNPTEST…), so
there might not be a need to pre-filter. Put “NA” as the phenotype for
any samples you’d like to be “removed” from the testing.
In the end I did not pre-filter SNPs or samples - I set samples to NA within the phenotype file, and used a SNP inclusion list in the second stage of SAIGE with the flag idstoIncludeFile.
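For anyone wanting to do the same, here is a rough sketch of the "set samples to NA" step in Python. The column names (IID, height, age) and the sample IDs are made up for illustration, and the tab-separated layout is an assumption, so adapt it to whatever phenotype file format your association tool expects.

```python
import csv
import io
import sys


def mask_phenotypes(rows, id_column, phenotype_columns, keep_ids, missing="NA"):
    """Return copies of rows with phenotypes set to `missing` for samples not in keep_ids."""
    masked = []
    for row in rows:
        new_row = dict(row)  # copy, so the input rows are left untouched
        if new_row[id_column] not in keep_ids:
            for col in phenotype_columns:
                new_row[col] = missing
        masked.append(new_row)
    return masked


if __name__ == "__main__":
    # Hypothetical phenotype table; in practice read it from your own file.
    text = "IID\theight\tage\nS1\t170\t40\nS2\t165\t52\nS3\t180\t33\n"
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    out = mask_phenotypes(rows, "IID", ["height"], keep_ids={"S1", "S3"})
    writer = csv.DictWriter(sys.stdout, fieldnames=["IID", "height", "age"], delimiter="\t")
    writer.writeheader()
    writer.writerows(out)
```

Only the phenotype columns are blanked. Covariates such as age are left alone, and the genotype files are never touched.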
Hi! I am installing qctool now, but compilation failed. Did you run into the same issue? If so, how did you handle it? Let me know, thanks!
Compiled binaries seem to be available in this directory: https://www.well.ox.ac.uk/~gav/resources/
Thank you so much!! I visited the directory and tried to find a build that suits my system, but couldn't. However, I got version 2.0.7 last night and the compilation finished. Happy!