I am analyzing a large case/control cohort (>500K samples) and am attempting to perform variant-set associations ("burden tests"). My first attempt was with rvtests, and while it works for smaller cohorts, for the full cohort it fails with std::bad_alloc() even on a machine with >1 TB of memory. Before I start testing alternatives: has anyone on this forum used, or know of a use of, burden-testing software besides REGENIE that can be applied to cohorts of this size?
This is targeted sequencing, so REGENIE cannot be applied.
Thanks
People working on the UK Biobank are using rvtests: https://www.medrxiv.org/content/10.1101/2021.11.04.21265866v1.full.pdf
I'm not sure that's 100% accurate, and in any case it's not likely to be helpful for me, for the following reasons:
#1: rvtests explicitly does not work for my use case.
#2: There is an open issue in rvtests demonstrating that it does not work for UKBB exomes: https://github.com/zhanxw/rvtests/issues/145
#3: The actual "N" used in your linked paper is in Table 2 on p. 46, and it is <200,000.
#4: rvtests wasn't even used for the full association testing in your linked paper; it was only used to generate per-subcohort scores, and rareMETALS was used for the association: "Score and covariance files used as input for gene-based meta-analyses in rareMETALS were generated using Rvtests as described above" (p. 24), indicating that when rvtests was run, the "N" was even lower than the values listed on p. 46. (That per-subcohort workflow is sketched below.)
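For context, here is a minimal Python sketch of that per-subcohort workflow, assuming the options documented in the rvtests README (--inVcf, --pheno, --peopleIncludeFile, --meta score,cov, --out); the file names and chunk size are placeholders, and the downstream rareMETALS meta-analysis in R is not shown:

    # Hedged sketch: split the cohort into sub-cohorts and generate
    # per-subcohort score/covariance files with rvtests, as the linked
    # paper describes; rareMETALS then meta-analyzes those files.
    # Paths and the chunk size are placeholders, not values from the paper.
    import subprocess
    from pathlib import Path

    VCF = "cohort.vcf.gz"       # placeholder: full-cohort VCF
    PHENO = "phenotype.ped"     # placeholder: phenotype/covariate file
    SAMPLES = Path("all_samples.txt").read_text().split()
    CHUNK = 50_000              # sub-cohort size; tune to available memory

    for i in range(0, len(SAMPLES), CHUNK):
        idx = i // CHUNK
        keep = Path(f"subcohort_{idx}.keep")
        keep.write_text("\n".join(SAMPLES[i:i + CHUNK]) + "\n")
        # Generate MetaScore/MetaCov files for this sub-cohort only;
        # these are the inputs used for gene-based meta-analysis.
        subprocess.run(
            [
                "rvtest",
                "--inVcf", VCF,
                "--pheno", PHENO,
                "--peopleIncludeFile", str(keep),
                "--meta", "score,cov",
                "--out", f"subcohort_{idx}",
            ],
            check=True,
        )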
tl;dr I don't think rvtests scales properly, and it has not been directly applied (i.e., in a one-shot fashion) to UKBB or UKBB-size cohorts. Please let me know if you're familiar with software that is actually capable of this.