Method for querying gnomAD with a long list of "uncharacterized" variants
21 months ago

Hi - I have been doing some analysis of published cancer sequencing data, and I have generated lists of somatic (cancer-specific) variants, filtered based on various criteria, for which I would like to query gnomAD. (Essentially to ask, how many of them have never/almost never been seen in germline sequencing of healthy individuals, either due to strong functional effects that have been selected against, or because they arise via mutagenic processes that are not operating in the germline?) I put "uncharacterized" in scare quotes because of course many of these variants have been characterized, but not by me and not in terms of their frequency in the general population.

I can format my lists in whatever format is appropriate, e.g. as VCF or MAF files. Right now I'm doing the analysis in R, and the tables are represented as GRanges objects - so I have the genome position, control allele, mutant allele, etc., but no SNP IDs. I'd like to be able to input that file and get back a report of the gnomAD allele frequency (and potentially other annotations) for each one. Is there a simple way to do this? (Bonus points if it can be done within R, but I'm fine with command-line approaches if necessary.)

R gnomAD variant GenomicRanges • 1.0k views

Read the gnomAD VCF file in R and do a lookup? Heavy-handed but will work.


Thanks - this is probably what I will have to do, but my understanding is that the gnomAD VCF file is really enormous (almost 500 GB, assuming I’m looking at the right thing), so this will present some major practical difficulties.


Save just the CHROM, POS, REF, ALT, ID and the required frequency fields in a tab-delimited file, then use data.table::fread to read it into R. Do this on a cluster node with ~64 GB of RAM if possible. If not, switch to the command line (bcftools/vt/VEP); either way, you'll need a fair amount of compute power to get this done.
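If memory is the bottleneck, one alternative is to stream the extracted table once and keep only the rows that match your query variants, rather than loading the whole table. Below is a minimal Python sketch of that idea, not a tested pipeline. It assumes the extract has already been written (e.g. with bcftools) as a tab-delimited file with a header line naming the columns CHROM, POS, REF, ALT, ID and AF, that multiallelic sites have been split so each row has a single ALT, and that chromosome names use the same convention in both files ("1" vs "chr1"). The function names and the column name AF are illustrative choices, not anything defined by gnomAD.

```python
import csv
import io


def load_query_keys(handle):
    """Read query variants into a set of (CHROM, POS, REF, ALT) tuples."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {(r["CHROM"], int(r["POS"]), r["REF"], r["ALT"]) for r in reader}


def lookup_frequencies(gnomad_handle, query_keys, freq_field="AF"):
    """Stream the gnomAD extract once and collect the frequency field
    (kept as a string) for every row whose key is in query_keys."""
    hits = {}
    reader = csv.DictReader(gnomad_handle, delimiter="\t")
    for row in reader:
        key = (row["CHROM"], int(row["POS"]), row["REF"], row["ALT"])
        if key in query_keys:
            hits[key] = row[freq_field]
    return hits


def annotate(query_keys, hits):
    """Pair each query variant with its frequency, or None if it was
    not found in the extract (i.e. not observed in gnomAD)."""
    return {key: hits.get(key) for key in query_keys}


# Tiny in-memory stand-ins for the two files (made-up values, assumed layout).
gnomad_tsv = io.StringIO(
    "CHROM\tPOS\tREF\tALT\tID\tAF\n"
    "1\t1000\tA\tG\trs1\t0.01\n"
    "1\t2000\tC\tT\t.\t2.5e-06\n"
    "2\t500\tG\tA\trs2\t0.3\n"
)
query_tsv = io.StringIO(
    "CHROM\tPOS\tREF\tALT\n"
    "1\t2000\tC\tT\n"
    "2\t500\tG\tC\n"
)

keys = load_query_keys(query_tsv)
result = annotate(keys, lookup_frequencies(gnomad_tsv, keys))
for key in sorted(result):
    print(key, result[key])
```

With real data you would pass open file handles instead of the StringIO objects. Only the query set and the matching rows are held in memory, so the size of the extract mostly affects run time. Variants that come back as None are the ones never seen in gnomAD, which is the set the original question is after.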
