5.6 years ago
Qingyang Xiao
I have 500 genes of interest whose variants I want to download from gnomAD for SNP analysis.
It will take forever if I type each gene name and click the "Export to csv" button.
How can I do that in batches?
If you are interested in specific genes, you would probably want to use gnomAD exomes, not genomes. It's based on more samples and the file is substantially smaller.
Small suggestion: if you have the disk space (on the order of ~1 TB), you could save the wget output to a temporary file, i.e.
wget -O - "https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/genomes/gnomad.genomes.r2.1.1.sites.vcf.bgz" > gnomad.vcf.bgz
and then query the file with gunzip + grep afterwards, in case you want to look at different genes, or you notice a typo, etc. You could also do it per chromosome and only grep the genes located on the chromosomes you need (see the download page).
Since you have 500 genes, you could also put them in a text file (one gene per row) and provide that file as your list of search strings by modifying the grep part here to
gunzip -c gnomad.vcf.bgz | grep -E -f mygenes.txt
Also keep in mind that grep will match whatever text is present; if you have gene symbols and one of them is a substring of something unrelated, it will get matched too, so you should definitely check your output for correct matches.
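One way to cut down on the substring problem is to treat the gene list as fixed strings and ask grep for whole-word matches (-F and -w). Below is a minimal sketch on a made-up, VCF-like toy file; the file names, the records and the simplified vep= annotation are assumptions for illustration only, not real gnomAD content.

```shell
# Toy data in a scratch directory (all names and records are illustrative).
workdir=$(mktemp -d)
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '21\t100\t.\tA\tG\t.\tPASS\tvep=G|missense_variant|APP\n'
  printf '3\t200\t.\tC\tT\t.\tPASS\tvep=T|synonymous_variant|APPL1\n'
  printf '21\t300\t.\tG\tA\t.\tPASS\tvep=A|missense_variant|SOD1\n'
  printf '2\t400\t.\tT\tC\t.\tPASS\tvep=C|intron_variant|TTN\n'
} > "$workdir/toy.vcf"
printf 'APP\nSOD1\n' > "$workdir/mygenes.txt"

# Plain substring matching: "APP" also hits the APPL1 record.
substring_hits=$(grep -c -F -f "$workdir/mygenes.txt" "$workdir/toy.vcf")

# Whole-word matching: a symbol must be delimited by non-word characters.
wholeword_hits=$(grep -c -w -F -f "$workdir/mygenes.txt" "$workdir/toy.vcf")

echo "substring matches:  $substring_hits"
echo "whole-word matches: $wholeword_hits"
```

Note that -w treats only letters, digits and underscores as word characters, so a symbol such as HLA would still match inside HLA-A; checking the output by hand remains a good idea.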
Finally, do you have gene symbols, or gene identifiers (e.g. Ensembl or RefSeq)? I would download the smallest file first (chr21 sites VCF, 6.12 GiB), check that your inputs work with what the gnomAD VCF provides, and then try the whole dataset.
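A quick way to run that check is to count hits per gene on the small file and list the genes that never match, which usually points to a typo or to the wrong identifier type. This is only a sketch on a toy file; the file names and records are made up, and SDO1 is a deliberate typo for SOD1.

```shell
# Toy stand-in for a decompressed per-chromosome VCF (illustrative records only).
workdir=$(mktemp -d)
{
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '21\t100\t.\tA\tG\t.\tPASS\tvep=G|missense_variant|APP\n'
  printf '21\t300\t.\tG\tA\t.\tPASS\tvep=A|missense_variant|SOD1\n'
  printf '3\t200\t.\tC\tT\t.\tPASS\tvep=T|synonymous_variant|APPL1\n'
} > "$workdir/toy.vcf"
printf 'APP\nSOD1\nSDO1\n' > "$workdir/mygenes.txt"

# Count whole-word hits for each gene; grep -c prints 0 (and exits 1) on no match.
while IFS= read -r gene; do
  [ -n "$gene" ] || continue
  hits=$(grep -c -w -F -- "$gene" "$workdir/toy.vcf" || true)
  printf '%s\t%s\n' "$gene" "$hits"
done < "$workdir/mygenes.txt" > "$workdir/hits.tsv"

# Genes with zero hits deserve a second look before running on the full dataset.
missing=$(awk -F'\t' '$2 == 0 { print $1 }' "$workdir/hits.tsv")
echo "genes without any match: $missing"
```

On the real file this loop would rescan the data once per gene, so it is probably only practical on a decompressed per-chromosome file or a small extract.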
But VCF files don't have gene names/symbols, correct? You may have to convert your gene name list into start:end coordinates. I have a similar task, and I'd love help on the matter.
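If symbol matching turns out not to work, filtering by coordinates is one alternative. The sketch below assumes you already have a tab-separated table of gene regions (chromosome, start, end, gene name; 1-based, inclusive); the table, the coordinates and the GENE_A/GENE_B names are hypothetical, and the VCF is a toy file.

```shell
# Hypothetical region table and toy VCF (coordinates are made up).
workdir=$(mktemp -d)
printf '21\t1000\t2000\tGENE_A\n21\t5000\t6000\tGENE_B\n' > "$workdir/regions.tsv"
{
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf '21\t1500\t.\tA\tG\t.\tPASS\t.\n'   # inside GENE_A
  printf '21\t3000\t.\tC\tT\t.\tPASS\t.\n'   # between the two regions
  printf '21\t5000\t.\tG\tA\t.\tPASS\t.\n'   # first base of GENE_B
  printf '22\t1500\t.\tT\tC\t.\tPASS\t.\n'   # right position, wrong chromosome
} > "$workdir/toy.vcf"

# First file: load the regions. Second file: keep records that fall in a region,
# prefixed with the gene name of the first region that contains them.
awk -F'\t' '
  NR == FNR { chrom[NR] = $1; start[NR] = $2 + 0; end[NR] = $3 + 0; gene[NR] = $4; n = NR; next }
  /^#/      { next }
  {
    for (i = 1; i <= n; i++)
      if ($1 == chrom[i] && ($2 + 0) >= start[i] && ($2 + 0) <= end[i]) {
        print gene[i] "\t" $0
        break
      }
  }
' "$workdir/regions.tsv" "$workdir/toy.vcf" > "$workdir/by_region.tsv"

cat "$workdir/by_region.tsv"
```

This scans every record, which is fine for a sketch; on the full bgzipped files, an indexed region lookup (for example with tabix, if a .tbi index is available next to the VCF) would likely be much faster.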
Thanks. If I download the .csv file directly from gnomAD, the data is integrated from both gnomAD Genomes and Exomes, but the code above only gives me the data from Genomes. Could I get the data integrated from Genomes and Exomes, just like when I click to download directly?
I don't think there's a single file with both (officially at least) but the exome variants are at https://storage.googleapis.com/gnomad-public/release/2.1.1/vcf/exomes/gnomad.exomes.r2.1.1.sites.vcf.bgz (link from the gnomAD download page: https://gnomad.broadinstitute.org/downloads).
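If you run the same extraction on both files, a rough way to approximate a combined view is to sum allele counts (AC) and allele numbers (AN) per variant and recompute the frequency. This is a sketch under assumptions: it is not necessarily how the gnomAD browser builds its export, and the two input tables (chromosome, position, ref, alt, AC, AN) are toy files you would first have to derive from the VCFs yourself.

```shell
# Toy per-dataset tables: CHROM, POS, REF, ALT, AC, AN (values are made up).
workdir=$(mktemp -d)
printf '21\t100\tA\tG\t2\t1000\n21\t200\tC\tT\t1\t1000\n' > "$workdir/exomes.tsv"
printf '21\t100\tA\tG\t1\t500\n21\t300\tG\tA\t4\t500\n'   > "$workdir/genomes.tsv"

# Sum AC and AN for identical variants, then recompute AF = AC / AN.
awk -F'\t' -v OFS='\t' '
  { key = $1 OFS $2 OFS $3 OFS $4; ac[key] += $5; an[key] += $6 }
  END {
    for (k in ac)
      print k, ac[k], an[k], (an[k] > 0 ? ac[k] / an[k] : "NA")
  }
' "$workdir/exomes.tsv" "$workdir/genomes.tsv" |
  sort -t "$(printf '\t')" -k1,1 -k2,2n > "$workdir/combined.tsv"

cat "$workdir/combined.tsv"
```

Variants seen in only one dataset keep that dataset's counts, which treats the other dataset as having no data at that site; whether that is appropriate depends on coverage, so treat the combined frequencies as approximate.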