Hi all, I'm new to bioinformatics and have what I think is a pretty basic question. I'm attempting to use Hail to import a VCF file from the gnomAD website (https://gnomad.broadinstitute.org/downloads). I've successfully started a Python session in the terminal and loaded Hail, and now I'm just trying to use import_vcf(), which should work with a single argument, the path to the file.
In [8]: chr1_test_file = hl.import_vcf('/gnomad-public/release/2.1/vcf/exomes/gnomad.exomes.r2.1.sites.chr1.vcf.bgz')
Here is my warning:
2019-09-20 15:24:25 Hail: WARN: `/gnomad-public/release/2.1/vcf/exomes/gnomad.exomes.r2.1.sites.chr1.vcf.bgz' refers to no files
I imagine my problem is just that I don't have the correct path, but I wasn't quite sure what that would be...
You should also know that the gnomAD team has files available already in Hail native format, from that same link. These will be much more usable, since the VCF format is much less flexible. It's much harder to get useful data out of the VEP consequence field in VCF form, for instance.
Also, if you're running this on Google Cloud Dataproc, the bucket identifier should start with gs://, as in gs://gnomad-public/...
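To make the path issue concrete, here is a rough sketch of how you could check what a path refers to before handing it to import_vcf(). The helper names classify_path and load_vcf are made up for illustration and are not part of Hail. It also assumes that a gs:// path is only readable where Google Cloud Storage access is set up (as on Dataproc), and that anything else is a file on your own machine.

```python
import os


def classify_path(path: str) -> str:
    """Return 'gcs' for a gs:// bucket path, 'local' for an existing
    local file, and 'missing' for a local path that does not exist.

    Hypothetical helper, not a Hail function.
    """
    if path.startswith("gs://"):
        return "gcs"
    if os.path.exists(path):
        return "local"
    return "missing"


def load_vcf(path: str):
    """Import a VCF with Hail, failing early if a local path is wrong.

    Hypothetical wrapper around hl.import_vcf. A path without the gs://
    prefix is treated as a local file, which is the likely cause of the
    "refers to no files" warning above.
    """
    if classify_path(path) == "missing":
        raise FileNotFoundError(
            f"{path} does not exist locally; if it lives in a bucket, "
            "the path needs the gs:// prefix"
        )
    # Imported here so the path check above works without Hail installed.
    import hail as hl

    return hl.import_vcf(path)
```

With this, load_vcf('/gnomad-public/release/2.1/vcf/exomes/gnomad.exomes.r2.1.sites.chr1.vcf.bgz') would raise a FileNotFoundError straight away instead of only logging a warning, unless that file really exists on your local disk.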
Hi, I'm brand new to gnomAD, VCFs, and cloud buckets. If I want to use the Hail files like you mentioned, I would have to use Google Cloud Storage and have an account with them, right? I am trying to download the files directly and use Hail afterwards because I don't have a Google Cloud Storage subscription.