12 months ago
adarsh_munna
Hi,
I downloaded the gnomAD v4.0 sites Hail Table from their website using gsutil:
gsutil -m cp -r gs://gcp-public-data--gnomad/release/4.0/ht/exomes/gnomad.exomes.v4.0.sites.ht/* .
I expected the Hail table to be in one of the resulting directories (rows). However, this directory contains only metadata.json.gz and a subdirectory, parts, which contains many binary files:
part-0132-d93b0eee-7fa0-42eb-ae6b-953c31ec8e3b
part-0168-f86f8639-0536-46b4-969e-93b0ce16af55
..
..
Is there any way to combine all of these into a single Hail table, or did I do something wrong?
Please let me know
Thank you
Do you have a Spark environment set up for hail usage or do you just need the data?
I need the data, so that it can be loaded and worked on using the hail package for Python.
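For what it's worth, a Hail table is the whole directory rather than a single file, and the files under rows/parts are its partitions, so nothing needs to be combined. Below is a minimal sketch of loading it, assuming Hail is installed via pip with a working Java runtime, and assuming the table directory has metadata.json.gz and rows/ at its top level (which matches the layout described in the question). The helper names `looks_like_hail_table` and `load_sites_table` are made up for illustration. Because the gsutil command copied the contents of the .ht directory into the current directory, the path to pass is whichever directory now holds the top-level metadata.json.gz.

```python
import os


def looks_like_hail_table(path):
    """Heuristic layout check (an assumption, not an official Hail validator):
    a top-level metadata.json.gz plus a rows/ subdirectory."""
    has_metadata = os.path.isfile(os.path.join(path, "metadata.json.gz"))
    has_rows = os.path.isdir(os.path.join(path, "rows"))
    return has_metadata and has_rows


def load_sites_table(path):
    """Open the table directory with Hail; the part files are read on demand."""
    import hail as hl  # imported here so the layout check works without Hail

    if not looks_like_hail_table(path):
        raise FileNotFoundError(f"{path} does not look like a Hail table directory")
    return hl.read_table(path)


# Example usage (left commented out because it needs Hail and the data):
# ht = load_sites_table(".")
# ht.describe()  # prints the table schema
```

Reading is lazy, so `describe()` should return quickly, while operations that scan every partition will take much longer on a single machine.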
I'd recommend you go with the VCF downloads. I don't know what it takes to use the Hail tables. From my limited understanding, the ht is going to be massive with all the part files, and I've heard a colleague describe using distributed data storage like Hadoop/Spark to leverage the full power of Hail. So I'm not sure you should use the Hail tables unless you have quite a bit of experience with Hail or someone to guide you through it.
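If you take the VCF route and only need to scan the sites, the files can be streamed without Hail or Spark. This is a rough sketch using only the Python standard library. It assumes the VCF is block-gzipped (bgzip output is a valid gzip stream, so the `gzip` module can read it sequentially). The function name `iter_vcf_sites` is hypothetical, and the path in the usage comment is a placeholder for whichever file you download.

```python
import gzip


def iter_vcf_sites(path):
    """Yield (chrom, pos, ref, alt, info) for each data line of a gzipped VCF."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information lines and the column header
            fields = line.rstrip("\n").split("\t")
            chrom, pos, _id, ref, alt, _qual, _filter, info = fields[:8]
            info_map = {}
            for item in info.split(";"):
                key, sep, value = item.partition("=")
                info_map[key] = value if sep else True  # flags carry no value
            yield chrom, int(pos), ref, alt, info_map


# Example usage (placeholder path):
# for chrom, pos, ref, alt, info in iter_vcf_sites("sites.vcf.bgz"):
#     print(chrom, pos, ref, alt, info.get("AF"))
```

INFO values are kept as strings here; converting fields such as AF to floats is left to the caller, since some fields hold comma-separated lists.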