I'm trying to extract SNPs (a list of rsIDs and positions from gnomAD) from a series of .vcf.gz files for analysis, but I'm not entirely sure where to begin. The README for the files states that the .vcf.gz files do not contain rsIDs, so I suspect adding them is the first step I need to complete. Each .vcf.gz file comes with a .snpinfo file, which I am only about 50% certain contains the relevant information. I am aware of vcftools' annotation feature, but I first need to explore the datasets a bit. Is Hail good for this? I am quite new to this space, so pardon the simplicity of my questions. Also, if this has been explained elsewhere, please point me to the right spot or the proper search terms. I've done a fair bit of searching already but couldn't find much at this level.
Thank you.
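For a first look at the files, a minimal sketch along these lines may be enough before reaching for Hail or vcftools. It uses only the Python standard library, streams the compressed file without unpacking it to disk, and counts how many records have "." (the VCF placeholder for a missing value) in the ID column, which is one way to confirm what the README says about rsIDs. The function name `summarize_vcf` and the file name in the usage comment are hypothetical, and the sketch assumes a standard tab-separated VCF; the layout of the .snpinfo files is not known here, so they are not touched.

```python
import gzip


def summarize_vcf(path, max_records=1000):
    """Peek at a .vcf.gz: return (column_names, records_seen, records_without_id).

    Only the first `max_records` data lines are read, so this stays fast
    on large files.
    """
    columns = []
    seen = 0
    missing_id = 0
    # gzip.open in text mode decompresses on the fly; nothing is written to disk.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("##"):
                continue  # meta-information lines
            if line.startswith("#CHROM"):
                # Header line naming the columns (CHROM, POS, ID, ...).
                columns = line.lstrip("#").rstrip("\n").split("\t")
                continue
            fields = line.rstrip("\n").split("\t")
            seen += 1
            if fields[2] == ".":  # third column is ID; "." means missing
                missing_id += 1
            if seen >= max_records:
                break
    return columns, seen, missing_id


# Usage (hypothetical file name):
# columns, seen, missing = summarize_vcf("chr1.vcf.gz")
# print(columns[:8], seen, missing)
```

If nearly every record comes back with a missing ID, the rsIDs will indeed have to be added from another source, such as an annotation step.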
Hi, JC! I'm working with VCF.gz files and trying to extract the variants associated with a certain genomic region. I was wondering whether I can extract them row by row, in a human-readable format that can be filtered further, without actually unzipping the VCF.gz. Alternatively, I could extract just the rows I want by genomic region and then filter them further by ID on the Linux command line.
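One way to do this without unzipping anything to disk is to stream the file and keep only the matching rows. The sketch below is a plain-Python illustration, not the only approach: for bgzip-compressed, tabix-indexed files, `tabix file.vcf.gz chr:start-end` or `bcftools view -r` is usually much faster because the index lets them jump straight to the region. The function name `iter_region`, the file name, and the coordinates in the usage comment are hypothetical, and the chromosome name must match the file's own naming (for example "1" versus "chr1").

```python
import gzip


def iter_region(path, chrom, start, end, ids=None):
    """Yield VCF records (as lists of fields) with start <= POS <= end on `chrom`.

    If `ids` is given, only records whose ID column contains one of those
    identifiers are kept. Positions are 1-based, as in the VCF itself.
    """
    wanted = set(ids) if ids is not None else None
    # Text-mode gzip reads the compressed file line by line, in memory only.
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and the column header
            fields = line.rstrip("\n").split("\t")
            if fields[0] != chrom:
                continue
            pos = int(fields[1])
            if not (start <= pos <= end):
                continue
            # The ID column may hold several identifiers separated by ";".
            if wanted is not None and not (set(fields[2].split(";")) & wanted):
                continue
            yield fields


# Usage (hypothetical file name, region, and ID):
# for rec in iter_region("sample.vcf.gz", "1", 10000, 20000, ids={"rs123"}):
#     print("\t".join(rec[:5]))  # CHROM, POS, ID, REF, ALT
```

Because this scans the whole file, it is best suited to small files or one-off checks; an indexed query is the better choice for repeated lookups.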