13 months ago
Barista
Hi!
I would like to download VCF files for a certain population from the 1000 Genomes site. However, I would like to do this for around 100 people, and each sample I found in the data section has a .vcf.gz file for each chromosome. My question is: is there a way to download them all at once, all files for all people? Could I maybe use an API for this somehow? I am also having difficulty unzipping the .vcf.gz files.
I would really appreciate any help!
If your question is where you can download all the VCF files for all individuals (and populations) from the 1000G project: the last time I checked with the Ensembl helpdesk, they pointed me to this ftp server link. I was also working on these files a while ago and considered using the REST API, but if I recall correctly, I got errors when requesting files in bulk (specific genomic intervals + a specific population), so I decided it was best to download the files and run queries locally with tools like `vcftools` and `bcftools` instead.

On your difficulty unzipping the .vcf.gz files: I am not sure I would unzip them at all if I were you, since most tools that handle VCF files accept .vcf.gz files directly. For example, this is how you would read a .vcf.gz file in `vcftools`: `vcftools --gzvcf input_file.vcf.gz`. Have a look at the vcftools manual for more. For example, if I remember correctly, you can use `--keep <filename>` to subset the VCF file so that it keeps only selected individuals.

Thank you a lot, I will try it! I also tried the REST API before, but got errors as well. I will try again with vcftools and bcftools, thanks!
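Since the answer suggests working with the compressed files directly, here is a minimal Python sketch (an illustration, not part of the original answer) that reads the sample IDs out of a .vcf.gz header with the standard-library `gzip` module, without unzipping anything to disk, and writes them one per line, which is the layout the vcftools `--keep` option reads. It assumes a standard VCF whose `#CHROM` header line lists the samples after the nine fixed columns; the function names and the toy sample IDs (`S1`, `S2`, `S3`) are hypothetical.

```python
import gzip
import os
import tempfile

VCF_FIXED_COLUMNS = 9  # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT


def read_sample_ids(vcf_gz_path):
    """Return the sample IDs from the #CHROM header line of a .vcf.gz file."""
    # gzip.open in text mode decompresses while reading, so the file never
    # has to be unzipped on disk; bgzip output is ordinary multi-member gzip.
    with gzip.open(vcf_gz_path, "rt") as fh:
        for line in fh:
            if line.startswith("#CHROM"):
                return line.rstrip("\n").split("\t")[VCF_FIXED_COLUMNS:]
            if not line.startswith("#"):
                break  # reached data lines without seeing a #CHROM header
    raise ValueError(f"no #CHROM header line in {vcf_gz_path}")


def write_keep_file(sample_ids, out_path):
    """Write one sample ID per line (hypothetical helper for a --keep file)."""
    with open(out_path, "w") as fh:
        fh.writelines(sid + "\n" for sid in sample_ids)


def demo_read_sample_ids():
    """Round-trip on a tiny synthetic VCF; the sample IDs are made up."""
    text = (
        "##fileformat=VCFv4.1\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
        "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t0|0\t1|1\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        vcf = os.path.join(tmp, "toy.vcf.gz")
        with gzip.open(vcf, "wt") as fh:
            fh.write(text)
        ids = read_sample_ids(vcf)
        keep = os.path.join(tmp, "keep.txt")
        write_keep_file(ids[:2], keep)  # e.g. keep only the first two samples
        with open(keep) as fh:
            return ids, fh.read()
```

In practice you would filter the returned list down to your ~100 individuals before writing the keep file.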
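To show what subsetting individuals with `--keep` does conceptually, here is a hypothetical pure-Python sketch that copies a .vcf.gz while keeping the nine fixed VCF columns plus only the requested sample columns. It is an illustration under simplifying assumptions: INFO fields (such as AC/AN) are copied unchanged, so they still describe the full sample set, and the output is plain gzip rather than BGZF, so it cannot be tabix-indexed. For the real 1000 Genomes files, vcftools or bcftools remains the better choice; `subset_samples` and the toy sample IDs are made up for this example.

```python
import gzip
import os
import tempfile

VCF_FIXED_COLUMNS = 9  # CHROM POS ID REF ALT QUAL FILTER INFO FORMAT


def subset_samples(in_path, out_path, keep_ids):
    """Copy a .vcf.gz keeping the fixed columns plus only the listed samples.

    Returns the kept sample IDs in file order. IDs not present in the file
    are silently ignored. INFO is copied unchanged (not recomputed).
    """
    keep_ids = set(keep_ids)
    columns = None
    kept = []
    with gzip.open(in_path, "rt") as src, gzip.open(out_path, "wt") as dst:
        for line in src:
            if not line.strip():
                continue  # skip stray blank lines
            if line.startswith("##"):
                dst.write(line)  # meta-information lines pass through as-is
                continue
            fields = line.rstrip("\n").split("\t")
            if line.startswith("#CHROM"):
                # Column indices to retain: fixed columns + selected samples.
                columns = list(range(VCF_FIXED_COLUMNS)) + [
                    i for i in range(VCF_FIXED_COLUMNS, len(fields))
                    if fields[i] in keep_ids
                ]
                kept = [fields[i] for i in columns[VCF_FIXED_COLUMNS:]]
            elif columns is None:
                raise ValueError("data line found before the #CHROM header")
            dst.write("\t".join(fields[i] for i in columns) + "\n")
    if columns is None:
        raise ValueError("no #CHROM header line found")
    return kept


def demo_subset_samples():
    """Subset a tiny synthetic VCF to two of its three made-up samples."""
    text = (
        "##fileformat=VCFv4.1\n"
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n"
        "1\t100\t.\tA\tG\t.\tPASS\t.\tGT\t0|1\t0|0\t1|1\n"
    )
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.vcf.gz")
        dst = os.path.join(tmp, "out.vcf.gz")
        with gzip.open(src, "wt") as fh:
            fh.write(text)
        kept = subset_samples(src, dst, ["S3", "S1"])
        with gzip.open(dst, "rt") as fh:
            return kept, fh.read().splitlines()
```

With the real tools, something like `vcftools --gzvcf input_file.vcf.gz --keep samples.txt --recode --out subset` or `bcftools view -S samples.txt -Oz -o subset.vcf.gz input_file.vcf.gz` should do the same job; check the respective manuals for how each handles INFO fields.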