Hello:
I was trying to download whole-genome data from the 1000 Genomes Phase 3 release and extract only the EUR populations (GBR, TSI, FIN, IBS, CEU). I used the FTP site:
but apparently it is not the file I need. The error message says:
Error: No samples in .vcf file.
My question is: where can I get the whole-genome 1000 Genomes Phase 3 data? I also checked the Data Slicer on Ensembl GRCh37. It allows population selection, but the largest region it can extract is 2.5 Mb, so it would not give me whole-genome data even if I managed to download a whole-genome dataset from the FTP site above (assuming one exists).
Opal
Hi Kevin,
I actually think the file
is not the right file, because the error says 'No samples in .vcf file'.
The naming of this file is also different from the other chromosome-specific files listed below (sites.vcf.gz instead of genotypes.vcf.gz).
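One quick way to confirm this is to count the sample columns in the #CHROM header line. A VCF with genotypes has a FORMAT column plus one column per sample after the 8 fixed columns, whereas a sites-only file stops at INFO, which would be consistent with the "No samples" error. The sketch below assumes the file is gzip- or bgzip-compressed and only reads up to the header; the function name count_vcf_samples is hypothetical.

```shell
# count_vcf_samples FILE.vcf.gz
# Prints the number of sample (genotype) columns in a gzip/bgzip-compressed VCF.
# Fixed columns: CHROM POS ID REF ALT QUAL FILTER INFO (8), then optionally FORMAT.
# A sites-only VCF (8 columns) or a FORMAT-only header (9 columns) reports 0.
count_vcf_samples() {
  gzip -cd "$1" | awk -F '\t' '/^#CHROM/ { print (NF > 9 ? NF - 9 : 0); exit }'
}
```

If bcftools is installed, bcftools query -l FILE | wc -l should give the same count by listing the sample names directly.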
Hi Kevin,
I have downloaded the vcf.gz files for all the chromosomes 1-22 (I don't need X and Y). But they are pretty big files. Is there any way to concatenate them without unzipping? Also, could you elaborate on how to use the .ped file to extract only the EUR (GBR, CEU, TSI, IBS, FIN) populations?
Opal
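A minimal sketch of the .ped-based filtering, assuming the .ped file is tab-delimited with one header row, the Individual ID in column 2 and the Population code in column 7 (check this against the header of the file you actually downloaded). The function name extract_eur_ids and the output file name eur_samples.txt are hypothetical.

```shell
# extract_eur_ids PEDFILE
# Prints the Individual IDs whose Population column is one of the five EUR codes.
# ASSUMPTION: tab-delimited, single header row, column 2 = Individual ID,
# column 7 = Population. Adjust $2 / $7 if your file's layout differs.
extract_eur_ids() {
  awk -F '\t' -v pops="GBR CEU TSI IBS FIN" '
    BEGIN {
      # Build a lookup set of the wanted population codes
      n = split(pops, p, " ")
      for (i = 1; i <= n; i++) want[p[i]] = 1
    }
    # Skip the header row; keep rows whose population is in the set
    NR > 1 && ($7 in want) { print $2 }
  ' "$1"
}
```

For example, extract_eur_ids pedigree.ped > eur_samples.txt, then for each chromosome something like bcftools view -S eur_samples.txt --force-samples -Ob -o chr1.eur.bcf chr1.vcf.gz (file names hypothetical). The --force-samples option makes bcftools warn instead of stopping when a listed sample is missing from the VCF, which may matter if the .ped file includes individuals that are not in the Phase 3 call set.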
Hey Opal. Yes, and I would even recommend converting them to BCF (binary call format), which saves even more space. You can then use BCFtools again to concatenate them, e.g.,
bcftools concat
.
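As a sketch of how the conversion and concatenation could fit together: convert each chromosome to BCF, then give bcftools concat an explicit, numerically ordered file list. The naming scheme (chr1.eur.bcf ... chr22.eur.bcf) and the function names are hypothetical; substitute your own file names.

```shell
# Per-chromosome conversion beforehand, e.g.:
#   bcftools view -Ob -o chr1.eur.bcf chr1.eur.vcf.gz

# make_concat_list PREFIX SUFFIX
# Prints PREFIX<n>SUFFIX for n = 1..22, one per line, in numeric order.
# A plain glob such as chr*.bcf sorts lexically (chr1, chr10, chr11, ..., chr2),
# which is not the chromosome order concat needs to produce sorted output.
make_concat_list() {
  for chr in $(seq 1 22); do
    printf '%s%s%s\n' "$1" "$chr" "$2"
  done
}

# concat_autosomes PREFIX SUFFIX OUT.bcf
# Requires bcftools on PATH; all inputs must have the same samples in the same order.
concat_autosomes() {
  make_concat_list "$1" "$2" > autosomes.list
  bcftools concat -f autosomes.list -Ob -o "$3" && bcftools index "$3"
}
```

Usage would be along the lines of concat_autosomes chr .eur.bcf eur.autosomes.bcf. Since the chromosomes do not overlap, bcftools concat's --naive option may also work and skips recompression, but it expects compatible headers across all input files.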