I have downloaded the 20101123 release of the raw genotype data in VCF format, and I want to use PLINK to calculate LD for my SNP list.
VCFtools provides a way to convert VCF genotypes to PLINK ped format, but as far as I can tell it does not provide a way to extract the data for a single population.
The VCFtoped Perl script offered by 1000G only extracts data within a defined region, so it cannot export a whole chromosome. Besides, its info file differs from PLINK's .map file: it is missing the chromosome column.
So is there an existing method to extract all genotypes of the CEU population in VCF format?
If you know of such a method, could you tell me how to use it?
Thank you!
Best for all!
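One possible approach, offered as a rough sketch rather than a tested pipeline: since every sample is a column in the VCF, extracting one population comes down to keeping the nine fixed columns plus the columns whose header matches a list of CEU sample IDs. The function name `subset_vcf_samples` and the toy records below are made up for illustration, and the list of CEU IDs is assumed to come from the sample panel file distributed with the release. VCFtools also documents a `--keep` option that takes a file of sample IDs, which may do the same job directly.

```python
import io


def subset_vcf_samples(lines, keep_ids):
    """Yield VCF lines restricted to the sample columns named in keep_ids.

    Hypothetical helper for illustration; it does not recompute INFO
    fields such as allele counts after dropping samples.
    """
    keep_ids = set(keep_ids)
    keep_cols = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            # Meta-information lines pass through unchanged.
            yield line
        elif line.startswith("#CHROM"):
            header = line.split("\t")
            # Columns 0-8 are fixed (CHROM..FORMAT); the rest are sample IDs.
            keep_cols = list(range(9)) + [
                i for i, name in enumerate(header) if i >= 9 and name in keep_ids
            ]
            yield "\t".join(header[i] for i in keep_cols)
        elif line:
            if keep_cols is None:
                raise ValueError("data line found before the #CHROM header")
            fields = line.split("\t")
            yield "\t".join(fields[i] for i in keep_cols)


# Toy input: three samples, of which two are assumed to be on the CEU list.
toy_vcf = (
    "##fileformat=VCFv4.0\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT"
    "\tNA06984\tNA18486\tNA06985\n"
    "1\t10583\t.\tG\tA\t.\tPASS\t.\tGT\t0|0\t0|1\t1|0\n"
)
ceu_ids = ["NA06984", "NA06985"]

result = list(subset_vcf_samples(io.StringIO(toy_vcf), ceu_ids))
print("\n".join(result))
```

For a real file you would read from `gzip.open(path, "rt")` instead of the in-memory string and write the lines to a new VCF, which can then be converted to ped/map format for PLINK.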
That is great, it works. Another question: the 1000G pilot 1 release provides genotypes encoded in VCF version 3.3, while VCFtools requires version 4 or higher. How can I convert the VCF files to a newer version?
You must be using an older version of VCFtools. The later versions work with VCF versions 4 and higher.
I am using the latest version of vcftools, which can handle v4 VCF files. What I am saying is that the VCF file is encoded in v3.3 format, which the tools cannot process. Error: VCF version must be v4.0 or v4.1: You are using version VCFv3.3
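For anyone hitting the same error, the version is declared in the `##fileformat=` line at the top of the file, so it can be checked before running anything else. The snippet below is only a sketch: the helper names `vcf_fileformat` and `is_supported` are invented here, and the accepted versions are simply the two named in the error message above.

```python
import io


def vcf_fileformat(lines):
    """Return the declared version string (e.g. 'VCFv3.3'), or None if absent."""
    for line in lines:
        if not line.startswith("##"):
            break  # meta-information lines are over
        if line.startswith("##fileformat="):
            return line.strip().split("=", 1)[1]
    return None


def is_supported(version):
    # Versions taken from the error message quoted in this thread.
    return version in ("VCFv4.0", "VCFv4.1")


# Toy header standing in for a pilot 1 file.
old_header = io.StringIO(
    "##fileformat=VCFv3.3\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
)
version = vcf_fileformat(old_header)
print(version, is_supported(version))
```

A file that reports an unsupported version would need to be converted first.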
My mistake, I missed the vcf-convert tool in vcftools.
Wonderful solution! Still works after all this time. Thanks