How To Extract A Specific Population VCF File From A 1000G Genotypes VCF File
12.1 years ago
J.F.Jiang ▴ 930

I have downloaded the raw genotype data from the 20101123 release of 1000 Genomes, encoded in VCF format, and I want to use PLINK to calculate LD for my SNP list.

VCFtools offers a way to convert VCF genotypes to PLINK ped format, but I could not find a way to extract the data for a single population.

The VCFtoped Perl script offered by 1000G can only extract data within a defined region, not a whole chromosome. Besides, its info file differs from PLINK's .map file: the chromosome column is missing.

So is there an existing method to extract the genotypes of the CEU population in VCF format?

If you know of such a method, could you tell me how to use it?

Thank you!

Best for all!

vcf genotyping • 10.0k views
12.1 years ago
Adam ★ 1.0k

Create a file listing all CEU individuals in the 1000G, and then use:

./vcftools --vcf <vcf_file> --keep CEUlist.txt --out outputfile_prefix --plink

Should do what you want.
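
One way to build the sample list is to filter the sample panel file that accompanies a 1000 Genomes release. This is only a minimal sketch: it assumes a tab-separated panel with the sample ID in column 1 and the population code in column 2, and the file name `demo.panel` and its rows are made up for illustration. Check the column layout of the panel file in your own release before relying on it.

```shell
# Hypothetical stand-in for the 1000G sample panel (tab-separated:
# sample ID, population, platform). Replace it with the real panel file.
printf 'NA06984\tCEU\tILLUMINA\nNA18486\tYRI\tILLUMINA\nNA06985\tCEU\tILLUMINA\n' > demo.panel

# Keep the sample ID of every row whose population column is exactly CEU.
awk -F'\t' '$2 == "CEU" { print $1 }' demo.panel > CEUlist.txt

# One sample ID per line, which is the layout --keep expects.
cat CEUlist.txt
```

The resulting `CEUlist.txt` can then be passed to `--keep` in the command above.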


That is great, it works. Another question: the 1000G pilot 1 release provides genotypes encoded in VCF v3.3, while VCFtools requires version 4 or higher. How can I convert the VCF files to a newer version?


You must be using an older version of VCFtools. The later versions work with VCF versions 4 and higher.


I am using the latest version of VCFtools, which can handle v4 VCF files. What I am saying is that the VCF file is encoded in v3.3 format, which the tool cannot process. Error:VCF version must be v4.0 or v4.1: You are using version VCFv3.3
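
To see which version a file declares before running anything on it, the first header line can be inspected directly. A small sketch, using a made-up two-line file `demo.vcf` in place of the real data; for a compressed file, something like `gzip -dc file.vcf.gz | head -n 1` would feed the same check.

```shell
# Tiny made-up VCF header, used only to demonstrate the check.
printf '##fileformat=VCFv3.3\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n' > demo.vcf

# The first line of a VCF declares its version, e.g. ##fileformat=VCFv4.1
version=$(head -n 1 demo.vcf | sed 's/^##fileformat=//')
echo "$version"

# Flag anything that is not a v4.x file as needing conversion first.
case "$version" in
  VCFv4.*) echo "v4.x file" ;;
  *)       echo "not v4.x, convert before use" ;;
esac
```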


My mistake, I missed the vcf-convert tool that comes with VCFtools.


Wonderful solution! Still works after all this time. Thanks

12.1 years ago
J.F.Jiang ▴ 930

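The clumsy method I am using for now is vcf-subset from VCFtools: extract all the CEU sample IDs from the reference panel, and then run vcf-subset -c LABEL xxxxgenotypes.vcf.gz | bgzip -c > xxxx.genotype.ceu.vcf.gz, where LABEL is a comma-separated list of the CEU sample IDs or a file with one ID per line. vcf-subset writes uncompressed VCF, so the output is piped through bgzip to match the .gz file name.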

It is still running, and I do not know whether the command is right. This method also does not feel clever enough for a bioinformatician.

So if you know a better solution, please tell me.
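
Before starting a long subsetting job, it may help to confirm that every ID in the sample list actually appears in the VCF header, since a panel can list samples that a given file does not contain. A rough sketch with made-up inputs: `demo_header.vcf` and `demo_CEUlist.txt` are hypothetical stand-ins for the real VCF and sample list.

```shell
# Made-up #CHROM header line and sample list, for illustration only.
printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA06984\tNA18486\tNA06985\n' > demo_header.vcf
printf 'NA06984\nNA06985\nNA99999\n' > demo_CEUlist.txt

# Sample columns start at field 10 of the #CHROM line; write one ID per line.
grep -m 1 '^#CHROM' demo_header.vcf | cut -f 10- | tr '\t' '\n' | sort > vcf_samples.txt

# List the IDs that are in the sample list but absent from the VCF header.
sort demo_CEUlist.txt | comm -23 - vcf_samples.txt > missing.txt
cat missing.txt
```

An empty `missing.txt` would mean every listed sample has a column in the VCF.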
