Split 1000 Genomes VCF by subpopulation
4.4 years ago
ThePlaintiff ▴ 90

How do I split 1000 Genomes VCF files by subpopulation while retaining only the variants that are present in each subpopulation? For example, given the 1000 Genomes chromosome 10 file chr10.vcf, I'd like to produce chr10_LWK.vcf (LWK subpopulation), chr10_YRI.vcf (YRI subpopulation), etc. I would then like to find SNPs that are present in LWK but absent in YRI using bcftools isec or contrast.

Thanks

SNP genome next-gen • 1.1k views
4.4 years ago

In step 2 of this tutorial, you can obtain a PED file that contains the IID-to-population mappings (IID = individual ID): Produce PCA bi-plot for 1000 Genomes Phase III - Version 2. You can then create a list of sample IDs for each population and subset the VCF with BCFtools. To then compare variants between populations, my preference would be indexed AWK arrays, but, of course, feel free to use whatever you feel is appropriate.
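As a rough sketch of those steps (not taken from the tutorial itself), something along these lines might work. The file names chr10.vcf.gz and 20130606_g1k.ped, the PED layout (tab-delimited, header row, individual ID in column 2, population code in column 7), and the helper names make_sample_list and private_sites are all assumptions for illustration, so check them against your own files first.

```shell
#!/bin/sh
# Sketch only: file names, PED column positions and helper names are assumptions.

# Write the individual IDs of one population to <pop>.txt.
# Assumes a tab-delimited PED with a header row, the individual ID in
# column 2 and the population code in column 7.
make_sample_list() {
    ped=$1
    pop=$2
    awk -F'\t' -v pop="$pop" 'NR > 1 && $7 == pop { print $2 }' "$ped" > "$pop.txt"
}

# Print CHROM:POS:REF:ALT keys found in the first (uncompressed) VCF but not
# in the second, using an AWK array indexed by that key.
private_sites() {
    awk -F'\t' '
        /^#/ { next }                                   # skip header lines
        { key = $1 ":" $2 ":" $4 ":" $5 }
        FILENAME == ARGV[1] { seen[key] = 1; next }     # VCF to exclude, read first
        !(key in seen) { print key }                    # VCF of interest, read second
    ' "$2" "$1"
}

# Only runs when bcftools and the assumed input files are actually present.
if command -v bcftools >/dev/null 2>&1 && [ -f chr10.vcf.gz ] && [ -f 20130606_g1k.ped ]; then
    for pop in LWK YRI; do
        make_sample_list 20130606_g1k.ped "$pop"
        # --force-samples: do not abort on listed IDs that are absent from the VCF
        # --min-ac 1:nref: keep sites with at least one non-reference allele
        #                  among the selected samples
        bcftools view -S "$pop.txt" --force-samples --min-ac 1:nref \
            -Oz -o "chr10_$pop.vcf.gz" chr10.vcf.gz
        bcftools index -t "chr10_$pop.vcf.gz"
    done
    # -C: records present in the first file and absent from the others
    bcftools isec -C -p LWK_not_YRI chr10_LWK.vcf.gz chr10_YRI.vcf.gz
fi
```

The private_sites helper is the AWK-array alternative to the bcftools isec call. It expects uncompressed VCFs, for example private_sites chr10_LWK.vcf chr10_YRI.vcf, and it compares sites by exact CHROM, POS, REF and ALT, so multiallelic records would need to be split first if you want per-allele matching.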

Kevin


Thank you Kevin. Your tutorial provided many insights.

