How to extract SNPs from a VCF file based on population
6.9 years ago

Dear Friends,

My VCF file contains SNPs for several populations (Africa, America, Europe, East Asia and South Asia). I want to extract the data for Europe and East Asia together. Kindly let me know the possible ways.

Thanks in Advance

1000Genomes Linux VCF • 5.2k views
6.9 years ago
NB ▴ 960

You can do this easily using vcftools, GATK tools, plinkseq etc.

You first have to generate a text file listing the samples that form the population of your choice, one sample ID per line, say "population_of_interest.txt". Then,

vcf-subset -e -c population_of_interest.txt input.vcf > output.vcf

or

vcftools --vcf input.vcf --keep population_of_interest.txt --recode --stdout > output.vcf
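
For 1000 Genomes data, the sample list can usually be derived from the project's panel file. The following is only a sketch: it assumes a tab-separated panel file with a header line, the sample ID in column 1 and the super-population code (for example EUR, EAS) in column 3. The function name `make_sample_list` and the file name `samples.panel` are hypothetical, so check the column layout of your own panel file first.

```shell
# Print the sample IDs whose super-population code is one of the given codes.
# Assumptions (not from the original post): tab-separated panel file,
# header on line 1, sample ID in column 1, super-population code in column 3.
make_sample_list() {
    panel="$1"; shift
    codes="$*"    # remaining arguments joined by spaces, e.g. "EUR EAS"
    awk -F'\t' -v codes="$codes" '
        BEGIN { n = split(codes, c, " "); for (i = 1; i <= n; i++) keep[c[i]] = 1 }
        NR > 1 && ($3 in keep) { print $1 }
    ' "$panel"
}

# Usage (panel file name is an assumption):
# make_sample_list samples.panel EUR EAS > population_of_interest.txt
```

Passing both codes in one call gives a single list covering Europe and East Asia, which can then be fed to either command above.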

Thanks a ton Nandini ... it works :)


This code works fine when I run it for one chromosome at a time. But I want to extract SNPs for all chromosomes together; please let me know if there is any other option.


It should work for all chromosomes. Does your input VCF file contain all chromosomes?


@Nandini .. I have a separate VCF file for each chromosome.
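
With one VCF per chromosome, one possible approach is to run the same vcftools command in a loop over the files. The sketch below assumes the inputs are named chr1.vcf, chr2.vcf, and so on in a single directory; that naming pattern, the `.subset.vcf` output suffix and the function name `subset_all_chromosomes` are assumptions for illustration, not something stated in the thread.

```shell
# Subset every per-chromosome VCF to the samples in the given list.
# Assumption: input files match chr*.vcf in the target directory.
subset_all_chromosomes() {
    samples="$1"
    dir="${2:-.}"                           # directory with the VCFs, default: current
    for vcf in "$dir"/chr*.vcf; do
        [ -e "$vcf" ] || continue           # pattern matched nothing
        case "$vcf" in
            *.subset.vcf) continue ;;       # do not re-process earlier outputs
        esac
        out="${vcf%.vcf}.subset.vcf"
        # --stdout sends the recoded VCF to standard output so it can be redirected
        vcftools --vcf "$vcf" --keep "$samples" --recode --stdout > "$out"
    done
}

# Usage:
# subset_all_chromosomes population_of_interest.txt /path/to/vcfs
```

If a single combined file is needed afterwards, the per-chromosome outputs could be joined with a tool such as vcf-concat, provided they all contain the same samples in the same order.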
