Dear Friends,
I am trying to find genotype markers using the 1000 Genomes VCF files for the African, European and East Asian populations. I am not sure how to convert the SNP data to a binary matrix.
Please let me know how to deal with it.
I'm assuming here that you're looking for PLINK format. You'll need to download the 1KG VCF file for the populations you're interested in, and then run something like:
./vcftools --vcf ./1KG.vcf --plink-tped --out plinkformat
You can also get something close to this using SnpSift (http://pcingola.github.io/SnpEff/ss_extractfields/) and export your data as a TSV table with genotypes (you may need to simplify the genotypes to get rid of ./., 0/0, 0|0 or similar cases). The tool allows you to keep extra VCF info in the process and can be piped to filtering steps as well.
Thank you so much for the help. I want to convert the diploid calls (examples could be 0|1, 1|0 and 1|2) to a matrix containing the states [(0,0),(0,1),(1,1)] respectively, where each row of the matrix corresponds to a gene locus.
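If a plain script is acceptable, here is a minimal Python sketch of one way to read that request. It is not a feature of vcftools or SnpSift, and the function names genotype_state and vcf_to_matrix are made up for illustration. It assumes that phase should be dropped and that every non-reference allele is collapsed to 1, so 0|0 becomes (0,0), 0|1 and 1|0 become (0,1), and 1|1 and 1|2 become (1,1). Missing calls such as ./. are stored as None. It also assumes an uncompressed, tab-separated VCF with GT as the first FORMAT field.

```python
import re

def genotype_state(gt):
    """Map a VCF genotype such as '0|1' or '1/2' to an unphased (a, b) state."""
    # Keep only the GT part in case the sample field looks like '0|1:35:...'.
    alleles = re.split(r"[|/]", gt.split(":")[0])
    if len(alleles) != 2 or "." in alleles:
        return None  # missing or non-diploid call
    # Assumption: any non-reference allele (1, 2, ...) is collapsed to 1,
    # and sorting removes the phase, so '1|0' and '0|1' give the same state.
    return tuple(sorted(0 if a == "0" else 1 for a in alleles))

def vcf_to_matrix(lines):
    """Return (samples, loci, matrix); each matrix row is one locus."""
    samples, loci, matrix = [], [], []
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample names start at column 10
            continue
        loci.append((fields[0], int(fields[1])))  # (chromosome, position)
        matrix.append([genotype_state(g) for g in fields[9:]])
    return samples, loci, matrix

# Tiny made-up example, not real 1000 Genomes data.
demo = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3",
    "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0|1\t1|0\t0|0",
    "1\t200\trs2\tC\tT,G\t.\tPASS\t.\tGT\t1|2\t./.\t1|1",
]
samples, loci, matrix = vcf_to_matrix(demo)
for locus, row in zip(loci, matrix):
    print(locus, row)
```

For a real file you would pass an open file handle instead of the demo list, for example vcf_to_matrix(open("1KG.vcf")). If you need a strictly numeric matrix, summing each tuple gives the usual 0/1/2 alternate-allele count.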