How to convert VCF file into a binary matrix to find the genotype markers
7.1 years ago

Dear Friends,

I am trying to find genotype markers using the 1000 Genomes VCF files for three continental groups: Africa, Europe and East Asia. I am not sure how to convert the SNP data into a binary matrix.

Please let me know how to approach this.
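As a starting point, here is a minimal Python sketch (an assumption for illustration, not a method from this thread) that reads an uncompressed VCF and builds a loci × samples binary matrix: 1 if a sample carries at least one non-reference allele at that locus, 0 otherwise. Missing calls are coded as 0, which is a simplification. The function name and the sample names NA1/NA2 are hypothetical. For the bgzipped 1000 Genomes files you could pass `gzip.open(path, "rt")` as the handle instead.

```python
import io
from typing import List, TextIO, Tuple


def vcf_to_binary_matrix(handle: TextIO) -> Tuple[List[str], List[str], List[List[int]]]:
    """Return (samples, loci, matrix) where matrix[i][j] is 1 if sample j
    carries at least one non-reference allele at locus i, else 0.
    Missing calls ('.') are coded as 0, a simplifying assumption."""
    samples: List[str] = []
    loci: List[str] = []
    matrix: List[List[int]] = []
    for line in handle:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            samples = fields[9:]  # sample IDs follow the 9 fixed columns
            continue
        gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
        row = []
        for sample_field in fields[9:]:
            gt = sample_field.split(":")[gt_index]
            alleles = gt.replace("|", "/").split("/")  # treat phased and unphased alike
            row.append(int(any(a not in ("0", ".") for a in alleles)))
        loci.append(f"{fields[0]}:{fields[1]}")  # CHROM:POS as the row label
        matrix.append(row)
    return samples, loci, matrix


# Tiny in-memory example; sample names NA1/NA2 are made up.
example = io.StringIO(
    "##fileformat=VCFv4.2\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tNA1\tNA2\n"
    "1\t100\trs1\tA\tG\t.\tPASS\t.\tGT\t0|1\t0|0\n"
    "1\t200\trs2\tC\tT,G\t.\tPASS\t.\tGT\t1|2\t./.\n"
)
samples, loci, matrix = vcf_to_binary_matrix(example)
print(samples, loci, matrix)
# ['NA1', 'NA2'] ['1:100', '1:200'] [[1, 0], [1, 0]]
```

Reading line by line keeps memory use low, but a full chromosome from 1000 Genomes still yields a very large matrix, so you may want to filter sites (for example by allele frequency) before building it.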

1000 Genomes genotype markers VCF
7.1 years ago

I'm assuming here that you're looking for PLINK format. You'll need to download the 1000 Genomes (1KG) VCF files for the populations you're interested in and then run something like:

./vcftools --vcf ./1KG.vcf --plink-tped --out plinkformat
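If you go the PLINK route, the TPED file from this command has one variant per row (chromosome, variant ID, genetic distance, base-pair position) followed by two allele columns per individual. Below is a minimal Python sketch that turns those rows into a loci × individuals matrix of 0/1/2 counts. It assumes '0' marks a missing allele and counts copies of the less frequent allele at each site, with ties broken alphabetically (an arbitrary choice). The function name is hypothetical. PLINK's own `--recode A` option produces a similar additive coding; check its documentation for which allele it counts.

```python
from collections import Counter
from typing import List, Optional


def tped_to_counts(lines: List[str]) -> List[List[Optional[int]]]:
    """Turn TPED rows into per-locus lists of minor-allele counts (0/1/2).
    Columns 1-4 are chromosome, ID, genetic distance and position; the rest
    are two allele columns per individual. '0' is assumed to mean missing."""
    matrix: List[List[Optional[int]]] = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue
        alleles = fields[4:]
        pairs = list(zip(alleles[0::2], alleles[1::2]))  # one pair per individual
        freq = Counter(a for a in alleles if a != "0")
        # Minor allele = least frequent observed allele; ties broken alphabetically.
        # Monomorphic sites have no minor allele, so every called genotype gets 0.
        minor = min(freq, key=lambda a: (freq[a], a)) if len(freq) >= 2 else None
        row: List[Optional[int]] = []
        for a1, a2 in pairs:
            if a1 == "0" or a2 == "0":
                row.append(None)  # missing genotype
            else:
                row.append(int(a1 == minor) + int(a2 == minor))
        matrix.append(row)
    return matrix


tped = [
    "1 rs1 0 100 A G A A G G",
    "1 rs2 0 200 C C 0 0 C T",
]
print(tped_to_counts(tped))
# [[1, 2, 0], [0, None, 1]]
```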

Thank you so much for the help. I want to convert diploid calls, for example 0|1, 1|0 and 1|2, into a matrix containing the states (0,0), (0,1) and (1,1) respectively, with one row per locus.
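One common binary encoding, assumed here and not necessarily the exact mapping intended above, codes each allele of a call separately as 0 for the reference allele and 1 for any alternate allele while keeping the phased order. Under it, 0|1, 1|0 and 1|2 become (0,1), (1,0) and (1,1). The minimal Python sketch below builds such a matrix with one row per locus. The function names are illustrative; change `encode_gt` if a different mapping is wanted.

```python
from typing import List, Optional, Tuple


def encode_gt(gt: str) -> Optional[Tuple[int, ...]]:
    """Code each allele of a call as 0 (reference) or 1 (any alternate),
    keeping the phased order. Missing calls such as './.' return None."""
    alleles = gt.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return tuple(0 if a == "0" else 1 for a in alleles)


def genotype_matrix(rows: List[List[str]]) -> List[List[Optional[Tuple[int, ...]]]]:
    """rows[i] holds the GT strings of every sample at locus i."""
    return [[encode_gt(gt) for gt in row] for row in rows]


calls = [
    ["0|1", "1|0", "1|2"],  # locus 1
    ["0|0", "./.", "1/1"],  # locus 2
]
print(genotype_matrix(calls))
# [[(0, 1), (1, 0), (1, 1)], [(0, 0), None, (1, 1)]]
```

Note that treating every alternate allele as 1 discards which alternate allele was present at multi-allelic sites such as 1|2.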

2.2 years ago

You can also get something close to this using SnpSift extractFields (http://pcingola.github.io/SnpEff/ss_extractfields/) to export your data as a TSV table of genotypes. You may need to simplify the genotypes afterwards to collapse cases such as ./., 0/0 and 0|0 into a simpler coding. The tool also lets you keep extra VCF INFO fields in the process, and its output can be piped into further filtering steps.
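A minimal Python sketch of that simplification step follows. It assumes the TSV was produced with fields along the lines of `CHROM POS ID "GEN[*].GT"`, so the first three columns are fixed and the remaining text holds one GT string per sample; the parser accepts genotypes either tab-separated or joined by a separator such as ',' (check the SnpSift page above for the exact output options). Each call is collapsed to a count of non-reference alleles, so 0/0 and 0|0 both become 0 and ./. becomes None. The function names are hypothetical.

```python
from typing import List, Optional, Tuple


def simplify_gt(gt: str) -> Optional[int]:
    """Collapse equivalent GT encodings to a count of non-reference alleles:
    '0/0' and '0|0' -> 0, '0/1' and '1|0' -> 1, '1/1' and '1|2' -> 2.
    Missing or partly missing calls ('./.', '0/.', '.', '') -> None."""
    gt = gt.strip()
    alleles = gt.replace("|", "/").split("/")
    if not gt or "." in alleles:
        return None
    return sum(a != "0" for a in alleles)


def parse_row(line: str, n_fixed: int = 3, gt_sep: str = ",") -> Tuple[List[str], List[Optional[int]]]:
    """Split one TSV row into its fixed columns (e.g. CHROM, POS, ID) and the
    simplified genotypes. Genotypes may be tab-separated or joined by gt_sep."""
    fields = line.rstrip("\n").split("\t")
    fixed = fields[:n_fixed]
    gts: List[str] = []
    for field in fields[n_fixed:]:
        gts.extend(field.split(gt_sep))
    return fixed, [simplify_gt(g) for g in gts]


print(parse_row("1\t100\trs1\t0|1,0/0,./.,1|2"))
# (['1', '100', 'rs1'], [1, 0, None, 2])
```

Because the output is plain text, a script like this can sit at the end of a pipe after the SnpSift command and any filtering steps.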
