012 genotype matrix using vcf tools
7.2 years ago
Ana ▴ 200

Hello everyone,

I have a VCF file containing nearly 11 million SNPs. I want to convert it into a 012 genotype matrix for LD pruning. I am using this command:

/data/programs/vcftools_0.1.13/bin/vcftools --vcf my.file.vcf \
    --012 --out output_geno.vcf

I get the output, but I am confused. According to the manual, the rows of the 012 genotype matrix are individuals and the columns are genotypes. Since I have 11 million SNPs, shouldn't I get 11 million columns (one column per SNP)? When I count the columns, there are only about one million! Is something wrong, or am I making a silly mistake?

Thanks for any help figuring out my mistake.

vcftools genotype-matrix • 8.5k views

Did you check the *.012.indv and *.012.pos files that are also output with the --012 parameter? The *.012.indv file should contain the expected number of samples that were in the input VCF.

Also, check the log file that's produced, particularly the line:

"After filtering, kept X out of a possible Y Sites"

Kevin
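
To make the checks above concrete, here is a minimal sketch that counts the lines in the two side files and pulls the site summary out of the log. The file names and contents are toy stand-ins invented for illustration (they are not the poster's data); with real output you would point the same commands at your own --out prefix.

```shell
# Toy stand-ins for the side files written next to the .012 matrix.
# All names and contents here are hypothetical, for illustration only.
tmp=$(mktemp -d)
printf 'ind1\nind2\nind3\n' > "$tmp/toy.012.indv"
printf 'chr1\t101\nchr1\t250\nchr2\t77\nchr2\t900\n' > "$tmp/toy.012.pos"
printf 'After filtering, kept 4 out of a possible 4 Sites\n' > "$tmp/toy.log"

# One line per sample, and one line per site that was written out.
n_indv=$(wc -l < "$tmp/toy.012.indv" | tr -d ' ')
n_sites=$(wc -l < "$tmp/toy.012.pos" | tr -d ' ')
echo "samples: $n_indv  sites: $n_sites"

# The log line that reports how many sites survived filtering.
kept_line=$(grep 'Sites' "$tmp/toy.log")
echo "$kept_line"
```

If the site count here already disagrees with the number of SNPs you expect, the difference arose before the matrix was written, not in how the columns were counted.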


How did you count the columns?

If you do something like this:

head -n 1 file.012  |  awk '{print NF}'

Do you have the right number of columns?
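
A slightly fuller version of the same check, as a sketch on a toy matrix: it counts the fields on every row rather than only the first, so a truncated or ragged row would show up as more than one distinct value. The toy file is hypothetical. It assumes, as I recall from vcftools output, that the first column of the .012 file is an individual index, so the number of SNP columns would be the field count minus one; please verify that against your own file.

```shell
# Toy matrix: 3 individuals x 4 SNPs, tab-separated, -1 for a missing genotype.
# The leading index column is an assumption about the .012 layout.
tmp=$(mktemp -d)
printf '0\t0\t1\t2\t0\n1\t1\t1\t0\t-1\n2\t2\t0\t0\t1\n' > "$tmp/toy.012"

# Rows = individuals.
n_rows=$(wc -l < "$tmp/toy.012" | tr -d ' ')

# Distinct field counts over all rows; a well-formed matrix gives a single value.
n_cols=$(awk -F'\t' '{print NF}' "$tmp/toy.012" | sort -u)

# Subtract the assumed index column to get the number of SNP columns.
n_snps=$((n_cols - 1))
echo "rows: $n_rows  fields per row: $n_cols  SNP columns: $n_snps"
```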

4.0 years ago
Jautis ▴ 580

The 012 output is n-by-features rather than features-by-n: each row is an individual and each column is a SNP.
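
If a downstream tool expects the other orientation (one row per SNP), the matrix can be transposed. The following is only a sketch on a toy file with hypothetical names, and it assumes any index column has already been dropped. It holds the whole matrix in memory, so it is meant to show the orientation, not to be run as-is on millions of SNPs.

```shell
# Toy genotype matrix: 2 individuals (rows) x 3 SNPs (columns).
tmp=$(mktemp -d)
printf '0\t1\t2\n2\t0\t1\n' > "$tmp/toy_geno.tsv"

# Transpose: store every cell, then print column i of the input as row i.
awk -F'\t' '
{
  for (i = 1; i <= NF; i++) cell[NR, i] = $i
  if (NF > ncol) ncol = NF
}
END {
  for (i = 1; i <= ncol; i++) {
    row = cell[1, i]
    for (j = 2; j <= NR; j++) row = row "\t" cell[j, i]
    print row
  }
}' "$tmp/toy_geno.tsv" > "$tmp/toy_geno_t.tsv"

# After transposing there is one row per SNP and one column per individual.
t_rows=$(wc -l < "$tmp/toy_geno_t.tsv" | tr -d ' ')
first_row=$(head -n 1 "$tmp/toy_geno_t.tsv")
echo "rows after transpose: $t_rows"
```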
