I have used the following Vcftools code to calculate Fst distances from 1000 Genomes Chr 1 data.
c:/path --vcf ALL.chr1.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf --weir-fst-pop POP1.txt --weir-fst-pop POP2.txt --out fst.POP1.POP2
I need to extract the 3 variance components, namely a,b,c (between populations; between individuals within populations; between gametes within individuals within populations) that this formula uses to get to the final result.
However, Vcftools does not provide this function. I have been suggested http://stackoverflow.com/questions/30122116/extrapolating-variance-components-from-weir-fst-on-vcftools to use Hierfstat R package. However, Hierfstat does not read .vcf files. Thus,
- Use
vcftools
with--012
to get genotypes - Convert 0/1/2/-1 to hierfstat format (eg., 11/12/22/NA)
- Load the data into hierfstat and compute (see below)
I have carried out step 1. But now I need to carry out step 2. I am not familiar with Hierfstat package. Can someone tell me how to convert the Vcftools output files (out.012
, out.012.indv
, out.012.pos
) to hierfstat format?
If you are familiar with R, you could consider reading the VCF files directly into R and manipulating them there. The VariantAnnotation Bioconductor package has a readVCF function to do just that.
I installed VariantAnnotation.I followed instructions to run the readVcf command at http://www.bioconductor.org/packages/release/bioc/vignettes/VariantAnnotation/inst/doc/VariantAnnotation.pdf
I set the working directory to the folder containing the vcf file and ran the following code. I guess it is telling me that my RAM memory (8gb) is too small to open Chr1?
Yes, you are correct about memory usage. You can use params to control what is read in. In particular, if you simply need genotypes and not the
info()
fields, check outreadGT()
orreadGeno()
.See my example in the original post at http://stackoverflow.com/questions/30122116/extrapolating-variance-components-from-weir-fst-on-vcftools. Should get your nearly all the way there