Converting Vcftools output to R readable format
0
1
Entering edit mode
9.5 years ago
pifferdavide ▴ 110

I have used the following Vcftools code to calculate Fst distances from 1000 Genomes Chr 1 data.

c:/path --vcf ALL.chr1.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf --weir-fst-pop POP1.txt --weir-fst-pop POP2.txt --out fst.POP1.POP2

I need to extract the 3 variance components, namely a,b,c (between populations; between individuals within populations; between gametes within individuals within populations) that this formula uses to get to the final result.

However, Vcftools does not provide this function. I have been suggested http://stackoverflow.com/questions/30122116/extrapolating-variance-components-from-weir-fst-on-vcftools to use Hierfstat R package. However, Hierfstat does not read .vcf files. Thus,

  1. Use vcftools with --012 to get genotypes
  2. Convert 0/1/2/-1 to hierfstat format (eg., 11/12/22/NA)
  3. Load the data into hierfstat and compute (see below)

I have carried out step 1. But now I need to carry out step 2. I am not familiar with Hierfstat package. Can someone tell me how to convert the Vcftools output files (out.012, out.012.indv, out.012.pos) to hierfstat format?

vcf SNP R vcftools • 7.1k views
ADD COMMENT
2
Entering edit mode

If you are familiar with R, you could consider reading the VCF files directly into R and manipulating them there. The VariantAnnotation Bioconductor package has a readVCF function to do just that.

ADD REPLY
0
Entering edit mode

I installed VariantAnnotation.I followed instructions to run the readVcf command at http://www.bioconductor.org/packages/release/bioc/vignettes/VariantAnnotation/inst/doc/VariantAnnotation.pdf

I set the working directory to the folder containing the vcf file and ran the following code. I guess it is telling me that my RAM memory (8gb) is too small to open Chr1?

> library(VariantAnnotation)
> fl <- ("ALL.chr1.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf ")
> vcf <- readVcf(fl, "hg19")
Error: scanVcf: 'Realloc' could not re-allocate memory (5128192000 bytes)
  path: C:\Users\Davide\vcf1\ALL.chr1.phase3_shapeit2_mvncall_integrated_v5.20130502.genotypes.vcf
In addition: Warning messages:
1: In doTryCatch(return(expr), name, parentenv, handler) :
  Reached total allocation of 8077Mb: see help(memory.size)
2: In doTryCatch(return(expr), name, parentenv, handler) :
  Reached total allocation of 8077Mb: see help(memory.size)
ADD REPLY
0
Entering edit mode

Yes, you are correct about memory usage. You can use params to control what is read in. In particular, if you simply need genotypes and not the info() fields, check out readGT() or readGeno().

ADD REPLY
0
Entering edit mode
fl <- ("ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz")
vcf <- readGT(fl, "hg19")
Error: scanVcf: _DNAencode(): key 48 not in lookup table
  path: C:\Users\Davide\Documents\genomes\chr1\ALL.chr1.phase3_shapeit2_mvncall_integrated_v5a.20130502.genotypes.vcf.gz
In addition: Warning message:
In .vcf_usertag(map, tag, ...) :
  ScanVcfParam 'geno' fields not present: 'GT'
vcf
Error: object 'vcf' not found
ADD REPLY
0
Entering edit mode

See my example in the original post at http://stackoverflow.com/questions/30122116/extrapolating-variance-components-from-weir-fst-on-vcftools. Should get your nearly all the way there

ADD REPLY

Login before adding your answer.

Traffic: 2978 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6