Extracting the INFO field from a VCF using vcfR
5.1 years ago
j.lunger18

I used several tools to annotate a list of variants, and I now have a VCF file with a long INFO field. I'm trying to use the vcfR package in R to start my data analysis, but it seems to have trouble reading the INFO field, which is separated by semicolons (edit: I previously said colons). Here is what I did:

head(vcf_file) 
[1] "***** Object of class 'vcfR' *****"
[1] "***** Meta section *****"
[1] "##fileformat=VCFv4.2"
[1] "##hailversion=devel-784ab2796878"
[1] "##FILTER=<ID=AC0,Description=\"Allele count is zero after filtering o [Truncated]"
[1] "##FILTER=<ID=InbreedingCoeff,Description=\"InbreedingCoeff < -0.3\">"
[1] "##FILTER=<ID=PASS,Description=\"Passed all variant filters\">"
[1] "##FILTER=<ID=RF,Description=\"Failed random forest filtering threshol [Truncated]"
[1] "First 6 rows."
[1] 
[1] "***** Fixed section *****"
     CHROM POS         ID             REF   ALT QUAL      FILTER
[1,] "4"   "119606531" "rs1429579493" "A"   "G" "157.46"  "PASS"
[2,] "4"   "119606549" "rs998903473"  "GTC" "G" "609.46"  "PASS"
[3,] "4"   "119606551" "rs992734531"  "C"   "A" "1821.94" "PASS"
[4,] "4"   "119606553" "rs568037846"  "C"   "G" "8320.87" "PASS"
[5,] "4"   "119606555" "rs536993934"  "A"   "C" "2326.65" "PASS"
[6,] "4"   "119606556" NA             "C"   "T" "297.46"  "PASS"
[1] 
[1] "***** Genotype section *****"
<0 x 0 matrix>
[1] 
[1] "Unique GT formats:"
[1] "No gt slot present"

Seeing that it didn't give me a genotype section, I tried:

extract.gt(vcf_file)
Error in if (colnames(x@gt)[1] != "FORMAT") { : 
  argument is of length zero

I'm certain there is a long INFO field, because I opened the file in Excel and saw it.
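The error message points at the first check inside `extract.gt()`, which reads the first column name of the object's `gt` slot. The "No gt slot present" line suggests this file has no FORMAT or sample columns (a sites-only VCF), so `gt` is a 0 x 0 matrix: `colnames()` returns `NULL`, comparing that with `"FORMAT"` gives a zero-length logical, and `if` rejects it. The base-R sketch below reproduces this with a stand-in empty matrix; the variable names are illustrative and not part of vcfR. In a vcfR object the INFO column is stored in the `fix` slot next to CHROM through FILTER, not in `gt`, so it has to be read with INFO-oriented functions rather than `extract.gt()`.

```r
# Hypothetical stand-in for the empty genotype matrix of a sites-only VCF
gt <- matrix(character(0), nrow = 0, ncol = 0)

first_col <- colnames(gt)[1]  # NULL: a 0 x 0 matrix has no column names
cmp <- first_col != "FORMAT"  # logical(0): comparing NULL yields a zero-length result

# Evaluating if (cmp) raises the same kind of error as extract.gt(); capture its message
msg <- tryCatch(if (cmp) "mismatch" else "ok",
                error = function(e) conditionMessage(e))
print(msg)

# A guard that tolerates missing genotypes: isTRUE() turns logical(0) into FALSE
has_genotypes <- isTRUE(colnames(gt)[1] == "FORMAT")
print(has_genotypes)
```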


Why not write a regular expression that replaces the colons in the INFO field with semicolons?

Edit: the justification for this suggestion, from the VCF specification:

INFO - additional information: (String, no white-space, semi-colons, or equals-signs permitted; commas are permitted only as delimiters for lists of values) INFO fields are encoded as a semicolon-separated series of short keys with optional values in the format: <key>=<data>[,data]. Arbitrary keys are permitted, although the following sub-fields are reserved (albeit optional):
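The encoding in that quote (semicolon-separated entries of the form `<key>=<data>[,data]`, commas only between list values, and keys without `=` acting as flags) can be parsed with a few lines of base R. This is a minimal sketch under exactly those assumptions; `parse_info()` is a hypothetical helper, not a vcfR function, and it returns every value as character, leaving numeric conversion to the caller.

```r
# Hypothetical helper (not part of vcfR): parse one INFO string into a named list.
# Assumes the VCF encoding: ";" between entries, <key>=<data>[,data], bare keys are flags.
parse_info <- function(info) {
  if (is.na(info) || info == ".") return(list())  # "." marks an empty INFO column
  entries <- strsplit(info, ";", fixed = TRUE)[[1]]
  keys <- sub("=.*$", "", entries)                 # text before the first "="
  values <- lapply(entries, function(e) {
    if (!grepl("=", e, fixed = TRUE)) return(TRUE)  # flag: key present, no value
    # text after the first "=", split on commas into a character vector
    strsplit(sub("^[^=]*=", "", e), ",", fixed = TRUE)[[1]]
  })
  names(values) <- keys
  values
}

x <- parse_info("AC=3;AF=0.25,0.1;DB;AN=12")  # made-up example string
str(x)
```

For that made-up string the result has `AC = "3"`, `AF = c("0.25", "0.1")`, `DB = TRUE` and `AN = "12"`.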


My bad. I meant semicolons!
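Since the entries are already semicolon-separated, as the specification requires, no substitution should be needed. vcfR can return the INFO column with `getINFO()` and pull out a single key with `extract.info()` (for example `extract.info(vcf_file, element = "AC")`, assuming the file has an `AC` key). The base-R sketch below shows roughly what such an extraction does for one key across a vector of INFO strings; `info_key()` and the example strings are hypothetical, and the key is matched exactly rather than as a regular expression.

```r
# Hypothetical helper (not part of vcfR): value of one INFO key for each variant.
# Returns NA when the key is absent or is a flag with no "=value".
info_key <- function(info, key) {
  vapply(strsplit(info, ";", fixed = TRUE), function(entries) {
    keys <- sub("=.*$", "", entries)  # key names, compared exactly (no regex on `key`)
    i <- match(key, keys)
    if (is.na(i) || !grepl("=", entries[i], fixed = TRUE)) return(NA_character_)
    sub("^[^=]*=", "", entries[i])    # everything after the first "="
  }, character(1))
}

info <- c("AC=3;AF=0.25;AN=12", "AN=10;AC=1", "DB;AN=8")  # made-up INFO strings
ac <- as.numeric(info_key(info, "AC"))
print(ac)
```

With the object from the question, something like `as.numeric(info_key(getINFO(vcf_file), "AC"))` should give one value per variant, with `NA` where the key is missing.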
