5.1 years ago
j.lunger18
I used several tools to annotate a list of variants and now have a VCF file with a long INFO field. I'm trying to use the 'vcfR' package in R to start data analysis, but it seems to have a problem reading my INFO field, which is separated by semicolons (edit: I previously said it was separated by colons). Here is what I did:
head(vcf_file)
[1] "***** Object of class 'vcfR' *****"
[1] "***** Meta section *****"
[1] "##fileformat=VCFv4.2"
[1] "##hailversion=devel-784ab2796878"
[1] "##FILTER=<ID=AC0,Description=\"Allele count is zero after filtering o [Truncated]"
[1] "##FILTER=<ID=InbreedingCoeff,Description=\"InbreedingCoeff < -0.3\">"
[1] "##FILTER=<ID=PASS,Description=\"Passed all variant filters\">"
[1] "##FILTER=<ID=RF,Description=\"Failed random forest filtering threshol [Truncated]"
[1] "First 6 rows."
[1]
[1] "***** Fixed section *****"
CHROM POS ID REF ALT QUAL FILTER
[1,] "4" "119606531" "rs1429579493" "A" "G" "157.46" "PASS"
[2,] "4" "119606549" "rs998903473" "GTC" "G" "609.46" "PASS"
[3,] "4" "119606551" "rs992734531" "C" "A" "1821.94" "PASS"
[4,] "4" "119606553" "rs568037846" "C" "G" "8320.87" "PASS"
[5,] "4" "119606555" "rs536993934" "A" "C" "2326.65" "PASS"
[6,] "4" "119606556" NA "C" "T" "297.46" "PASS"
[1]
[1] "***** Genotype section *****"
<0 x 0 matrix>
[1]
[1] "Unique GT formats:"
[1] "No gt slot present"
Seeing that it didn't give me a genotype section, I tried:
extract.gt(vcf_file)
Error in if (colnames(x@gt)[1] != "FORMAT") { :
argument is of length zero
I'm positive that there is a long INFO field because I opened this file in Excel and saw it.
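For what it's worth, the output above reports "No gt slot present", which suggests the file has no per-sample genotype columns, so extract.gt() has nothing to work on; the INFO column is a separate part of each record. As a rough sketch of how a semicolon-separated INFO field can be split into key/value pairs, here is a base R example. The INFO strings and the helper name parse_info are made up for illustration and are not taken from the file in question. vcfR appears to offer its own helpers for this (for example extract.info() and INFO2df()), but the sketch sticks to base R so that it runs without the package.

```r
# Hypothetical INFO strings in the style of a VCF record; real values would
# come from the INFO column of the file.
info <- c(
  "AC=2;AF=0.001;DB",
  "AC=5;AF=0.0025"
)

# Split one INFO string into a named character vector (hypothetical helper).
# Flags such as "DB" carry no "=value" part and are recorded as "TRUE".
parse_info <- function(x) {
  fields <- strsplit(x, ";", fixed = TRUE)[[1]]
  keys <- sub("=.*$", "", fields)
  vals <- ifelse(grepl("=", fields, fixed = TRUE),
                 sub("^[^=]*=", "", fields),
                 "TRUE")
  setNames(vals, keys)
}

parsed <- lapply(info, parse_info)

# Collect every key seen, then build one column per key (NA where a key is absent).
all_keys <- unique(unlist(lapply(parsed, names)))
info_df <- as.data.frame(
  do.call(rbind, lapply(parsed, function(p) unname(p[all_keys]))),
  stringsAsFactors = FALSE
)
names(info_df) <- all_keys

print(info_df)
```

All columns come back as character here; numeric fields such as AF would still need as.numeric() before any arithmetic.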
Why not write an expression to sub out the colons in the INFO field and replace with semicolons?
Edit: the justification for this suggestion, from the VCF specification -
My bad. I meant semicolons!