8.6 years ago
Neilfws
I'm trying to use Beagle 4.1 to phase a VCF file from Illumina's UYG program, using 1000 Genomes phase 3 as a reference panel. It's failing with the error:
ERROR: Missing one or both alleles for a genotype:
Indeed, when I examine the VCF file I see lines like this one:
chr12 1899470 . C T 239 PASS SNVSB=-26.8;SNVHPOL=3;CSQ=T||NM_172364.4|Transcript|downstream_gene_variant|||||||||CACNA2D4|||||1653|YES||||NP_758952.4|||||,T||NM_024551.2|Transcript|downstream_gene_variant|||||||||ADIPOR2|||||1625|YES||||NP_078827.2||||| GT:GQ:GQX:DP:DPF:AD 1:33:33:18:2:0,17
where the value for GT = 1.
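In the VCF specification, GT lists one allele index per chromosome copy, separated by `/` (unphased) or `|` (phased). A value of `1` therefore describes a single-allele (haploid) call, which is the form the spec uses for e.g. chrY. A phasing tool that expects two alleles per genotype sees only one here. The minimal Python sketch below reports how many alleles each sample's GT carries. It assumes a tab-delimited data line with FORMAT in column 9, and the function name `gt_ploidies` is hypothetical. The test line abbreviates the INFO column to `.`.

```python
import re
from typing import List


def gt_ploidies(vcf_line: str) -> List[int]:
    """Return the number of alleles in each sample's GT field for one VCF data line.

    Assumes a tab-delimited data line: FORMAT in column 9, samples after it.
    """
    fields = vcf_line.rstrip("\n").split("\t")
    fmt_keys = fields[8].split(":")
    if "GT" not in fmt_keys:
        return []  # no genotypes recorded for this site
    gt_idx = fmt_keys.index("GT")
    ploidies = []
    for sample in fields[9:]:
        values = sample.split(":")
        # Trailing FORMAT fields may be dropped; treat an absent GT as missing (".")
        gt = values[gt_idx] if gt_idx < len(values) else "."
        # Alleles are separated by "/" (unphased) or "|" (phased)
        ploidies.append(len(re.split(r"[/|]", gt)))
    return ploidies


# The record from the question, with INFO abbreviated to "."
line = "chr12\t1899470\t.\tC\tT\t239\tPASS\t.\tGT:GQ:GQX:DP:DPF:AD\t1:33:33:18:2:0,17"
print(gt_ploidies(line))  # [1]
```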
Questions:
- is GT = 1 valid VCF? I had the impression it was not
- is there a smart way to make Beagle ignore these lines? I didn't see anything in the documentation
- or a way to remove these lines in preprocessing using e.g. vcftools?
Using vcffilterjs (https://github.com/lindenb/jvarkit/wiki/VCFFilterJS), remove the lines in which any genotype does not have exactly 2 alleles.
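The same rule (drop every record where some sample's GT does not have exactly two alleles) can also be applied without jvarkit. The sketch below is a plain-Python stand-in for that rule, not vcffilterjs itself. It assumes uncompressed, tab-delimited VCF text, and the function names are hypothetical. Header lines pass through unchanged. A bare `1` or `.` counts as one allele and is dropped, while `0/1`, `1|1` and `./.` are kept.

```python
import re
import sys
from typing import Iterable, Iterator


def has_only_diploid_genotypes(line: str) -> bool:
    """True if every sample's GT in a tab-delimited VCF data line has exactly two alleles."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return True  # sites-only record: nothing to check
    fmt_keys = fields[8].split(":")
    if "GT" not in fmt_keys:
        return True  # no GT field: this rule has nothing to reject
    gt_idx = fmt_keys.index("GT")
    for sample in fields[9:]:
        values = sample.split(":")
        gt = values[gt_idx] if gt_idx < len(values) else "."
        # "1" or "." split into one token and are rejected; "0/1", "1|1", "./." pass
        if len(re.split(r"[/|]", gt)) != 2:
            return False
    return True


def filter_vcf(lines: Iterable[str]) -> Iterator[str]:
    """Yield header lines unchanged and only the data lines whose genotypes are all diploid."""
    for line in lines:
        if line.startswith("#") or has_only_diploid_genotypes(line):
            yield line


def main() -> None:
    """Stream a VCF from stdin to stdout, dropping records with non-diploid genotypes."""
    sys.stdout.writelines(filter_vcf(sys.stdin))
```

Saved as a script (the name `filter_diploid.py` is hypothetical) with a final `main()` call added, it could run as `zcat input.vcf.gz | python filter_diploid.py > diploid.vcf` before Beagle. Note that dropping a record removes that site for every sample in the file.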