4.8 years ago
dec986
I have a large gVCF that I'm trying to convert into a VCF, restricted to the positions in a BED file that looks like this:
1 58813 58814 rs114420996 . G A PASS . GT:GQ ./.:0.0
1 565507 565508 rs9283150 . G A PASS . GT:GQ ./.:0.0
1 567091 567092 rs9326622 . T C PASS . GT:GQ ./.:0.0
1 726911 726912 1:726912 . A G PASS . GT:GQ 0/0:0.27129138
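The region file mixes BED coordinates with VCF-style fields. BED intervals are 0-based and half-open, so each single-base interval above corresponds to the VCF POS equal to its end coordinate (the interval 58813 to 58814 is POS 58814). A minimal Python sketch of that mapping, assuming whitespace-separated columns with chrom, start, end and name first; the helper name bed_to_vcf_pos is hypothetical:

```python
# Sample rows copied from the region file above. Only the first four
# columns (chrom, start, end, name) are interpreted by this sketch.
ROWS = """\
1 58813 58814 rs114420996 . G A PASS . GT:GQ ./.:0.0
1 565507 565508 rs9283150 . G A PASS . GT:GQ ./.:0.0
1 567091 567092 rs9326622 . T C PASS . GT:GQ ./.:0.0
1 726911 726912 1:726912 . A G PASS . GT:GQ 0/0:0.27129138"""


def bed_to_vcf_pos(line: str) -> tuple[str, int, str]:
    """Return (chrom, 1-based VCF POS, name) for one single-base BED interval."""
    chrom, start, end, name = line.split()[:4]
    start, end = int(start), int(end)
    # BED is 0-based and half-open: [start, end) with end - start == 1
    # covers exactly one base, whose 1-based position is start + 1 == end.
    assert end - start == 1, "expected single-base intervals"
    return chrom, start + 1, name


for row in ROWS.splitlines():
    print(bed_to_vcf_pos(row))
```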
I then extract the required positions with:
break_blocks --region-file $bed --ref human_g1k_v37.fasta --exclude-off-target
which produces a gVCF with the correct regions.
However, this has to be a VCF, not a gVCF.
So I convert it following the advice in "Converting Gvcf Files Into Vcf" (using extract_variants), but this produces a file with about 75% of the positions missing, which isn't acceptable. I get similar results when using
gatk SelectVariants -R $fasta -V $vcf -O $outfile --exclude-non-variants
How can I extract all of the roughly 661,000 positions from this gVCF?
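One plausible reason for the loss, assuming both tools keep a site only when the sample carries a called non-reference allele: every example row above is either a no-call (./.) or homozygous reference (0/0), so a variants-only filter would discard all of them. The Python sketch below applies that assumed rule to the sample rows; the helper names genotype and is_variant are hypothetical, and it only reads the GT subfield of the last column.

```python
# Same sample rows as in the question; the last column holds GT:GQ.
ROWS = """\
1 58813 58814 rs114420996 . G A PASS . GT:GQ ./.:0.0
1 565507 565508 rs9283150 . G A PASS . GT:GQ ./.:0.0
1 567091 567092 rs9326622 . T C PASS . GT:GQ ./.:0.0
1 726911 726912 1:726912 . A G PASS . GT:GQ 0/0:0.27129138"""


def genotype(line: str) -> str:
    """Return the GT subfield of the last (sample) column, e.g. './.' or '0/0'."""
    return line.split()[-1].split(":")[0]


def is_variant(gt: str) -> bool:
    """True if the genotype calls at least one non-reference allele.

    Assumed rule: no-calls ('.') and the REF allele ('0') do not count.
    """
    alleles = gt.replace("|", "/").split("/")
    return any(a not in (".", "0") for a in alleles)


kept = [r for r in ROWS.splitlines() if is_variant(genotype(r))]
print(f"{len(kept)} of {len(ROWS.splitlines())} sample rows would survive")
# prints "0 of 4 sample rows would survive"
```

If most of the ~661,000 target sites are no-calls or reference calls in this sample, a filter like this would explain why only about a quarter of them remain.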
Unless I've missed an important point, you can use bcftools to extract variant sites from a gVCF. The -m parameter of bcftools view filters for sites with a minimum number of alleles listed in REF and ALT.
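The -m (--min-alleles) filter counts the alleles listed in REF and ALT, not the sample's genotype, so an invocation such as bcftools view -m2 input.g.vcf.gz (file name hypothetical) keeps a record as soon as it lists at least one ALT allele. The Python sketch below mimics that idea; it is not bcftools' implementation. It assumes columns 6 and 7 of the rows above hold REF and ALT, treats an ALT of "." as listing no alleles, and counts every comma-separated ALT entry (including symbolic ones) as an allele. The helper names are hypothetical.

```python
# Same sample rows as in the question; columns 6 and 7 are assumed to be REF and ALT.
ROWS = """\
1 58813 58814 rs114420996 . G A PASS . GT:GQ ./.:0.0
1 565507 565508 rs9283150 . G A PASS . GT:GQ ./.:0.0
1 567091 567092 rs9326622 . T C PASS . GT:GQ ./.:0.0
1 726911 726912 1:726912 . A G PASS . GT:GQ 0/0:0.27129138"""


def n_alleles(ref: str, alt: str) -> int:
    """Count alleles listed in REF and ALT; an ALT of '.' lists none."""
    return 1 + (0 if alt == "." else len(alt.split(",")))


def min_alleles_filter(lines, m: int):
    """Keep lines whose REF+ALT allele count is at least m (the idea behind -m)."""
    kept = []
    for line in lines:
        fields = line.split()
        ref, alt = fields[5], fields[6]  # assumed REF/ALT columns
        if n_alleles(ref, alt) >= m:
            kept.append(line)
    return kept


print(len(min_alleles_filter(ROWS.splitlines(), 2)))  # prints 4
```

All four example rows list one ALT allele, so they pass -m2 even though their genotypes are ./. or 0/0; a reference-only record with ALT "." would list a single allele and be dropped.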