I have split a multi-sample VCF into one VCF per sample, and I was wondering whether there is a simple method for setting the genotype of variants to 0/1 in each single-sample VCF and changing the alternate allele to match the new genotype. For instance, if the VCF contains a read like
chr1 945122 . C A,T . PASS <INFO STRING> GT:ABQ:AD:ADF:ADR:DP:FREQ:GQ:PVAL:RBQ:RD:RDF:RDR:SDP 0/2:57:4:1:3:584:0.68%:0:9.8E-1:50:578:449:129:584
then I would like it to read
chr1 945122 . C T . PASS <INFO STRING> GT:ABQ:AD:ADF:ADR:DP:FREQ:GQ:PVAL:RBQ:RD:RDF:RDR:SDP 0/1:57:4:1:3:584:0.68%:0:9.8E-1:50:578:449:129:584
so that the ALT allele and the genotype still match, but no other alternate alleles remain in the VCF.
Is there a tool for tidying a vcf up like this?
Isn't this just a
bcftools norm -m-any
solution?
Yeah, if done before splitting. Also, I'd recommend
vt decompose
over
bcftools norm -m-any
- vt retains information on the variants it "transforms".
You should split multi-allelic sites before splitting VCFs into sample-specific VCFs, if such "clean" genotypes are a necessity.
Use
vt decompose
to split multi-allelics, then any tool of your choice to get single-sample VCFs.
By the way, an entry in a VCF file is a site/location/variant, not a "read".
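If the VCFs have already been split and re-running from the multi-sample file is not an option, the per-record clean-up described in the question can be sketched in a few lines of Python. This is only an illustration, not an existing tool: the function name `trim_unused_alts` is made up, it assumes a tab-separated, single-sample record, and it only touches the ALT column and the GT field. Any INFO or FORMAT values that carry one entry per allele would still need adjusting separately, so a dedicated tool is the safer route where possible.

```python
def trim_unused_alts(line):
    """Drop ALT alleles the sample does not carry and renumber GT.

    Hypothetical helper: assumes one tab-separated, single-sample VCF record.
    Per-allele INFO/FORMAT values are NOT adjusted.
    """
    line = line.rstrip("\n")
    fields = line.split("\t")
    alts = fields[4].split(",")
    fmt_keys = fields[8].split(":")
    sample = fields[9].split(":")

    gt_idx = fmt_keys.index("GT")
    gt = sample[gt_idx]
    sep = "|" if "|" in gt else "/"  # keep phasing separator as-is
    calls = gt.split(sep)

    # 1-based ALT indices that actually appear in the genotype, in ALT order
    used = sorted({int(c) for c in calls if c not in (".", "0")})
    if not used:
        return line  # hom-ref or missing genotype: nothing to trim

    # Map old allele numbers to new consecutive ones, e.g. {2: 1}
    remap = {old: new for new, old in enumerate(used, start=1)}

    fields[4] = ",".join(alts[i - 1] for i in used)
    sample[gt_idx] = sep.join(
        c if c in (".", "0") else str(remap[int(c)]) for c in calls
    )
    fields[9] = ":".join(sample)
    return "\t".join(fields)


# Shortened version of the record from the question (INFO replaced by ".")
record = "\t".join(
    ["chr1", "945122", ".", "C", "A,T", ".", "PASS", ".", "GT:AD:DP", "0/2:4:584"]
)
print(trim_unused_alts(record))
```

For the record above, ALT becomes T and the genotype becomes 0/1, while a genotype such as 1/2 is left unchanged because both alternate alleles are in use.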