Fixing genotypes from split vcf

Asked by graeme.thorn ▴ 100, 4.4 years ago

I have split a multi-sample VCF into one VCF per sample, and I was wondering whether there is a simple way to set the genotype of each variant to 0/1 in each single-sample VCF and to change the alternate allele to match the new genotype. For instance, if the VCF contains a read like

chr1 945122 . C A,T . PASS <INFO STRING> GT:ABQ:AD:ADF:ADR:DP:FREQ:GQ:PVAL:RBQ:RD:RDF:RDR:SDP 0/2:57:4:1:3:584:0.68%:0:9.8E-1:50:578:449:129:584

then I would like it to read

chr1 945122 . C T . PASS <INFO STRING> GT:ABQ:AD:ADF:ADR:DP:FREQ:GQ:PVAL:RBQ:RD:RDF:RDR:SDP 0/1:57:4:1:3:584:0.68%:0:9.8E-1:50:578:449:129:584

so that the ALT allele and the genotype still match, but no other alternate alleles remain in the VCF.

Is there a tool for tidying up a VCF like this?
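
For a single-sample record, the tidying asked about amounts to keeping only the ALT alleles that the GT field actually references and renumbering the genotype to match. Below is a minimal Python sketch, assuming tab-separated data lines with exactly one sample column; trim_to_called_alleles is a hypothetical helper, not part of any tool. It only rewrites ALT and GT: per-allele fields declared Number=A or Number=R (in FORMAT or INFO) are left as they are, so in real data they would need the same remapping.

```python
def trim_to_called_alleles(record: str) -> str:
    """Drop ALT alleles not referenced by GT and renumber GT to match.

    Assumes a tab-separated VCF data line with exactly one sample column.
    Number=A/R fields (e.g. a per-allele AD) are NOT remapped.
    """
    fields = record.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    fmt_keys = fields[8].split(":")
    sample = fields[9].split(":")
    gt_idx = fmt_keys.index("GT")

    gt = sample[gt_idx]
    sep = "|" if "|" in gt else "/"  # keep the phased/unphased separator
    alleles = gt.split(sep)

    # 1-based ALT indices actually used by this sample's genotype
    used = sorted({int(a) for a in alleles if a != "." and int(a) > 0})
    # old allele index -> new allele index; REF (0) stays 0
    remap = {0: 0}
    remap.update({old: new for new, old in enumerate(used, start=1)})

    sample[gt_idx] = sep.join(a if a == "." else str(remap[int(a)]) for a in alleles)
    fields[4] = ",".join(alts[i - 1] for i in used) if used else "."
    fields[9] = ":".join(sample)
    return "\t".join(fields)


# The record from the question, with columns joined by tabs as in a real VCF
record = "\t".join([
    "chr1", "945122", ".", "C", "A,T", ".", "PASS", "<INFO STRING>",
    "GT:ABQ:AD:ADF:ADR:DP:FREQ:GQ:PVAL:RBQ:RD:RDF:RDR:SDP",
    "0/2:57:4:1:3:584:0.68%:0:9.8E-1:50:578:449:129:584",
])
print(trim_to_called_alleles(record))  # ALT becomes "T", GT becomes "0/1"
```

A genotype with no ALT call (0/0 or ./.) ends up with ALT ".", which is how VCF marks a site with no alternate allele.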

Isn't this just a job for bcftools norm -m-any?

Reply by Kevin Blighe (88k), 4.4 years ago

Yes, if it is done before splitting. Also, I'd recommend vt decompose over bcftools norm -m-any, because vt retains information on the variants it "transforms" (it records the original multi-allelic site in an OLD_MULTIALLELIC INFO field).
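
Roughly, decomposition emits one biallelic record per ALT allele and recodes every genotype relative to that allele. The Python sketch below (hypothetical decompose helper) only illustrates that idea and is not how vt or bcftools are implemented: it does no left-alignment, copies Number=A/R fields unchanged, and adds no OLD_MULTIALLELIC-style tag. How a call of some *other* ALT allele is recoded (missing "." or reference "0") is exposed as the hypothetical other_allele parameter; check the vt and bcftools documentation for what each tool actually writes.

```python
def decompose(record: str, other_allele: str = ".") -> list:
    """Split one VCF data line with several ALT alleles into biallelic lines.

    Simplified illustration only: no left-alignment, Number=A/R fields
    are copied unchanged, and no OLD_MULTIALLELIC-style tag is added.
    `other_allele` (hypothetical) is written wherever a genotype calls
    an ALT allele other than the one a given output line represents.
    """
    fields = record.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    if len(alts) == 1:
        return ["\t".join(fields)]  # already biallelic

    fmt = fields[8].split(":") if len(fields) > 9 else []
    gt_idx = fmt.index("GT") if "GT" in fmt else None

    out = []
    for k, alt in enumerate(alts, start=1):
        new = fields[:]  # copy so each output line is independent
        new[4] = alt
        if gt_idx is not None:
            for s in range(9, len(fields)):
                vals = fields[s].split(":")
                gt = vals[gt_idx]
                sep = "|" if "|" in gt else "/"
                recoded = []
                for a in gt.split(sep):
                    if a in (".", "0"):
                        recoded.append(a)             # missing / REF unchanged
                    elif int(a) == k:
                        recoded.append("1")           # this line's ALT allele
                    else:
                        recoded.append(other_allele)  # some other ALT allele
                vals[gt_idx] = sep.join(recoded)
                new[s] = ":".join(vals)
        out.append("\t".join(new))
    return out


rec = "\t".join(["chr1", "945122", ".", "C", "A,T", ".", "PASS", ".", "GT", "0/2", "1/2"])
for line in decompose(rec):
    print(line)
```

After this step the first sample's 0/2 call becomes 0/1 on the C>T line, which is exactly the genotype the question wants once that sample is split out.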

Reply by Ram (44k), 4.4 years ago

If such "clean" genotypes are necessary, split multi-allelic sites before splitting the VCF into sample-specific VCFs.

Use vt decompose to split multi-allelic sites, then any tool of your choice to produce single-sample VCFs.
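
Once multi-allelic sites are decomposed, making per-sample files only means keeping the first nine columns plus one sample column, so every record's GT already refers to its single ALT allele. In practice bcftools view -s does this; the Python sketch below (hypothetical split_by_sample helper) only shows the idea. It keeps sites where a sample is homozygous reference or missing, and it does not recompute INFO values such as AC/AN.

```python
def split_by_sample(lines):
    """Return {sample_name: list_of_lines}, one single-sample VCF per sample.

    Header (##) lines are copied to every sample; data lines keep the
    first nine columns plus that sample's column. INFO values such as
    AC/AN are not recomputed, and hom-ref/missing sites are kept.
    """
    meta, samples, per_sample = [], [], {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##"):
            meta.append(line)
        elif line.startswith("#CHROM"):
            cols = line.split("\t")
            samples = cols[9:]
            for name in samples:
                per_sample[name] = meta + ["\t".join(cols[:9] + [name])]
        elif line:
            cols = line.split("\t")
            for i, name in enumerate(samples):
                per_sample[name].append("\t".join(cols[:9] + [cols[9 + i]]))
    return per_sample


# Hand-written biallelic records for one C -> A,T site and two samples
vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2",
    "chr1\t945122\t.\tC\tA\t.\tPASS\t.\tGT\t0/0\t0/1",
    "chr1\t945122\t.\tC\tT\t.\tPASS\t.\tGT\t0/1\t0/0",
]
for name, body in split_by_sample(vcf).items():
    print(name, body[-1])
```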

By the way, an entry in a VCF file is a site/location/variant, not a "read".

Reply by Ram (44k), 4.4 years ago