Hello, I am trying to generate a consensus FASTA file for one sample from an unphased VCF. I have been using bcftools consensus, which works well, but I am running into problems with heterozygous sites. I cannot adequately phase the data, so I would like to randomly select one allele at each heterozygous site for the consensus sequence. bcftools offers options to use ambiguity codes, or to always select the reference allele or always the alternate allele, but each of these options would bias my downstream analyses.
Is there a program that can either phase a VCF randomly, or that can generate a consensus fasta while randomly selecting one allele per heterozygous site?
Thank you!
(PS: this is my bcftools consensus command.)
bcftools consensus --fasta-ref reference.fasta --sample SampleName -M N -a N -H 1 MyVCF.vcf.gz
I would explore writing a simple text-transformation tool that rewrites each heterozygous genotype in the VCF file, replacing 0/1 with either 0/0 or 1/1, chosen at random. Once every site is homozygous, bcftools consensus simply applies whichever allele was picked.
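As a rough illustration of that idea, here is a minimal Python sketch. It assumes an uncompressed, tab-delimited VCF read line by line, diploid calls, and GT as the first FORMAT key; the function names `randomize_het_line` and `randomize_hets` are made up for this example and are not part of bcftools or any library.

```python
import random


def randomize_het_line(line, sample_col, rng):
    """Make a heterozygous GT in one sample column homozygous at random."""
    fields = line.rstrip("\n").split("\t")
    # Assumption: GT is the first FORMAT key; leave the line alone otherwise.
    if len(fields) <= sample_col or fields[8].split(":")[0] != "GT":
        return line
    sample = fields[sample_col].split(":")
    gt = sample[0]
    sep = "|" if "|" in gt else "/"
    alleles = gt.split(sep)
    # Only touch diploid calls with two different, non-missing alleles.
    if len(alleles) == 2 and "." not in alleles and alleles[0] != alleles[1]:
        pick = rng.choice(alleles)  # each allele has probability 1/2
        sample[0] = pick + sep + pick
        fields[sample_col] = ":".join(sample)
    return "\t".join(fields) + "\n"


def randomize_hets(lines, sample_name, seed=None):
    """Yield VCF lines with the named sample's heterozygous calls resolved."""
    rng = random.Random(seed)  # pass a seed to make the result reproducible
    sample_col = None
    for line in lines:
        if line.startswith("##"):
            yield line  # meta-information lines pass through unchanged
        elif line.startswith("#CHROM"):
            # Locate the sample's column from the header line.
            sample_col = line.rstrip("\n").split("\t").index(sample_name)
            yield line
        else:
            yield randomize_het_line(line, sample_col, rng)
```

One way to use it would be to wire it to standard input and output, for example `sys.stdout.writelines(randomize_hets(sys.stdin, "SampleName", seed=42))`, feed it the output of `bcftools view MyVCF.vcf.gz`, then bgzip and index the result before running the same bcftools consensus command on it. Multi-allelic heterozygotes such as 1/2 are resolved to 1/1 or 2/2 in the same way; missing and homozygous calls are left untouched.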