Hi, I have a reference fasta file and a vcf file of SNP variant calls. For each individual in the vcf file, I want to create a new, "pseudo-haploid" fasta file in which the base at every variant site is randomly sampled from one of that individual's alleles (all other sites keep the reference base).
This seems to be a different problem from the one GATK's FastaAlternateReferenceMaker solves: that tool inserts the alternate allele, without caring about individual genotypes or ever retaining the reference allele.
Can anybody offer a tool or some advice? Thanks!
Short example:
Sequence: ATAAATTCCC (10 bp long)
VCF:
POS REF ALT Ind1 Ind2
2 T C 0/1 1/1
5 A G 0/0 0/1
Output:
Ind1: A**T**AAATTCCC or A**C**AAATTCCC
Ind2: A*C*AA**A**TTCCC or A*C*AA**G**TTCCC
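To make the intended behaviour concrete, here is a minimal Python sketch of the sampling step, run on the short example above. It is only an illustration under some assumptions: the function name `pseudo_haploid` and the in-memory `(pos, ref, alt, genotypes)` tuples are hypothetical stand-ins for a real VCF parser, only SNPs are handled, positions are 1-based, and a missing genotype (`./.`) is assumed to fall back to the reference base.

```python
import random


def pseudo_haploid(ref_seq, variants, sample, rng=None):
    """Return ref_seq with one randomly sampled allele per variant site.

    variants: iterable of (pos, ref, alt, genotypes), where pos is 1-based,
    alt may be comma-separated, and genotypes maps sample name -> GT string
    such as "0/1" or "1|1". This layout is a hypothetical simplification
    of real VCF records.
    """
    rng = rng or random.Random()
    seq = list(ref_seq)
    for pos, ref, alt, genotypes in variants:
        if len(ref) != 1 or any(len(a) != 1 for a in alt.split(",")):
            raise ValueError(f"only SNPs are supported (position {pos})")
        if seq[pos - 1] != ref:
            raise ValueError(f"REF does not match the sequence at position {pos}")
        alleles = [ref] + alt.split(",")  # index 0 = REF, 1.. = ALT alleles
        gt = genotypes[sample].replace("|", "/").split("/")
        called = [int(a) for a in gt if a != "."]
        if not called:
            continue  # assumption: a missing genotype keeps the reference base
        # Pick one of the individual's allele copies uniformly at random.
        seq[pos - 1] = alleles[rng.choice(called)]
    return "".join(seq)


# The short example from the question.
REF = "ATAAATTCCC"
VARIANTS = [
    (2, "T", "C", {"Ind1": "0/1", "Ind2": "1/1"}),
    (5, "A", "G", {"Ind1": "0/0", "Ind2": "0/1"}),
]

for name in ("Ind1", "Ind2"):
    print(name, pseudo_haploid(REF, VARIANTS, name))
```

Each run prints one of the two possible sequences per individual. Passing a seeded `random.Random` as `rng` makes the draw reproducible. For a real vcf file, the tuples would have to come from a proper VCF reader, and indels would need separate handling because they shift coordinates.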