I am trialing discoSNP++ as part of a bacterial GWAS pipeline and am seeking some clarification on the genotypes in the multisample VCFs. Similar to the Pseudomonas example provided in the VCF_creator user guide (pages 4-5), we see a large number of heterozygous genotypes (0/1, alongside 0/0 and 1/1) despite all the samples being from haploid organisms. How should we interpret these heterozygous genotypes in our bacterial sequence data? The paired-end data were provided in the fof_reads1.txt and fof_reads2.txt structure, as described in Case 4 of the discoSNP user guide. Any guidance would be appreciated!
Thanks for the great (and quick!) explanation.