The major reason is that the Venter genome (HuRef) was sequenced to only ~9X coverage, so there is a high chance of missing one allele of a heterozygous site through sampling fluctuation. You cannot get a reliable het:hom ratio from HuRef. In addition, the reference genome has a higher indel sequencing error rate than substitution error rate. Because an error in the reference shows up as an apparent homozygous difference in every sample, this makes het:hom for indels lower than for SNPs, even if your indel calling is perfect.
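To put a rough number on the sampling effect, here is a minimal Python sketch under a deliberately simple, hypothetical calling model: per-site depth is Poisson with the given mean; each read at a heterozygous site comes from either allele with probability 1/2 (so each allele's read count is an independent Poisson with half the mean); an allele counts as observed only if it has at least `min_reads` supporting reads; and there are no sequencing or mapping errors. The names `observed_het_hom` and `min_reads` are illustrative assumptions, not taken from any real caller.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(X <= k) for X ~ Poisson(lam)."""
    if k < 0:
        return 0.0
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


def observed_het_hom(true_ratio: float, mean_depth: float, min_reads: int) -> float:
    """Expected observed het:hom under the toy model above (true hom count scaled to 1)."""
    # Probability that one given allele of a het site has < min_reads reads.
    q = poisson_cdf(min_reads - 1, mean_depth / 2)
    # Probability that a true hom-alt site has enough reads to be seen at all.
    p_hom_seen = 1.0 - poisson_cdf(min_reads - 1, mean_depth)
    # Het sites where both alleles are seen are called het.
    het_called = true_ratio * (1 - q) ** 2
    # Het sites where the ref allele is missed but the alt is seen look hom-alt.
    # (Hets where the alt allele is missed are simply not called as variants.)
    het_as_hom = true_ratio * q * (1 - q)
    hom_called = 1.0 * p_hom_seen + het_as_hom
    return het_called / hom_called


if __name__ == "__main__":
    for depth in (9, 30):
        print(depth, round(observed_het_hom(2.0, depth, min_reads=2), 3))
```

With a true ratio of 2, mean depth 9 and `min_reads=2`, this gives about 1.58, while at mean depth 30 it stays at essentially 2. Real callers use genotype likelihoods rather than a hard read-count threshold, so this only indicates the direction and rough size of the effect.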
The Venter reads have a particularly high 1 bp insertion error rate. In general it is not a good idea to learn indel statistics from HuRef, though this should not explain a low het:hom ratio.
If the sample is not admixed and comes from the same population as the reference genome, the theoretical expectation is het:hom = 2:1. The derivation is very simple (see the MAQ paper). However, the reference genome is a hybrid, so in practice het:hom is always lower than 2.
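One short way to see the 2:1 figure (an illustration, not necessarily the derivation in the MAQ paper), assuming infinite sites and that the sample's two haplotypes A, B and the reference haplotype R are exchangeable draws from one population with expected pairwise divergence θ. Under infinite sites every variable site among the three has exactly one odd haplotype out; let n_A, n_B, n_R be the expected numbers of sites where A, B or R, respectively, is the odd one. Het sites are those where A ≠ B; hom sites are those where A = B ≠ R.

```latex
\begin{aligned}
\mathbb{E}[d_{AB}] &= n_A + n_B = \theta, \qquad
\mathbb{E}[d_{AR}] = n_A + n_R = \theta, \qquad
\mathbb{E}[d_{BR}] = n_B + n_R = \theta \\
&\Rightarrow\; n_A = n_B = n_R = \theta/2, \\
\frac{\text{het}}{\text{hom}} &= \frac{n_A + n_B}{n_R} = \frac{\theta}{\theta/2} = 2.
\end{aligned}
```

If R is on average more divergent from the sample than A is from B (E[d_AR] = E[d_BR] = θ' > θ), the same equations give n_A = n_B = θ/2 and n_R = θ' − θ/2 > θ/2, so the ratio θ/(θ' − θ/2) falls below 2.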
Also, the homozygous variants you observe arise mostly through coalescence, not through recurrent mutation.
Yes. Since the mutation rate is very low, homozygosity is usually due to identity by descent: a single mutation occurred in the past, and two copies of it come together in one individual through inbreeding-like effects in a finite population. Alleles can instead be identical by state without being identical by descent if the mutation rate is high (e.g. for microsatellites in some species) or if there is enough time for the same mutation to recur, which is usually unlikely in a population-genetic setting but can be observed over phylogenetic timescales.
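As a rough illustration of the "inbreeding-like effects in a finite population" point, here is a sketch assuming an idealized Wright-Fisher population of constant size N diploids (2N gene copies) with no mutation. Tracing two gene copies back in time, each generation they pick the same parental copy with probability 1/(2N), so the probability that they are identical by descent within t generations is F_t = 1 − (1 − 1/(2N))^t. The function names are hypothetical; the simulation simply checks the closed form.

```python
import random


def ibd_probability(n_diploid: int, generations: int) -> float:
    """Closed form F_t = 1 - (1 - 1/(2N))**t for an idealized Wright-Fisher population."""
    return 1.0 - (1.0 - 1.0 / (2 * n_diploid)) ** generations


def simulate_ibd(n_diploid: int, generations: int, trials: int = 5000, seed: int = 1) -> float:
    """Monte Carlo: trace two gene copies back and count how often they share an ancestor copy."""
    rng = random.Random(seed)
    copies = 2 * n_diploid
    hits = 0
    for _ in range(trials):
        for _ in range(generations):
            # Each lineage picks its parental copy uniformly from the 2N copies.
            if rng.randrange(copies) == rng.randrange(copies):
                hits += 1  # the lineages coalesced: the two copies are IBD
                break
    return hits / trials


if __name__ == "__main__":
    print(ibd_probability(50, 100), simulate_ibd(50, 100))
```

Smaller populations reach high F faster, which is why shared-descent homozygosity dominates without any need for recurrent mutation.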
I agree. If a small population shows a higher frequency of the indel in both homozygotes and heterozygotes than any of several other populations, one must consider a founder effect.
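To see why a founder effect would raise both genotype classes together, here is a sketch assuming Hardy-Weinberg proportions (random mating, no selection) within each population: with indel allele frequency p, the heterozygote frequency is 2p(1 − p) and the homozygote frequency is p², and both increase as p rises from a low value toward 0.5. The function name is hypothetical.

```python
def hwe_genotype_freqs(p: float) -> tuple[float, float, float]:
    """Return (hom_ref, het, hom_alt) frequencies under Hardy-Weinberg for alt allele frequency p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must be in [0, 1]")
    q = 1.0 - p
    return q * q, 2 * p * q, p * p


if __name__ == "__main__":
    # e.g. a rare indel in most populations vs. a drifted-up frequency after a founder event
    for p in (0.05, 0.30):
        print(p, hwe_genotype_freqs(p))
```

Heterozygote frequency peaks at p = 0.5, so this joint increase holds while the indel allele is still the minor allele.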