In polyploid genomes with limited divergence between subgenomes, read mapping is generally challenging. Reads are frequently mismapped, which leads to calling homeologous variants, i.e. variants that are actually differences between the subgenomes.
When using discoSnp, you can imagine that these loci could also collapse, and variants would appear heterozygous in all samples if the locus is non-branching. One could then use read frequencies to decide whether a variant is homeologous. In an allotetraploid, one would expect a roughly 50/50 distribution of the two alleles for a homeologous variant, versus roughly 25/75 for a true variant (e.g. one heterozygous within a single subgenome), because the non-variant homeologous copy contributes relatively more reads to the stack.
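A minimal sketch of the read-frequency idea, assuming an allotetraploid with two equally covered subgenomes and per-variant ref/alt read counts already extracted from the discoSnp output. The function names and the `min_depth` threshold are hypothetical, and a plain binomial likelihood comparison of the 50/50 and 25/75 expectations stands in for whatever test one would actually use.

```python
import math


def binom_loglik(k: int, n: int, p: float) -> float:
    """Log-likelihood of observing k reads of one allele out of n under Binomial(n, p)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log(1 - p))


def classify_allotetraploid_site(ref_count: int, alt_count: int,
                                 min_depth: int = 10) -> str:
    """Hypothetical helper: label a collapsed bubble in an allotetraploid.

    'homeologous'  -> minor allele expected at ~50% (one allele per subgenome)
    'true_variant' -> minor allele expected at ~25% (e.g. heterozygous in one subgenome)
    'undetermined' -> too few reads to decide (min_depth is an arbitrary assumption)
    """
    n = ref_count + alt_count
    if n < min_depth:
        return "undetermined"
    k = min(ref_count, alt_count)  # minor-allele read count
    ll_homeologous = binom_loglik(k, n, 0.50)
    ll_true_variant = binom_loglik(k, n, 0.25)
    return "homeologous" if ll_homeologous >= ll_true_variant else "true_variant"


if __name__ == "__main__":
    for ref, alt in [(50, 50), (75, 25), (60, 40), (70, 30), (2, 1)]:
        print(ref, alt, classify_allotetraploid_site(ref, alt))
```

With these two expectations the decision boundary falls at a minor-allele fraction of log(1.5)/log(3) ≈ 0.37, so the sketch only separates the two cases when coverage is deep enough and the subgenomes are evenly represented; unequal subgenome coverage would shift both expected ratios.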
However, when the branching becomes more complex, such an approach may break down, and variants that actually originate from different loci might still end up in the same locus. Are there any strategies that could be applied to recognize these cases and distinguish between the different loci?
Related to this: when does discoSnp decide that a graph has become too complex and split it into separate graphs?
Thanks for the reply.
Thanks for the clear explanation, Pierre.