Hello Biostars community,
I am working with a repeated sequence that makes up about 1% of my genome. I do not have an assembly. I want to build a nucleotide profile HMM (nHMM) that reliably detects this repeat across a wide variety of sequences.
Do you have any experience with building nHMMs directly from Illumina reads mapped to a consensus sequence? The reads are shorter than the repeat, but the repeat is covered at >1000x.
What can I expect? Will the resulting nHMM accurately represent the repeat variants, or should I consider a different, perhaps iterative, approach?
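For concreteness, here is a minimal sketch of what "building an nHMM from the mapping" could mean in practice: each mapped read is projected onto consensus coordinates and written as one row of a Stockholm alignment, which HMMER's `hmmbuild` accepts as input. The input tuples (read name, 0-based start on the consensus, CIGAR string, read sequence) are a hypothetical stand-in for what you would pull from a BAM file. As a simplification, insertions relative to the consensus are dropped and every consensus column is marked as a match column in a `#=GC RF` line.

```python
import re

# Matches CIGAR operations such as "12M", "1D", "3S".
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def project_read(start, cigar, seq, ref_len):
    """Place a read onto consensus coordinates as a gapped row of length ref_len.

    Simplification (assumption): bases inserted relative to the consensus
    (CIGAR 'I') are discarded, so insert states are not learned from them.
    """
    row = ["-"] * ref_len
    ref_pos, read_pos = start, 0
    for length, op in CIGAR_RE.findall(cigar):
        n = int(length)
        if op in "M=X":
            # Aligned bases consume both the read and the consensus.
            for i in range(n):
                if ref_pos + i < ref_len:
                    row[ref_pos + i] = seq[read_pos + i]
            ref_pos += n
            read_pos += n
        elif op in "IS":
            # Insertions and soft clips consume the read only.
            read_pos += n
        elif op in "DN":
            # Deletions consume the consensus only; those columns stay '-'.
            ref_pos += n
        # 'H' and 'P' consume neither sequence.
    return "".join(row)


def to_stockholm(reads, ref_len):
    """Build a Stockholm alignment from (name, start, cigar, seq) tuples."""
    width = max([7] + [len(name) for name, *_ in reads])
    lines = ["# STOCKHOLM 1.0", ""]
    for name, start, cigar, seq in reads:
        lines.append(f"{name.ljust(width)} {project_read(start, cigar, seq, ref_len)}")
    # Mark every consensus column as a match column ('x').
    lines.append(f"{'#=GC RF'.ljust(width)} {'x' * ref_len}")
    lines.append("//")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    reads = [
        ("r1", 0, "5M", "ACGTA"),
        ("r2", 3, "2M1D2M", "TAGT"),
        ("r3", 5, "1S3M1I1M", "GCATTG"),
    ]
    print(to_stockholm(reads, 10), end="")
```

The resulting file could then be passed to something like `hmmbuild --dna --hand repeat.hmm repeat.sto`, where `--hand` tells `hmmbuild` to take match columns from the `RF` line. Because each read covers only part of the repeat, how `hmmbuild` treats short rows as fragments (its `--fragthresh` option) and how it weights thousands of near-identical reads are likely to matter a lot for the result.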
I would be glad to hear about your opinions or experiences.