6.0 years ago
gbdias
Hi,
I am searching for tools to visualize regions of a whole-genome assembly that are represented two or more times.
- Most genome assemblers will collapse haplotypes that are similar enough, generating a "pseudo-haploid" final assembly.
- However, if the haplotypes are divergent past a certain threshold, or if they carry large-scale structural variants, the assembler will likely output two contigs (or haplotigs), one for each haplotype. I am searching for a good way to visualize such haplotypes.
- I am aware of some methods to guess the overall duplication level of the assembly, such as: read mapping and depth analysis (like purge_haplotigs by Mike Roach) and BUSCO duplication level.
- However, I'd like to know if a more visual tool is available.
- I am also aware of the MUMmer package and mummerplot, but that is not the easiest way to visualize duplicated regions, since MUMmer orders the contigs along a diagonal based on their best alignments to the reference. Using the --maxmatch option will display all alignments, but they are not ordered in an intelligible manner, so the plot becomes too cluttered.
Any suggestions are welcome.
Thanks, Guilherme
I'm a little uncertain what specifically you are asking for, but visualization-wise you could give D-Genies a try. Are you also asking about an approach to detect 'duplications'?
Hi,
Sorry if the question was not clear. I guess a simpler way to phrase it would be: how do you align a full diploid assembly to a haploid reference genome and visualize it in dot-plot style?
Thanks for the D-Genies suggestion. I suspect it will do the same as MUMmerplot but I will take a look.
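As a rough complement to a dot plot, the reference regions hit by more than one alignment can be listed directly from the alignment file. The sketch below assumes the diploid assembly has already been aligned to the haploid reference with a tool that writes PAF (minimap2 does so by default). The function names and the `min_depth` parameter are hypothetical, chosen for illustration. It also counts alignments rather than distinct contigs, which is a simplification: a single contig with overlapping alignments would be reported too.

```python
from collections import defaultdict


def parse_paf_line(line):
    """Return (query, target, target_start, target_end) from one PAF line.

    PAF columns are tab-separated; column 1 is the query name, column 6 the
    target name, columns 8 and 9 the 0-based, half-open target interval.
    """
    fields = line.rstrip("\n").split("\t")
    return fields[0], fields[5], int(fields[7]), int(fields[8])


def multi_covered_regions(paf_lines, min_depth=2):
    """Find reference intervals covered by at least `min_depth` alignments.

    Uses a sweep over interval start/end events for each reference sequence.
    """
    events = defaultdict(list)
    for line in paf_lines:
        if not line.strip():
            continue
        _, target, start, end = parse_paf_line(line)
        events[target].append((start, 1))   # an alignment opens here
        events[target].append((end, -1))    # an alignment closes here

    regions = defaultdict(list)
    for target, target_events in events.items():
        # At equal positions, -1 sorts before +1, so intervals that merely
        # touch are not counted as overlapping.
        target_events.sort()
        depth = 0
        region_start = None
        for pos, delta in target_events:
            previous = depth
            depth += delta
            if previous < min_depth <= depth:
                region_start = pos
            elif depth < min_depth <= previous:
                if pos > region_start:
                    regions[target].append((region_start, pos))
                region_start = None
    return dict(regions)


if __name__ == "__main__":
    # Toy PAF records (made-up contig names and coordinates).
    example = [
        "ctgA\t1000\t0\t1000\t+\tchr1\t5000\t0\t1000\t900\t1000\t60",
        "hapA\t800\t0\t800\t+\tchr1\t5000\t200\t1000\t700\t800\t60",
        "ctgB\t500\t0\t500\t-\tchr1\t5000\t1000\t1500\t480\t500\t60",
    ]
    print(multi_covered_regions(example))
```

The resulting intervals could then be used to pick out the contigs worth inspecting in a dot plot, instead of plotting every alignment at once.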