Question

How to visualize duplicated segments in a de novo Whole Genome Assembly

6.0 years ago

gbdias ▴ 160

Hi,

I am looking for tools to visualize regions of a whole genome assembly that are represented two or more times.

Most genome assemblers collapse haplotypes that are similar enough, producing a "pseudo-haploid" final assembly.
However, if the haplotypes diverge past a certain threshold, or if they carry large-scale structural variants, the assembler will likely output two contigs (or haplotigs), one for each haplotype. I am looking for a good way to visualize such haplotypes.
I am aware of some methods for estimating the overall duplication level of an assembly, such as read mapping and depth analysis (like purge_haplotigs by Mike Roach) and the BUSCO duplication level.
However, I'd like to know whether a more visual tool is available.
I am also aware of the MUMmer package and mummerplot, but those are not the easiest way to visualize duplicated regions, since MUMmer orders the contigs along a diagonal based on their best alignments to the reference. Using the --maxmatch option displays all alignments, but they are not ordered in an intelligible manner, so the whole plot becomes too cluttered.

Any suggestions are welcome.

Thanks, Guilherme

wgs duplication visualization

I'm a little uncertain what specifically you are asking for, but visualization-wise you could give D-Genies a try. Are you also asking about an approach to detect 'duplications'?
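On the detection side, one rough option is to look for reference intervals that are covered by alignments from two or more distinct contigs. Below is a minimal sketch of that idea, assuming the assembly has already been aligned to the reference and the alignments are available in PAF format (for example from minimap2). The function names `parse_paf` and `multi_covered_regions` and the `min_contigs` parameter are made up for illustration; they do not come from any existing tool.

```python
from collections import defaultdict


def parse_paf(lines):
    """Yield (query, ref, ref_start, ref_end) from PAF-formatted lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip blank or malformed lines
        # PAF columns: 1 = query name, 6 = target name, 8/9 = target start/end
        yield fields[0], fields[5], int(fields[7]), int(fields[8])


def multi_covered_regions(alignments, min_contigs=2):
    """Return {ref: [(start, end, n_contigs), ...]} for reference intervals
    covered by at least `min_contigs` distinct query contigs."""
    events = defaultdict(list)
    for query, ref, start, end in alignments:
        events[ref].append((start, 1, query))
        events[ref].append((end, -1, query))

    regions = {}
    for ref, evs in events.items():
        evs.sort(key=lambda e: (e[0], e[1]))
        active = defaultdict(int)  # contig -> number of open alignments
        found = []
        prev = None
        for pos, delta, query in evs:
            n = len(active)  # contigs covering the interval (prev, pos)
            if prev is not None and pos > prev and n >= min_contigs:
                if found and found[-1][1] == prev and found[-1][2] == n:
                    # extend the previous interval when it is contiguous
                    found[-1] = (found[-1][0], pos, n)
                else:
                    found.append((prev, pos, n))
            active[query] += delta
            if active[query] == 0:
                del active[query]
            prev = pos
        if found:
            regions[ref] = found
    return regions


if __name__ == "__main__":
    # Toy alignments (hypothetical contig and reference names).
    demo = [
        "ctgA\t1000\t0\t1000\t+\tchr1\t5000\t0\t1000\t1000\t1000\t60",
        "ctgB\t800\t0\t800\t+\tchr1\t5000\t500\t1300\t800\t800\t60",
        "ctgC\t500\t0\t500\t+\tchr1\t5000\t2000\t2500\t500\t500\t60",
    ]
    print(multi_covered_regions(parse_paf(demo)))
```

This only counts distinct contigs per reference interval; it does not filter by identity or alignment length, so in practice short repeat hits would probably need to be removed first.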

6.0 years ago by lieven.sterck 15k

Hi,

Sorry if the question was not clear. I guess a simpler way to phrase it would be: how do you align a full diploid assembly to a haploid reference genome and visualize the result as a dot plot?

Thanks for the D-Genies suggestion. I suspect it will do the same as mummerplot, but I will take a look.
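To make the kind of plot I have in mind more concrete, here is a minimal sketch of a layout in which every contig is kept and placed on the y-axis according to where its longest alignment starts on the reference. Two haplotigs covering the same reference region then end up next to each other and show up as two diagonals over the same x range. It assumes alignments in PAF format; `read_paf` and `layout_dotplot` are hypothetical helper names, and anchoring on the longest alignment is just one possible ordering rule.

```python
def read_paf(lines):
    """Parse PAF lines into tuples of the first nine columns."""
    records = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 12:
            continue  # skip blank or malformed lines
        records.append((f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
                        f[5], int(f[6]), int(f[7]), int(f[8])))
    return records


def layout_dotplot(records):
    """Return (segments, query_order).

    Each segment is (x_start, x_end, y_start, y_end) in concatenated
    reference (x) and query (y) coordinates.
    """
    # Concatenate reference sequences along x in name order.
    ref_len = {r[5]: r[6] for r in records}
    ref_offset, total = {}, 0
    for name in sorted(ref_len):
        ref_offset[name] = total
        total += ref_len[name]

    # Anchor each contig at the reference start of its longest alignment.
    best, qlen = {}, {}
    for q, ql, qs, qe, strand, t, tl, ts, te in records:
        qlen[q] = ql
        span = te - ts
        if q not in best or span > best[q][0]:
            best[q] = (span, ref_offset[t] + ts)

    # Stack contigs along y in order of their anchor position.
    order = sorted(best, key=lambda q: (best[q][1], q))
    q_offset, total = {}, 0
    for q in order:
        q_offset[q] = total
        total += qlen[q]

    segments = []
    for q, ql, qs, qe, strand, t, tl, ts, te in records:
        x0, x1 = ref_offset[t] + ts, ref_offset[t] + te
        y0, y1 = q_offset[q] + qs, q_offset[q] + qe
        if strand == "-":
            y0, y1 = y1, y0  # reverse-strand hits slope downward
        segments.append((x0, x1, y0, y1))
    return segments, order


if __name__ == "__main__":
    # Toy alignments (hypothetical names): two haplotigs over one region.
    demo = [
        "hapA\t1000\t0\t1000\t+\tchr1\t5000\t1000\t2000\t1000\t1000\t60",
        "hapB\t900\t0\t900\t-\tchr1\t5000\t1100\t2000\t900\t900\t60",
        "ctgC\t500\t0\t500\t+\tchr1\t5000\t0\t500\t500\t500\t60",
    ]
    segs, order = layout_dotplot(read_paf(demo))
    print(order)
    print(segs)
```

Each returned segment can then be drawn as a line from (x_start, y_start) to (x_end, y_end) with any plotting library.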

6.0 years ago by gbdias ▴ 160