Question

MASURCA output

0

3.7 years ago

kilcdincer ▴ 10

Hi,

This is my first time assembling a diploid genome, and I need help understanding the MASURCA output. I ran MASURCA on a Candida genome, which is supposed to have a haploid genome size of around 15 Mb. MASURCA gave:

ESTIMATED_GENOME_SIZE.txt: 32 Mb
Ploidy.txt: 1
Total length: 15 Mb

How should I interpret the estimated genome size and total length results? And why did I get ploidy = 1? Is my genome diploid or haploid according to this result?

Thank you.

masurca • 2.1k views

updated 3.7 years ago by Carambakaracho ★ 3.3k • written 3.7 years ago by kilcdincer ▴ 10

score 0 · Answer 1 · 2021-04-15

0

3.7 years ago

Carambakaracho ★ 3.3k

As the expected genome size seems okay, I wouldn't worry too much. Besides, this is a known issue; see this MASURCA ticket on GitHub.

Aleksey wrote:

If the genome is very heterozygous, it looks to assembler as one genome with double the size, as opposed to two similar copies of the same genome.

You can confirm that easily by aligning the reads you used to assemble back to the assembly, you should be able to see at least some heterozygous differences from the alignment directly.
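The check suggested above can be sketched in Python once the alignment is done. This is only a minimal sketch: it assumes the reads were already aligned to the assembly and variants were called with external tools (not shown here), giving a single-sample VCF with a `GT` field called under a diploid model. The function name `count_genotypes` and the example records are hypothetical, made up for illustration.

```python
def count_genotypes(vcf_lines):
    """Count heterozygous and homozygous-alternate calls for the first sample."""
    counts = {"het": 0, "hom_alt": 0}
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample column present
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        genotype = fields[9].split(":")[format_keys.index("GT")]
        # Treat phased (|) and unphased (/) genotypes the same way.
        alleles = genotype.replace("|", "/").split("/")
        if "." in alleles or len(alleles) < 2:
            continue  # missing or haploid call
        if len(set(alleles)) > 1:
            counts["het"] += 1
        elif alleles[0] != "0":
            counts["hom_alt"] += 1
    return counts


# Made-up example records, not real data from this thread.
example_vcf = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample",
    "ctg1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30",
    "ctg1\t200\t.\tC\tT\t50\tPASS\t.\tGT:DP\t1/1:28",
    "ctg2\t50\t.\tG\tA\t50\tPASS\t.\tGT\t0|1",
    "ctg2\t80\t.\tT\tC\t50\tPASS\t.\tGT\t./.",
]
print(count_genotypes(example_vcf))
```

Many heterozygous calls spread across the roughly 15 Mb assembly would be consistent with a diploid sample whose two haplotypes were collapsed into one consensus.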

0

Thank you. I understand the ploidy part.

My genome is supposed to belong to diploid Candida albicans, and the genome size of diploid Candida albicans is around 29 Mb. But MASURCA shows a total length of around 15 Mb, so I am a bit confused. Does MASURCA give statistics for only the haploid genome of our sample? And what do "total length" and "estimated genome size" stand for?

3.7 years ago by kilcdincer ▴ 10

1

Usually, assemblies represent a haploid consensus; most C. albicans assemblies have a size of around 15 Mb.
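To make the two numbers concrete: "total length" in assembly statistics is the summed length of the assembled sequences, so it can be recomputed from the FASTA file. The sketch below does that and compares it with the estimated genome size from this thread. The helper names `total_length` and `size_ratio` are hypothetical, and how MASURCA derives its own estimate is not covered here.

```python
def total_length(fasta_lines):
    """Sum the lengths of all sequences in FASTA-formatted lines."""
    total = 0
    for line in fasta_lines:
        line = line.strip()
        # Header lines start with '>' and carry no sequence.
        if line and not line.startswith(">"):
            total += len(line)
    return total


def size_ratio(estimated_genome_size, assembly_length):
    """Ratio of the estimated genome size to the assembled total length."""
    return estimated_genome_size / assembly_length


# Values reported in this thread: 32 Mb estimated, 15 Mb assembled.
ratio = size_ratio(32e6, 15e6)
print(f"estimated size / total length = {ratio:.2f}")
```

With the thread's numbers the ratio is a little over 2, which fits the explanation quoted above: a heterozygous diploid genome can look like one genome of double the size, while the assembly itself is a haploid consensus of about 15 Mb.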

3.7 years ago by Carambakaracho ★ 3.3k