MASURCA output
3.7 years ago
kilcdincer ▴ 10

Hi,

This is my first time assembling a diploid genome, and I need help understanding the MASURCA output. I ran MASURCA on a Candida genome, which is supposed to have a haploid genome size of around 15 Mb. MASURCA gave:

ESTIMATED_GENOME_SIZE.txt: 32 Mb
Ploidy.txt: 1
Total length: 15 Mb

How should I interpret the estimated genome size and the total length? And why did I get ploidy = 1? According to this result, is my genome diploid or haploid?

Thank you.

masurca • 2.1k views
3.7 years ago
Carambakaracho ★ 3.3k

As the total length matches the expected genome size (around 15 Mb), I wouldn't worry too much. Besides, this is a known issue; see this masurca ticket on GitHub.

Aleksey wrote:

If the genome is very heterozygous, it looks to assembler as one genome with double the size, as opposed to two similar copies of the same genome.

You can confirm that easily by aligning the reads you used to assemble back to the assembly, you should be able to see at least some heterozygous differences from the alignment directly.
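As a rough sketch of that check: assuming you have mapped the reads back to the assembly and called variants with a diploid model using the tools of your choice, the resulting VCF can be tallied for heterozygous calls. The function name `count_genotypes` and the example records below are made up for illustration; only the standard tab-separated VCF layout with a `GT` field is assumed.

```python
def count_genotypes(vcf_lines):
    """Count heterozygous and homozygous-alternate calls for the first sample."""
    het = hom_alt = 0
    for line in vcf_lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no FORMAT/sample columns present
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue
        genotype = fields[9].split(":")[format_keys.index("GT")]
        # Treat phased (|) and unphased (/) genotypes the same way.
        alleles = genotype.replace("|", "/").split("/")
        if len(alleles) != 2 or "." in alleles:
            continue  # not a complete diploid call
        if alleles[0] != alleles[1]:
            het += 1
        elif alleles[0] != "0":
            hom_alt += 1
    return het, hom_alt


# Made-up records, for illustration only.
example = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tsample1",
    "contig1\t100\t.\tA\tG\t50\tPASS\t.\tGT:DP\t0/1:30",
    "contig1\t250\t.\tC\tT\t60\tPASS\t.\tGT:DP\t1/1:28",
    "contig2\t75\t.\tG\tA\t45\tPASS\t.\tGT:DP\t0/1:33",
]
het, hom_alt = count_genotypes(example)
print(het, hom_alt)  # 2 1
```

Many heterozygous (0/1) calls spread across the contigs would suggest that reads from both haplotypes are mapping onto a single collapsed consensus, i.e. the sample is diploid even though the assembly is haploid-sized.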


Thank you. I understand the ploidy part now.

My genome is supposed to be from diploid Candida albicans, whose diploid genome size is around 29 Mb. But MASURCA reports a total length of around 15 Mb, so I am a bit confused. Does MASURCA report statistics for only the haploid genome of our sample? And what do "total length" and "estimated genome size" stand for?


Assemblies usually represent a haploid consensus; most C. albicans assemblies have a size of around 15 Mb.
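To make the arithmetic concrete, here is a minimal sketch that compares the two numbers from the question. The function name `size_ratio` is only illustrative, and reading a ratio near 2 as "two haplotypes collapsed into one consensus" is an interpretation based on the explanation quoted above, not something MASURCA reports itself.

```python
def size_ratio(estimated_size_mb: float, assembly_length_mb: float) -> float:
    """Ratio of the estimated genome size to the assembled total length."""
    if assembly_length_mb <= 0:
        raise ValueError("assembly length must be positive")
    return estimated_size_mb / assembly_length_mb


# Values reported in the question: 32 Mb estimated, 15 Mb assembled.
ratio = size_ratio(32.0, 15.0)
print(f"estimate / assembly = {ratio:.2f}")  # estimate / assembly = 2.13
```

A ratio close to 2 fits a diploid sample whose two haplotypes (roughly 29 to 32 Mb in total) were assembled into a single consensus of about 15 Mb.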
