
Evaluating the quality of draft genomes: CheckM vs REAPR


9.9 years ago

fhsantanna ▴ 620

I have assembled four bacterial genomes from MiSeq 2x300 paired-end data using SPAdes and the A5 pipeline. I also tried CLC Workbench and CISA (to merge assemblies), but they performed poorly.

After checking the assemblies with REAPR and CheckM, I got contradictory results.

CheckM estimated the completeness of every genome at over 99%.

However, the REAPR results are frustrating. One of the genomes has only 22% error-free bases. In fact, on closer inspection, all of the assemblies are full of N's after REAPR processing, even those where the proportion of error-free bases is above 80%. All genomes also show FCD errors (one has more than 60!).
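To put a number on how much of each assembly ends up as N after REAPR processing, a small per-FASTA summary of N runs can help compare assemblies before and after. This is a minimal sketch, not part of REAPR or CheckM: the function name `n_gap_stats` and the `min_gap` threshold are hypothetical, and it only counts N bases and runs of N; it does not reproduce REAPR's own error-free-base metric.

```python
import re


def n_gap_stats(fasta_text: str, min_gap: int = 1) -> dict:
    """Count N bases and runs of N (gaps) in a multi-FASTA string.

    Hypothetical helper for illustration; not a REAPR/CheckM function.
    """
    records: list[list[str]] = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            records.append([])  # a header starts a new contig
        elif records:
            records[-1].append(line.upper())  # so lowercase 'n' counts too

    total = n_bases = gap_runs = 0
    for parts in records:
        seq = "".join(parts)
        total += len(seq)
        n_bases += seq.count("N")
        # A gap is a maximal run of N that is at least min_gap bases long.
        gap_runs += sum(
            1 for m in re.finditer(r"N+", seq) if len(m.group()) >= min_gap
        )

    return {
        "contigs": len(records),
        "total_length": total,
        "n_bases": n_bases,
        "gap_runs": gap_runs,
        # Fraction of bases that are not N (0.0 for an empty input).
        "non_n_fraction": (total - n_bases) / total if total else 0.0,
    }
```

Running it on each assembly before and after REAPR, e.g. `n_gap_stats(open(path).read())`, shows how much N content the processing added and how many separate gaps a tool such as GapFiller would have to close.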

I tried GapFiller, but it did not fill any of the gaps in the REAPR-processed contigs.

I noticed that none of the assemblies has high coverage; most are around 15X.
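As a rough check of whether sequencing more would help, expected mean coverage is C = N × L / G, for N reads of length L over a genome of size G. The sketch below assumes each MiSeq pair contributes two full 300 bp reads (trimmed reads will be shorter, so real coverage is lower) and uses a hypothetical 5 Mb genome size; the helper names are illustrative only.

```python
import math


def expected_coverage(read_pairs: int, read_length: int, genome_size: int) -> float:
    """Mean depth C = N * L / G, counting both reads of each pair."""
    total_bases = read_pairs * 2 * read_length
    return total_bases / genome_size


def pairs_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Read pairs required to reach target_coverage, rounded up."""
    return math.ceil(target_coverage * genome_size / (2 * read_length))


if __name__ == "__main__":
    GENOME_SIZE = 5_000_000  # hypothetical; substitute your assembly length
    print(expected_coverage(125_000, 300, GENOME_SIZE))  # 15.0
    print(pairs_needed(50, 300, GENOME_SIZE))  # 416667
```

Under these assumptions, 125,000 pairs give about 15X, and a hypothetical 50X target would need roughly 417,000 pairs, i.e. a bit over three times the current data.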

My main goal is to check for the presence of some genes of interest (fewer than a hundred) and to compare the genomes with related ones. Therefore, I do not need a finished genome.

What would you recommend I do before submitting the genomes to GenBank? Should I sequence more? Isn't REAPR too stringent? Is there any other software that could improve my draft genomes using the existing sequencing data?

Thank you


updated 2.7 years ago by Ram 44k • written 9.9 years ago by fhsantanna ▴ 620


Did you ever manage to solve this issue?

7.1 years ago by sicat.paolo20 ▴ 30


I'm interested too ^^

Could there be a problem with the mapped-reads file you gave to REAPR?

updated 2.7 years ago by Ram 44k • written 6.5 years ago by lagartija ▴ 160