Evaluating the quality of draft genomes: CheckM vs REAPR
9.9 years ago
fhsantanna ▴ 620

I have assembled four bacterial genomes from MiSeq 2x300 paired-end data using SPAdes and the A5 pipeline. I also tried CLC Workbench and CISA (to merge assemblies), but they performed poorly.

After checking the assemblies with REAPR and CheckM, I got contradictory statistics.

CheckM showed that the completeness of all genomes is over 99%.

However, the REAPR results are frustrating. One of the genomes has only 22% error-free bases. In fact, on closer inspection, all genome assemblies are full of N's after REAPR processing, even though some of the genomes have over 80% error-free bases. All genomes have FCD errors (one has more than 60!).
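To put a number on "full of N's", the N content of each contig can be counted before and after REAPR processing. What follows is only a minimal sketch, assuming the assemblies are plain FASTA files; `read_fasta` and `n_fraction` are hypothetical helper names, not part of REAPR or CheckM.

```python
from io import StringIO
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(handle: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first word of the header as the record id.
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def n_fraction(handle: Iterable[str]) -> Dict[str, Tuple[int, int, float]]:
    """Map each record id to (length, number of N's, fraction of N's)."""
    result = {}
    for name, seq in read_fasta(handle):
        n_count = seq.upper().count("N")
        frac = n_count / len(seq) if seq else 0.0
        result[name] = (len(seq), n_count, frac)
    return result


if __name__ == "__main__":
    # Toy input for illustration only; replace with open("assembly.fa").
    demo = StringIO(">contig1\nACGTNNNN\nAC\n>contig2\nACGT\n")
    for name, (length, n_count, frac) in n_fraction(demo).items():
        print(f"{name}\t{length}\t{n_count}\t{frac:.2%}")
```

Running it on the original assembly and on the REAPR-processed one shows which contigs gained the most N's.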

I tried GapFiller, but it did not fill any gaps in the REAPR-processed contigs.

I noted that none of the assemblies has high coverage; most are around 15X.
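For reference, mean coverage can be estimated as total sequenced bases divided by genome size. The sketch below is only an approximation under stated assumptions: the read-pair count and the 5 Mb genome size are hypothetical placeholders (not values from my data), and reads are assumed to keep their nominal 300 bp length, which trimming would reduce.

```python
import math


def mean_coverage(read_pairs: int, read_length: int, genome_size: int) -> float:
    """Approximate mean coverage for paired-end data (two reads per pair)."""
    return read_pairs * 2 * read_length / genome_size


def pairs_needed(target_coverage: float, read_length: int, genome_size: int) -> int:
    """Approximate number of read pairs needed to reach a target coverage."""
    return math.ceil(target_coverage * genome_size / (2 * read_length))


if __name__ == "__main__":
    # Hypothetical numbers: 125,000 pairs of 2x300 reads on a 5 Mb genome.
    print(mean_coverage(125_000, 300, 5_000_000))
    # Pairs needed for a hypothetical 50X target on the same genome.
    print(pairs_needed(50, 300, 5_000_000))
```

This gives a rough idea of how much additional sequencing a given coverage target would require.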

My main objective is to check for the presence of some genes of interest (fewer than a hundred) and to compare the genomes to other related ones. Therefore, I do not need a finished genome.

What do you recommend I do before submitting the genomes to GenBank? Should I sequence more? Isn't REAPR too stringent? Is there any other software that could improve my draft genomes using the existing sequencing data?

Thank you

checkm reapr draft-genome • 3.3k views

Did you ever manage to solve this issue?


I'm interested too ^^

Could there be a problem with the mapped reads file you gave to REAPR?
