I have assembled four bacterial genomes from MiSeq 2x300 paired-end data using SPAdes and the A5 pipeline. I also tried CLC Workbench and CISA (to merge assemblies), but they performed poorly.
After checking the assemblies with REAPR and CheckM, I got contradictory statistics.
CheckM reported that the completeness of all genomes is over 99%.
However, the REAPR results are frustrating. One of the genomes has only 22% error-free bases. In fact, on closer inspection, all of the assemblies are full of Ns after REAPR processing, even though some of them have over 80% error-free bases... All genomes show FCD errors (one has more than 60!).
I tried GapFiller, but it did not fill any of the gaps in the REAPR-processed contigs...
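To put a number on "full of Ns", the N content of each contig could be measured before and after REAPR. The following is only a minimal sketch in plain Python; the function name `n_fraction_per_contig` and the toy sequences in the demo are made up for illustration and are not part of REAPR or any other tool.

```python
from io import StringIO


def n_fraction_per_contig(fasta_handle):
    """Return {contig_name: (length, n_count, n_fraction)} for a FASTA stream."""
    stats = {}
    name = None
    length = 0
    n_count = 0

    def flush():
        # Store the totals for the contig that was just read.
        if name is not None:
            fraction = n_count / length if length else 0.0
            stats[name] = (length, n_count, fraction)

    for line in fasta_handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()
            # Use the first word of the header as the contig name.
            parts = line[1:].split()
            name = parts[0] if parts else ""
            length = 0
            n_count = 0
        else:
            seq = line.upper()
            length += len(seq)
            n_count += seq.count("N")
    flush()
    return stats


if __name__ == "__main__":
    # Toy input standing in for a real assembly file (hypothetical data).
    demo = StringIO(">contig1 example\nACGTNN\nNNAC\n>contig2\nACGT\n")
    for contig, (length, n_count, fraction) in n_fraction_per_contig(demo).items():
        print(f"{contig}\t{length}\t{n_count}\t{fraction:.2%}")
```

For a real assembly, the same function can be called on an open file, e.g. `with open("assembly.fasta") as fh: n_fraction_per_contig(fh)` (the file name is a placeholder). Comparing the output for the original and the REAPR-broken assembly would show where the Ns were introduced.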
I noticed that none of the assemblies has high coverage; most are around 15X.
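As a rough check on the 15X figure, nominal coverage is simply total sequenced bases divided by genome size. The sketch below assumes a 5 Mb genome and 125,000 read pairs; both numbers are hypothetical placeholders, not values from my data, and the function names are made up for illustration.

```python
import math


def estimate_coverage(n_read_pairs, read_length, genome_size):
    """Nominal coverage: total sequenced bases / genome size (paired-end)."""
    total_bases = n_read_pairs * 2 * read_length
    return total_bases / genome_size


def pairs_needed(target_coverage, read_length, genome_size):
    """Read pairs required to reach a target nominal coverage."""
    return math.ceil(target_coverage * genome_size / (2 * read_length))


if __name__ == "__main__":
    genome_size = 5_000_000   # assumed genome size in bp (placeholder)
    read_length = 300         # MiSeq 2x300
    n_read_pairs = 125_000    # assumed number of read pairs (placeholder)

    print(estimate_coverage(n_read_pairs, read_length, genome_size))
    print(pairs_needed(50, read_length, genome_size))
```

This is the nominal value before quality trimming and mapping, so the depth actually seen by REAPR would be somewhat lower.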
My main objective is to check for the presence of some genes of interest (fewer than a hundred) and to compare the genomes with other related ones. Therefore, I do not need finished genomes.
What do you recommend I do before submitting the genomes to GenBank? Should I sequence more? Isn't REAPR too stringent? Is there any other software that could improve my draft genomes using the existing sequencing data?
Thank you
Did you ever manage to solve this issue?
I'm interested too ^^
Could there be a problem with the mapped reads file you gave to REAPR?