I have assembled four bacterial genomes from MiSeq paired-end sequencing data using the following steps:
- Assembly using CLC Workbench;
- Assembly using SPAdes;
- Assembly using the A5 pipeline;
- Merging of the three assemblies using CISA;
- Quality check of the assemblies using QUAST.
To detect misassemblies, QUAST relies on a reference genome. However, for most of my draft genomes I do not have a suitable reference (they differ too much from the genomes deposited in GenBank).
So my question is: how can I validate a genome assembly using intrinsic data only? For example, with read mapping, what criteria should I use to decide which regions need correcting? What is the best software for this purpose?
Thanks
Should I use corrected reads or raw ones? I mapped the raw reads to the contigs and most of them did not map...
I use raw reads, as modern aligners are quite good at aligning even poor-quality reads. If many of your reads fail to align, it doesn't necessarily mean your assembly is wrong. You can check your read quality with a tool such as FastQC.
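
To put a number on how many reads fail to align, here is a minimal Python sketch that tallies mapped and unmapped reads from SAM-formatted text lines using the FLAG column. The function name `mapping_stats` and the toy records are my own illustration, not part of any tool; the only thing taken from the SAM format itself is the meaning of the FLAG bits (0x4 unmapped, 0x2 properly paired, 0x100 secondary, 0x800 supplementary).

```python
def mapping_stats(sam_lines):
    """Tally reads by mapping status from SAM-formatted text lines.

    Secondary (0x100) and supplementary (0x800) records are skipped so
    that each read is counted only once.
    """
    total = mapped = proper = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])  # FLAG is the second column
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if not flag & 0x4:  # bit 0x4 set means the read is unmapped
            mapped += 1
            if flag & 0x2:  # bit 0x2 set means properly paired
                proper += 1
    return {
        "total": total,
        "mapped": mapped,
        "properly_paired": proper,
        "mapped_fraction": mapped / total if total else 0.0,
    }


# Toy records (hypothetical read and contig names), for illustration only.
demo = [
    "@SQ\tSN:contig1\tLN:1000",
    "r1\t99\tcontig1\t1\t60\t4M\t=\t101\t104\tACGT\tIIII",
    "r1\t147\tcontig1\t101\t60\t4M\t=\t1\t-104\tACGT\tIIII",
    "r2\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r2\t141\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r3\t2145\tcontig1\t500\t60\t4M\t=\t600\t104\tACGT\tIIII",
]
stats = mapping_stats(demo)
print(stats)
```

On real data the lines would come from the SAM output of your aligner; `samtools flagstat` reports comparable counts directly. A low mapped fraction is a prompt to look at read quality and contamination first, not proof on its own that the assembly is wrong.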