I am trying to assemble a bacterial isolate genome from Illumina MiSeq reads. The purpose is to move toward closing the complete genome and using it for comparative genomics (for now, I just have a draft genome in 100 contigs). I have gone through each step of read quality trimming, de novo assembly, and scaffolding, and then mapped the reads to these scaffolds. Then I checked the completeness and contamination of the genome.
I have many problems understanding the basics. Can anybody help me with these questions?
1- I get around 115 contigs larger than 500 bp, each with a different coverage, ranging from 4 to 180! I know that low coverage basically means a contig is not reliable, and that high-coverage contigs might be of plasmid origin. But what is the minimum acceptable coverage, and how much coverage counts as high in my case?
P.S.: I have checked the contigs against plasmid databases, but no plasmid genes or plasmid ori sequences were detected. Same for bacteriophages.
2- When I assemble using SPAdes, a coverage is reported for each contig. When I map reads back to these contigs using Qualimap, it reports another coverage that is very different from the SPAdes value. E.g. for contig number 99 (the second one in the linked image), SPAdes reports 27x coverage whereas Qualimap reports 75! Why are they so different, and which one should I rely on?
3- When I check the mapped contigs, in most cases both ends of each contig have much lower coverage than the middle part, and sometimes an end is covered by just one or a few reads. Are these contig ends reliable, or should I omit (trim) them?
Thank you in advance.
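On question 2, one likely contributor to the mismatch: the `cov` field in SPAdes contig names is k-mer coverage for the largest k used, not per-base read coverage, so it will generally be lower than what a read-mapping tool reports. Below is a minimal sketch of the usual conversion, C = Ck * L / (L - k + 1). The function names are hypothetical, and the read length and k passed in are assumptions you would replace with your own run's values.

```python
import re

# SPAdes names contigs like "NODE_99_length_5321_cov_27.0".
# The cov field is k-mer coverage, not per-base read coverage.
HEADER_RE = re.compile(r"NODE_(\d+)_length_(\d+)_cov_([\d.]+)")


def parse_spades_header(header):
    """Return (node_id, length, kmer_coverage) from a SPAdes contig name."""
    match = HEADER_RE.search(header)
    if match is None:
        raise ValueError(f"not a SPAdes-style header: {header!r}")
    return int(match.group(1)), int(match.group(2)), float(match.group(3))


def kmer_to_base_coverage(kmer_cov, read_len, k):
    """Approximate base coverage from k-mer coverage.

    Uses C = Ck * L / (L - k + 1), where L is the (average) read length
    and k is the k-mer size the coverage was computed for.
    """
    if read_len < k:
        raise ValueError("read length must be at least k")
    return kmer_cov * read_len / (read_len - k + 1)
```

For example, assuming 250 bp reads and k = 127 (both assumptions, check your SPAdes log), `kmer_to_base_coverage(27.0, 250, 127)` gives roughly 54x. Trimmed reads are shorter than 250 bp, which pushes the estimate higher, and reads placed in repeats by the mapper can account for further differences.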
Hi Brian
Would you advise a different quality filtering/trimming pipeline for going into taxonomic profiling instead of assembly?
What do you mean by taxonomic profiling? Is this a 16S metagenomic analysis, or a shotgun metagenome, for example?
Hi, I'm starting with genome assemblies and I think my situation is pretty similar to mery's. I'm assembling some genomes with an expected length of around 4 Mb. After trimming with BBDuk and getting nice FastQC reports, I run SPAdes but I get about 150 contigs. The last of them are quite short (less than 1000 bp), and if I remove all these short contigs I end up with an 84-contig assembly. I don't know if this approach is correct or if I should try something else.
When following the instructions you gave above, I get 177 contigs. Once again, if I remove all the small contigs below 1000 bp, I end up with an 85-contig assembly. Do you know any other pipeline or an additional step to improve the assembly?
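For comparing the two runs after dropping short contigs, here is a rough sketch in plain Python of the length filter plus an N50 calculation, since contig count alone says little about contiguity. The helper names are hypothetical, and the 1000 bp cutoff is simply the threshold mentioned above, not a recommended value.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(records, min_len=1000):
    """Keep contigs whose sequence length is at least min_len."""
    return [(h, s) for h, s in records if len(s) >= min_len]


def n50(lengths):
    """Length of the contig at which the running total, taken from the
    longest contig downward, first reaches half of the assembly size."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0
```

With a SPAdes output file open as `fh`, `kept = filter_contigs(read_fasta(fh))` followed by `n50([len(s) for _, s in kept])` gives one number per assembly that can be compared between the two trimming approaches.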