Hi all,
I have been working for a while on getting a draft genome of a bacterium. I chose a closely related bacterium as a reference to reorder my contigs and concatenate them using ABACAS. This reference was selected based on a phylogenetic tree. However, the problem is that the genome size of the organism and the size of the assembly are different.
After doing this, I visualized the circular genome of the species, but the assembly size increased, as might be expected, and some gaps (NNNNs) appeared. So I am a bit confused about whether I am on the right track. What do you think? Is this approach valid? Or, since the reference genome is bigger than the assembly, could I remove the gaps from the draft genome in the final step? Does that make sense?
Best,
If you are missing sequence in your data, then there is simply no way to create it. If you must have a closed circular genome, you may need to think about creating a net library or using a different technique (e.g. nanopore long reads) to retrieve the missing data. Have you run a program to estimate how complete your current assembly is?
Thanks for the quick reply, genomax. Right! Maybe the safest approach is to get long reads to polish it. Unfortunately, in my situation I can only apply bioinformatics approaches.
I evaluated my assembly using BUSCO and got good results, over 99%. Can you explain a bit about creating a net library? What do you mean?
I just edited my comment: they are not the same species, just closely related to each other. So I don't think I am missing sequence in my data or assembly. I expect it to be a new species, but since I had many contigs, I couldn't find any way to get a draft genome from the assembly other than this approach.
If the BUSCO analysis indicates a relatively complete genome, then you could go forward with what you have if you are not able to close the genome completely. Bioinformatics approaches are only as good as the data at hand, and it sounds like you have already got the most out of your data.
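If it helps to see how much of the size increase after reordering comes from the N gaps rather than from real sequence, something like the rough Python sketch below could be used. It assumes the scaffolded output is a plain FASTA file; the function names (`read_fasta`, `gap_stats`, `summarize`) are made up for illustration and are not part of ABACAS or BUSCO.

```python
import re
import sys


def read_fasta(path):
    """Yield (header, sequence) pairs from a plain-text FASTA file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            else:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gap_stats(sequence):
    """Count runs of N (gaps) and the bases they cover in one sequence."""
    gap_lengths = [m.end() - m.start() for m in re.finditer(r"[Nn]+", sequence)]
    gap_bases = sum(gap_lengths)
    return {
        "total_length": len(sequence),
        "gap_count": len(gap_lengths),
        "gap_bases": gap_bases,
        "ungapped_length": len(sequence) - gap_bases,
    }


def summarize(path):
    """Sum the gap statistics over every record in a FASTA file."""
    totals = {"total_length": 0, "gap_count": 0, "gap_bases": 0, "ungapped_length": 0}
    for _header, sequence in read_fasta(path):
        for key, value in gap_stats(sequence).items():
            totals[key] += value
    return totals


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage (hypothetical file name): python gap_stats.py scaffold.fasta
    print(summarize(sys.argv[1]))
```

Running it on both the original contigs and the reordered scaffold and comparing `ungapped_length` would show whether the real sequence content changed or only the placeholder Ns were added. Since the Ns mark regions where the sequence between contigs is unknown, the ungapped length is probably the fairer number to compare against the reference genome size.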