Question

Constructing reference-based draft genome

0

4.4 years ago

young_bioinformatician ▴ 240

Hi all,

I have been working for a while on building a draft genome for a bacterium. I chose a closely related bacterium as the reference, then used ABACAS to reorder my contigs against it and concatenate them. I selected this relative based on a phylogenetic tree. The problem is that the genome size of the reference organism and the size of my assembly are different.

After doing this, I visualized the circular genome of the species, but the size of the assembly had increased and some gaps (runs of Ns) had appeared. So I am unsure whether I am on the right track. What do you think? Is this approach valid? Or, since the reference genome is larger than my assembly, could I remove the gaps from the draft genome as a final step? Does that make sense?

Best,

Assembly genome next-gen • 829 views

0

If sequence is missing from your data, there is simply no way to create it. If you must have a closed circular genome, you may need to think about creating a net library or using a different technique (e.g. nanopore long reads) to retrieve the missing data. Have you run a program to estimate how complete your current assembly is?

4.4 years ago by GenoMax 147k

0

Thanks for the quick reply, genomax. Right! Maybe the safest approach is to get long reads to polish it. Unfortunately, in my situation I can only apply bioinformatics approaches.

I evaluated my assembly with BUSCO and got good results, over 99%. Can you explain a bit what creating a net library involves? What do you mean by that?

I just edited my comment: the two organisms are not the same species, just closely related. So I don't think I am missing sequence in my data or assembly. I expect this to be a new species, but since I had many contigs, I couldn't find any way to get a draft genome from the assembly other than this approach.

4.4 years ago by young_bioinformatician ▴ 240

1

If the BUSCO analysis indicates a relatively complete genome, then you could go forward with what you have, even if you are not able to close the genome completely. Bioinformatics approaches are only as good as the data at hand, and it sounds like you have already gotten the most out of your data.
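
Before going forward, it can help to measure how much of the reordered assembly is real sequence and how much is gap padding. The following is a minimal Python sketch, assuming the scaffolded output is a FASTA file in which gaps are written as runs of N. The helper names `read_fasta` and `gap_summary` are hypothetical and are not part of ABACAS or BUSCO. If the ungapped length matches the total length of the original contigs, the size increase comes only from the N padding.

```python
import re
from typing import Dict, Iterator, Tuple


def read_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) pairs from FASTA-formatted text."""
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def gap_summary(fasta_text: str) -> Dict[str, int]:
    """Count total bases, gap bases (N/n) and gap runs over all records."""
    total = 0
    gap_bases = 0
    gap_runs = 0
    for _name, seq in read_fasta(fasta_text):
        total += len(seq)
        runs = re.findall(r"[Nn]+", seq)  # each run of Ns is one gap
        gap_runs += len(runs)
        gap_bases += sum(len(run) for run in runs)
    return {
        "total_length": total,
        "gap_bases": gap_bases,
        "gap_runs": gap_runs,
        "ungapped_length": total - gap_bases,
    }


if __name__ == "__main__":
    # Toy record standing in for a scaffolded pseudomolecule (assumed data).
    toy = ">pseudo\nACGTNNNNNACGT\nNNACG\n"
    print(gap_summary(toy))
```

For a real file, the text could be read with `open(path).read()` and passed to `gap_summary`. Reporting the ungapped length alongside the gap count describes the draft more honestly than deleting the Ns, because the Ns mark places where the contig order was inferred from the reference and the intervening sequence is unknown.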

4.4 years ago by GenoMax 147k