Hi,
We have sequenced chloroplast DNA from potato and assembled the reads with the SPAdes assembler. After assembly, we got 3 contigs.
Now we want to perform the following steps:
- Generate a single DNA sequence from these 3 contigs.
Q. Would you please suggest any pipeline or tool or method to do this?
- Then we want to use this single DNA sequence for annotation, which should give us .gff, .gb, etc. files.
Q. Which tool should we use for this purpose?
Thanks in advance.
A few points to get this thread started:
If the data are not available, no tool will create a single DNA sequence. I don't know this specific situation, but you could also consider some targeted approaches (e.g. design primers and do run-off sequencing to close the remaining gaps). If the data needed to close the gaps are in your sequencing pool, you can consider applying some gap-closing tools (google for this term).
A chloroplast, though present in a eukaryotic organism, is not itself eukaryotic! Chloroplasts are of bacterial origin and so have prokaryotic rather than typical eukaryotic characteristics. Moreover, there are likely specific tools for annotating chloroplasts (most work on similarity with known genes/proteins).
Last point: what have you tried/considered so far? Did you look for tools or approaches?
Thank you for pointing out the prokaryotic fact.
If we take a reference sequence from NCBI and align our contigs against it, can we generate a single consensus sequence?
Initially, we thought about using Prokka for annotation.
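To make the reference-guided idea concrete, here is a minimal Python sketch of what "align the contigs to a reference and join them" amounts to. It assumes the alignment step has already been run with some aligner and that each contig has a single placement (reference start and strand). The contig names, sequences, coordinates and the `scaffold` helper are all made up for illustration and do not come from any particular tool.

```python
# Hypothetical placements: (contig name, reference start, strand), as might be
# parsed from a contig-vs-reference alignment. All values here are invented.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def scaffold(contigs, placements, gap_len=100):
    """Order and orient contigs by reference position, joined by runs of N."""
    ordered = sorted(placements, key=lambda p: p[1])  # sort by reference start
    parts = []
    for name, _start, strand in ordered:
        seq = contigs[name]
        # Contigs that align to the minus strand are flipped before joining.
        parts.append(reverse_complement(seq) if strand == "-" else seq)
    # The N run is a placeholder of unknown true length, not real sequence.
    return ("N" * gap_len).join(parts)


contigs = {"c1": "ATGCCC", "c2": "GGGTTT", "c3": "AACCGG"}
placements = [("c2", 5000, "+"), ("c1", 100, "+"), ("c3", 90000, "-")]
print(scaffold(contigs, placements, gap_len=5))
```

Note that this only produces an ordered, oriented scaffold with N-filled gaps. As pointed out above, the missing bases themselves still have to come from the reads or from targeted sequencing.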
This review might be somewhat helpful.