Hi,
I am having trouble submitting a bacteriophage genome to GenBank. I have completed the assembly (Unicycler) and annotation (RAST) of the genome. For the GenBank submission, I created the feature table in the required format and submitted both the genome sequence and the feature table.
After completing the submission, I received an email stating that there are internal stop codons and some other issues, as follows:
Some or all of the protein-coding sequences contain internal stop codons, reading frame shifts (insertions/deletions based on BLAST similarity search results and/or an alignment), and/or have translations that show little or no similarity to other proteins in the database.
While searching for a solution, I found an NCBI support center post that suggested running an online BLAST search for each CDS and checking for internal stop codons using the "CDS feature" view option. For some sequences, it showed two different reading frames, one of which would have a stop codon in the middle of the sequence.
An example is attached here:
In this case, if the reading frame starts from the second base (T), the fourth codon is a stop codon. If it starts from the first base, the stop codon falls at the end of the sequence, as expected. I checked a couple of other CDSs in the same way and got similar results.
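The effect described above can be reproduced with a small Python sketch. The sequence here is made up for illustration (it is not the attached CDS), and the function names are hypothetical. It assumes that genetic code 11 maps codons to amino acids the same way as the standard code and differs only in which start codons are allowed, so the standard table is used for translation.

```python
# Codon table in NCBI order (first, second, third base each run T, C, A, G).
# Assumption: genetic code 11 shares this mapping with the standard code.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(seq, frame=0):
    """Translate seq starting at offset frame (0, 1 or 2); '*' marks a stop."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    # Codons with ambiguous bases (e.g. N) are reported as X.
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)

def internal_stop_positions(protein):
    """Return 1-based codon positions of stops before the final codon."""
    return [i + 1 for i, aa in enumerate(protein[:-1]) if aa == "*"]

# Hypothetical CDS: starts with ATG and ends with TAA in the first frame.
EXAMPLE_CDS = "ATGGAACGCCTGACCGGCTAA"

for frame in range(3):
    protein = translate(EXAMPLE_CDS, frame)
    print(frame + 1, protein, internal_stop_positions(protein))
```

For this made-up sequence, the first frame translates cleanly with a single stop at the end, while the second frame (starting from the T) hits a stop at the fourth codon. Only the frame given by the annotated CDS coordinates is the one that should be free of internal stops.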
Now I have the following questions:
- Why are these different reading frames shown when a start codon is already present?
- If I have internal stop codons, how should I deal with them?
- During submission, the applicable genetic code is specified in the feature table itself. In this case it is 11, so would these multiple reading frames also be considered during submission?
I hope someone can help me with this.
Thank you.