Question

Genbank Bacteriophage Genome Submission - Internal stop codons in CDS sequences


9 months ago

Sowmya Pulapet ▴ 70

Hi,

I am having trouble submitting a bacteriophage genome to GenBank. I have completed the assembly (Unicycler) and annotation (RAST) of the genome. For the GenBank submission, I created a feature table in the required format and submitted it together with the genome sequence.
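For reference, the five-column feature table that NCBI accepts is tab-delimited. It has a `>Feature` header naming the sequence, then one line per feature giving start, stop and feature key, with qualifier lines indented by three tabs. Below is a minimal sketch that writes CDS entries from (start, end, product) tuples. The function name and input format are hypothetical. A real phage submission would normally also carry gene features and locus tags. Minus-strand features are written with start greater than end, and each CDS gets a `transl_table` 11 qualifier, matching the genetic code mentioned below.

```python
def to_feature_table(seq_id: str, cds_features: list[tuple[int, int, str]]) -> str:
    """Render CDS features in a five-column, tab-delimited feature table.

    Each feature is (start, end, product) with 1-based inclusive coordinates;
    for a minus-strand CDS pass start > end.
    """
    lines = [f">Feature {seq_id}"]
    for start, end, product in cds_features:
        lines.append(f"{start}\t{end}\tCDS")        # location + feature key
        lines.append(f"\t\t\tproduct\t{product}")   # qualifier lines start with 3 tabs
        lines.append("\t\t\ttransl_table\t11")      # genetic code 11 (bacteria/phages)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical contig name and coordinates, for illustration only.
    print(to_feature_table("phage1", [(101, 400, "hypothetical protein"),
                                      (1200, 901, "terminase large subunit")]), end="")
```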

After submitting, I received an email reporting internal stop codons and some related issues:

Some or all of the protein-coding sequences contain internal stop codons, reading frame shifts (insertions/deletions based on BLAST similarity search results and/or an alignment), and/or have translations that show little or no similarity to other proteins in the database.

While searching for a solution, I found an NCBI Support Center post suggesting an online BLAST search of each CDS and a check for internal stop codons using the "CDS feature" view. For some sequences, this view showed two different reading frames, one of which would have a stop codon in the middle of the sequence.
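The internal stop check can also be run offline on every CDS at once. Below is a minimal sketch, assuming each CDS nucleotide sequence has already been extracted (for example from the RAST output) as a plain string in its annotated orientation. The function names are hypothetical. Genetic code 11 assigns amino acids exactly as the standard code does and differs only in which codons may serve as starts, so a single codon table is enough here. Any `*` before the final codon is an internal stop.

```python
# Codon -> amino acid assignments shared by the standard code and code 11,
# in NCBI's TCAG ordering (first base changes slowest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a CDS codon by codon from its first base; unknown codons become 'X'."""
    cds = cds.upper()
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )


def internal_stop_positions(cds: str) -> list[int]:
    """Return 1-based codon positions of stop codons other than the final codon."""
    protein = translate(cds)
    return [pos + 1 for pos, aa in enumerate(protein[:-1]) if aa == "*"]


if __name__ == "__main__":
    print(translate("ATGAAACGTTAA"), internal_stop_positions("ATGAAACGTTAA"))
    # Made-up CDS with a stop at codon 2:
    print(translate("ATGTAAACGTAA"), internal_stop_positions("ATGTAAACGTAA"))
```

The first call prints `MKR* []` (only the terminal stop). The second prints `M*T* [2]`, which flags the stop at codon 2.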

An example is attached here:

[Image: CDS feature view of an example CDS (not available)]

In this case, if the reading frame starts at the second base (T), the fourth codon is a stop codon. If it starts at the first base, the only stop codon is the normal one at the end. I checked a few other CDSs and got similar results.
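This frame dependence can be made concrete by listing where stop codons fall in each forward frame of the same sequence. Below is a minimal sketch. The example sequence is invented for illustration and is not the one in the screenshot. Shifting the start by one base regroups every codon, so a sequence whose only stop is the terminal one in frame 1 can have an early stop in frame 2.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons under genetic code 11


def stops_by_frame(seq: str) -> dict[int, list[int]]:
    """Map each forward frame (1, 2, 3) to the 1-based numbers of its stop codons.

    Frame 1 starts reading at the first base, frame 2 at the second,
    frame 3 at the third.
    """
    seq = seq.upper()
    result = {}
    for offset in range(3):
        codons = [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]
        result[offset + 1] = [n + 1 for n, c in enumerate(codons) if c in STOP_CODONS]
    return result


if __name__ == "__main__":
    print(stops_by_frame("ATGTCATTAGCGTAA"))
```

For this sequence the result is `{1: [5], 2: [3], 3: []}`. Frame 1 (ATG TCA TTA GCG TAA) stops only at its last codon, while frame 2 (TGT CAT TAG ...) hits TAG at its third codon.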

I have the following questions:

1. Why does the view show different reading frames when a start codon is already present?
2. If my CDSs have internal stop codons, how should I deal with them?
3. The feature table specifies the applicable genetic code (here, table 11). Are these alternative reading frames also taken into account during submission?
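Before resubmitting, each CDS can be screened for the basic properties a table 11 translation needs, reading only in the frame set by its annotated start. Below is a minimal sketch. The function name is hypothetical. The start codon set is the one NCBI lists for translation table 11 (TTG, CTG, ATT, ATC, ATA, ATG, GTG). This check cannot detect frameshifts or the low-similarity issue mentioned in the email.

```python
STARTS_TABLE_11 = {"TTG", "CTG", "ATT", "ATC", "ATA", "ATG", "GTG"}
STOPS_TABLE_11 = {"TAA", "TAG", "TGA"}


def check_cds_table11(cds: str) -> list[str]:
    """Return problems found in a CDS read from its first base under code 11.

    An empty list means the basic checks passed; frameshifts and weak BLAST
    similarity are not assessed here.
    """
    cds = cds.upper()
    problems = []
    if len(cds) % 3 != 0:
        problems.append(f"length {len(cds)} is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    if not codons:
        return problems + ["sequence is shorter than one codon"]
    if codons[0] not in STARTS_TABLE_11:
        problems.append(f"first codon {codons[0]} is not a table 11 start codon")
    if codons[-1] not in STOPS_TABLE_11:
        problems.append(f"last codon {codons[-1]} is not a stop codon")
    for n, codon in enumerate(codons[:-1], start=1):
        if codon in STOPS_TABLE_11:
            problems.append(f"internal stop {codon} at codon {n}")
    return problems
```

For example, `check_cds_table11("GTGAAATAA")` returns an empty list, since GTG is a valid table 11 start. `check_cds_table11("ATGTGAAAATAA")` returns `["internal stop TGA at codon 2"]`.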

I hope someone can help me with this.

Thank you.

Tags: CDS, Genome, NCBI, GenBank

Written 9 months ago by Sowmya Pulapet; updated 8 months ago by Ram

score 0 · Answer 1 · 2024-03-21

You could add the pseudogene qualifier. The pipeline will then accept the internal stop codon, but I do not think that is a good way to fix this. AGAT includes a script that redefines your CDS, changing its frame to produce the longest CDS without an internal stop codon. A last option is to add the "ribosomal slippage" qualifier (/ribosomal_slippage) to indicate a change of reading frame in the middle of the sequence.
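The idea behind the AGAT option, picking the longest stretch without an internal stop, can be illustrated with a simple ORF search. This is not AGAT's implementation. It is a minimal sketch that scans only the forward strand. By default it assumes ATG, GTG and TTG as start codons (a common choice for bacteria and phages). `longest_orf` is a hypothetical name.

```python
from typing import Optional

STOPS = frozenset({"TAA", "TAG", "TGA"})


def longest_orf(seq: str,
                starts: frozenset = frozenset({"ATG", "GTG", "TTG"})
                ) -> Optional[tuple[int, int, int]]:
    """Return (frame, start, end) of the longest forward-strand ORF, or None.

    frame is 1-3; start/end are 0-based, end-exclusive, and include the stop
    codon. An ORF runs from the first start codon after the previous in-frame
    stop to the next in-frame stop.
    """
    seq = seq.upper()
    best = None
    for frame in range(3):
        orf_start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if orf_start is None and codon in starts:
                orf_start = i                      # open an ORF at the first start
            elif orf_start is not None and codon in STOPS:
                end = i + 3                        # close it at the in-frame stop
                if best is None or end - orf_start > best[2] - best[1]:
                    best = (frame + 1, orf_start, end)
                orf_start = None
    return best


if __name__ == "__main__":
    print(longest_orf("CCATGAAATAAGATGCCCGGGTTTTGA"))
```

For this made-up sequence the result is `(1, 12, 27)`. That is the frame 1 ORF from the ATG at position 12 through the TGA stop, which is longer than the ATG...TAA ORF in frame 3. The trailing TTG in frame 3 has no downstream stop and is ignored.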