Question

Phage genome submission to NCBI GenBank

18 months ago

tahsinkhan570 • 0

I annotated a bacteriophage genome with Prokka against the PHROGS database and used table2asn to create a .sqn file, but validation reported errors such as:

"SEQ_FEAT.GeneXrefWithoutGene, SEQ_FEAT.BadEcNumberFormat, SEQ_FEAT.BadProteinName"


Error: valid [SEQ_FEAT.GeneXrefWithoutGene] Feature has gene locus_tag cross-reference but no equivalent gene feature exists FEATURE: tRNA: Met [lcl|pilon_c1:35902-35976] [lcl|pilon_c1: raw, dna len= 148445]

Error: valid [SEQ_FEAT.BadEcNumberFormat] phrog_2162 is not in proper EC_number format FEATURE: Prot: Sir2 (NAD-dependent deacetylase) [lcl|pilon_c1_1:1-266] [lcl|pilon_c1_1: raw, aa len= 266]

Error: valid [SEQ_FEAT.BadProteinName] Unknown or hypothetical protein should not have EC number FEATURE: Prot: hypothetical protein [lcl|pilon_c1_2:1-163] [lcl|pilon_c1_2: raw, aa len= 163]

NCBI states that these errors must be corrected before GenBank submission. Could anyone please tell me how to fix them?

Many thanks, Khan

phage table2asn ncbi • 1.0k views

When running Prokka, did you pass the --compliant option to enforce GenBank compliance? https://github.com/tseemann/prokka#ncbi-genbank-submitter

ADD REPLY • link 18 months ago by acvill ▴ 350

Thanks for the info. I ran Prokka with --compliant and then removed the EC_number qualifiers from hypothetical proteins, as the NCBI table2asn guidelines advise. Now I am stuck on BadEcNumberFormat. Here is the error message:

"Error: valid [SEQ_FEAT.BadEcNumberFormat] phrog_2162 is not in proper EC_number format FEATURE: Prot: Sir2 (NAD-dependent deacetylase) [gnl|Prokka|pilon_c1:1-266] [gnl|Prokka|ipilon_c1: raw, aa len= 266]"

I don't know whether the .tbl format is OK:

881	81	gene
			locus_tag	pilon_c1_00001
881	81	CDS
			EC_number	phrog_2162
			inference	ab initio prediction:Prodigal:002006
			locus_tag	pilon_c1_00001
			product	Sir2 (NAD-dependent deacetylase)
			protein_id	gnl|Prokka|pilon_c1_00001

Regards Khan

ADD REPLY • link 18 months ago by tahsinkhan570 • 0
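
The remaining BadEcNumberFormat error comes from the EC_number qualifier holding a PHROGS identifier (phrog_2162) instead of an EC number. It looks like the PHROG ID ended up in the field Prokka writes out as EC_number. One possible fix, offered as a rough sketch rather than an official NCBI or Prokka procedure, is to post-process the .tbl file before rerunning table2asn: any EC_number value that does not look like an EC number is either dropped or kept as a note qualifier. The function name `fix_ec_numbers`, the `keep_as_note` switch, and the regular expression are all assumptions made for illustration; the pattern only approximates the EC format and is not NCBI's validator. The `3.5.1.-` value in the sample is a made-up example of a well-formed EC number, not taken from the genome above.

```python
import re

# Approximate EC number shape: four dot-separated fields, "-" for an
# unspecified level, optional "n" prefix on a provisional last field.
# This is an assumption for illustration, not NCBI's own validation rule.
EC_PATTERN = re.compile(r"^\d+\.(\d+|-)\.(\d+|-)\.(n?\d+|-)$")


def fix_ec_numbers(lines, keep_as_note=True):
    """Return .tbl lines with malformed EC_number qualifiers removed,
    or rewritten as note qualifiers when keep_as_note is True."""
    fixed = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Qualifier lines in a feature table start with three empty
        # tab-separated columns, followed by the key and its value.
        is_ec = len(fields) >= 5 and fields[3] == "EC_number"
        if is_ec and not EC_PATTERN.match(fields[4]):
            if keep_as_note:
                fixed.append("\t\t\tnote\t" + fields[4] + "\n")
            continue  # the malformed EC_number line itself is not kept
        fixed.append(line)
    return fixed


# Small demonstration on lines shaped like the feature table above.
sample = [
    "881\t81\tCDS\n",
    "\t\t\tEC_number\tphrog_2162\n",
    "\t\t\tproduct\tSir2 (NAD-dependent deacetylase)\n",
    "\t\t\tEC_number\t3.5.1.-\n",  # hypothetical well-formed value
]
print("".join(fix_ec_numbers(sample)), end="")
```

To apply this to a real file, read the .tbl with `open(path).readlines()`, pass the list through `fix_ec_numbers`, write the result to a new .tbl, and rerun table2asn on it. Keeping the PHROG ID as a note preserves the annotation source; pass `keep_as_note=False` if you would rather drop it entirely.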