18 months ago
tahsinkhan570
I annotated a bacteriophage genome using Prokka against the PHROGs database and used table2asn to create a .sqn file, but it gave error messages like
"SEQ_FEAT.GeneXrefWithoutGene, SEQ_FEAT.BadEcNumberFormat, SEQ_FEAT.BadProteinName"
Error: valid [SEQ_FEAT.GeneXrefWithoutGene] Feature has gene locus_tag cross-reference but no equivalent gene feature exists FEATURE: tRNA: Met [lcl|pilon_c1:35902-35976] [lcl|pilon_c1: raw, dna len= 148445]
Error: valid [SEQ_FEAT.BadEcNumberFormat] phrog_2162 is not in proper EC_number format FEATURE: Prot: Sir2 (NAD-dependent deacetylase) [lcl|pilon_c1_1:1-266] [lcl|pilon_c1_1: raw, aa len= 266]
Error: valid [SEQ_FEAT.BadProteinName] Unknown or hypothetical protein should not have EC number FEATURE: Prot: hypothetical protein [lcl|pilon_c1_2:1-163] [lcl|pilon_c1_2: raw, aa len= 163]
NCBI mentioned that the errors must be corrected before GenBank submission. Could anyone please let me know how to correct them?
Many thanks, Khan
When running Prokka, did you use the --compliant parameter to enforce GenBank compliance? https://github.com/tseemann/prokka#ncbi-genbank-submitter

Thanks for the info. I used --compliant and later removed the EC_number from the hypothetical proteins, as described in the NCBI table2asn guidelines. Now I am stuck with BadEcNumberFormat. Here is the error message:
"Error: valid [SEQ_FEAT.BadEcNumberFormat] phrog_2162 is not in proper EC_number format FEATURE: Prot: Sir2 (NAD-dependent deacetylase) [gnl|Prokka|pilon_c1:1-266] [gnl|Prokka|ipilon_c1: raw, aa len= 266]"
I don't know whether the .tbl format is OK or not.
881	81	gene
			locus_tag	pilon_c1_00001
881	81	CDS
			EC_number	phrog_2162
			inference	ab initio prediction:Prodigal:002006
			locus_tag	pilon_c1_00001
			product	Sir2 (NAD-dependent deacetylase)
			protein_id	gnl|Prokka|pilon_c1_00001
Regards Khan
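
Going by the error text, the value in the EC_number qualifier (phrog_2162) is a PHROG identifier, not an EC number, so table2asn rejects it. One possible workaround is to clean the .tbl file before running table2asn again. The following is only a rough sketch, not an official NCBI or Prokka tool: the function name fix_ec_numbers, the keep_as_note option, and the regular expression for a "valid" EC number are all my own assumptions, and the script assumes a tab-delimited feature table with qualifiers in columns 4 and 5.

```python
import re

# Assumed pattern for a well-formed EC number: four dot-separated fields,
# each a number or "-", with an optional "n"-prefixed provisional last field.
EC_PATTERN = re.compile(r"^\d+\.(\d+|-)\.(\d+|-)\.(\d+|n\d+|-)$")


def fix_ec_numbers(tbl_text, keep_as_note=True):
    """Return feature-table text with malformed EC_number qualifiers removed.

    If keep_as_note is True, each rejected value is kept as a 'note'
    qualifier so the PHROG identifier is not lost (hypothetical choice).
    """
    fixed = []
    for line in tbl_text.splitlines():
        fields = line.split("\t")
        # Qualifier lines have three empty leading columns, then key and value.
        if len(fields) >= 5 and fields[3] == "EC_number":
            value = fields[4].strip()
            if not EC_PATTERN.match(value):
                if keep_as_note:
                    fixed.append("\t\t\tnote\t" + value)
                continue  # drop the malformed EC_number line itself
        fixed.append(line)
    return "\n".join(fixed) + "\n"
```

To use it, read the .tbl file into a string, pass it through fix_ec_numbers, write the result to a new .tbl file, and rerun table2asn on that. Properly formatted EC numbers such as 3.5.1.- are left untouched. Check the output by eye before submitting, since this only addresses BadEcNumberFormat and not the other errors.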