NCBI genome submission validation error
slin023, 10 weeks ago

Hello everyone, I am hoping to get some feedback from people who have submitted an annotated genome and received a validation error report, which can contain an overwhelming number of messages, for instance:

FATAL: CONTAINED_CDS: 196 coding regions are completely contained in another coding region.

CDSmRNAXrefLocationProblem      The CDS is not contained within the cross-referenced mRNA
        lcl|scaffold_1:CDS       hypothetical protein   (lcl|scaffold_1:20313911-20313935, 20314012-20314376, 20314445-20314529, 20315469-20315574, 20315647-20315938)  XX645_000728
        lcl|scaffold_1:CDS       hypothetical protein   (lcl|scaffold_1:24258921-24259046, 24259119-24259542, 24264324-24264645, 24264716-24264919, 24272690-24273239)  XX645_000926
        lcl|scaffold_1:CDS       hypothetical protein   (lcl|scaffold_1:24258921-24259046, 24259119-24259542, 24264324-24264645, 24264716-24264919, 24269914-24270472)  XX645_000926
        lcl|scaffold_1:CDS       hypothetical protein   (lcl|scaffold_1:31237566-31237614, 31237674-31239238, 31241972-31242463)        XX645_001229...

How do people resolve these issues? For example, do you redo the CDS prediction, or BLAST the sequences to check the CDS boundaries against homologous genes? Does anyone have suggestions on how to resolve this and resubmit the annotated genome?

Tags: genome annotation

Answer (10 weeks ago):

In general:

If there are not too many issues, dig in manually and resolve them: open the genome and annotation files in a genome browser/editor (Artemis, GenomeView, ...) and fix each issue. If there are many, redo the genome annotation process and/or the post-processing of the results. Alternatively, consider running a tool like AGAT, which will try to resolve as many issues as possible for you. Any issues it cannot resolve you will have to fix manually, as in the first option.
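
To collect the flagged loci for manual review, the feature lines of the validation report can be turned into one region per locus tag that can be pasted into a genome browser. Below is a minimal Python sketch; it assumes the report lines look like the ones quoted in the question, and the names `CDS_LINE`, `parse_intervals` and `browser_regions` are hypothetical, not part of any NCBI tool.

```python
import re
from collections import defaultdict

# One CDS line of the validator report, as quoted in the question:
#   lcl|<seqid>:CDS  <product>  (lcl|<seqid>:<start>-<end>, <start>-<end>, ...)  <locus_tag>
CDS_LINE = re.compile(
    r"lcl\|(?P<seqid>[^:\s]+):CDS\s+.*?\((?P<loc>lcl\|[^)]*)\)\s+(?P<tag>\S+)"
)


def parse_intervals(loc):
    """'lcl|scaffold_1:10-20, 30-40' -> [(10, 20), (30, 40)]."""
    intervals = []
    for part in loc.split(":", 1)[1].split(","):
        # assumption: a minus-strand interval may be written as c<end>-<start>
        a, b = (int(x) for x in part.strip().lstrip("c").split("-"))
        intervals.append((min(a, b), max(a, b)))
    return intervals


def browser_regions(report_lines):
    """Map each locus tag to one 'seqid:start-end' span covering all its reported CDSs."""
    spans = defaultdict(list)
    for line in report_lines:
        m = CDS_LINE.search(line)
        if m is None:
            continue  # header or non-CDS line
        tag = m.group("tag").rstrip(".")  # drop a trailing '...' from truncated output
        spans[(m.group("seqid"), tag)].extend(parse_intervals(m.group("loc")))
    return {
        tag: f"{seqid}:{min(s for s, _ in iv)}-{max(e for _, e in iv)}"
        for (seqid, tag), iv in spans.items()
    }
```

For the locus tags quoted in the question, this merges both reported CDSs of XX645_000926 into the single region `scaffold_1:24258921-24273239`.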

More specifically:

  • it seems you have CDS features that are completely contained within another CDS, so you will probably have to remove the shorter one?
  • the second one is a bit strange: it looks like you have an mRNA defined, and the corresponding CDS has deviating coordinates (i.e. it is not completely covered by the mRNA exons)

Inspecting the reported regions in a genome browser will shed light on what exactly is going on.
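
Both problems listed above can also be screened for directly in the GFF3 before resubmitting. The following is a minimal Python sketch, not NCBI's own validation logic. It assumes a GFF3 file where all segments of a CDS share one `ID` and carry a `Parent` pointing at their mRNA. It treats a CDS as "contained" when every coding segment lies inside a coding segment of another CDS on the same sequence and strand, and flags a CDS when any segment is not covered by an exon of its parent mRNA (or by the mRNA span if that mRNA has no exon features). All function names are hypothetical.

```python
from collections import defaultdict


def parse_gff3(text):
    """Collect CDS segments, exon segments and mRNA spans from GFF3 text."""
    cds = defaultdict(list)    # CDS ID -> [(start, end), ...]
    cds_info = {}              # CDS ID -> (seqid, strand, parent mRNA ID or None)
    exons = defaultdict(list)  # mRNA ID -> [(start, end), ...]
    mrnas = {}                 # mRNA ID -> (start, end)
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip malformed lines and FASTA sections
        seqid, _, ftype, start, end, _, strand, _, attributes = cols
        attr = dict(kv.strip().split("=", 1) for kv in attributes.split(";") if "=" in kv)
        span = (int(start), int(end))
        if ftype == "CDS":
            # assumption: all segments of one CDS share the same ID (GFF3 convention)
            cds[attr["ID"]].append(span)
            cds_info[attr["ID"]] = (seqid, strand, attr.get("Parent"))
        elif ftype == "exon":
            for parent in attr.get("Parent", "").split(","):
                exons[parent].append(span)
        elif ftype == "mRNA":
            mrnas[attr["ID"]] = span
    return cds, cds_info, exons, mrnas


def inside(seg, segments):
    """True if seg lies entirely within one of the given segments."""
    return any(s <= seg[0] and seg[1] <= e for s, e in segments)


def contained_cds(cds, cds_info):
    """Pairs (inner, outer): every segment of inner lies within a segment of outer."""
    pairs = []
    ids = sorted(cds)
    for a in ids:
        for b in ids:
            if a == b or cds_info[a][:2] != cds_info[b][:2]:
                continue  # skip self and CDSs on another sequence or strand
            if all(inside(seg, cds[b]) for seg in cds[a]):
                pairs.append((a, b))
    return pairs


def cds_outside_mrna(cds, cds_info, exons, mrnas):
    """CDS IDs with a segment not covered by their parent mRNA (or with no parent found)."""
    bad = []
    for cid, segs in cds.items():
        parent = cds_info[cid][2]
        # fall back to the mRNA span if the mRNA has no exon features
        allowed = exons.get(parent) or ([mrnas[parent]] if parent in mrnas else [])
        if not all(inside(seg, allowed) for seg in segs):
            bad.append(cid)
    return sorted(bad)
```

The pairwise comparison is quadratic, so on a whole genome it may be worth restricting it to the locus tags named in the report. Alternative isoforms of the same gene may also show up as "contained" and need a judgement call rather than automatic removal.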

Reply:

All those suggestions are very helpful. AGAT seems like a very useful tool that can save some time, but it is still better to check all the problem genes manually. Thanks a lot! In addition to the CDS regions, do you have any suggestions for tRNA prediction? I have one particular tRNA that is longer than 150 bp. Can you suggest a tool to check the tRNA region?

FIND_BADLEN_TRNAS: 1 tRNA is too long - over 150 nucleotides

ORIG/out_JBAT.FINAL.sqn:tRNA    Tyr lcl|scaffold_1:32659796-32659945    XX645_001314
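
To double-check the length the validator complains about, the span can be computed from the location string. This is a minimal sketch assuming 1-based, inclusive coordinates as in the validator output; comma-separated intervals are summed, and `location_length` is a hypothetical helper.

```python
def location_length(location):
    """Total length in nt of a validator location such as
    'lcl|scaffold_1:32659796-32659945'.

    Coordinates are treated as 1-based and inclusive; comma-separated
    intervals (e.g. a feature split around an intron) are summed.
    """
    coords = location.split(":", 1)[-1]
    total = 0
    for part in coords.split(","):
        # assumption: a minus-strand interval may be written as c<end>-<start>
        start, end = (int(x) for x in part.strip().lstrip("c").split("-"))
        total += abs(end - start) + 1
    return total
```

For the Tyr tRNA above, `location_length("lcl|scaffold_1:32659796-32659945")` gives 150 nt, considerably longer than a typical tRNA (roughly 70 to 90 nt), which is why the locus deserves a closer look.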

Reply:

Not particularly, as tRNA 'prediction' is what it is... Perhaps there is an assembly issue at that locus that artificially inflates the length of the region?
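
One quick way to follow up on the assembly hypothesis is to look for assembly gaps (runs of `N`) in or near the tRNA locus. The sketch below assumes the scaffold sequences are in a FASTA file whose record names match the scaffold IDs (e.g. `scaffold_1`); `read_fasta`, `gap_report` and the `flank` parameter are hypothetical names. A clean result does not rule out an assembly problem, since errors such as collapsed repeats or misjoins leave no `N`s behind.

```python
import re


def read_fasta(lines):
    """Parse FASTA lines (e.g. an open file handle) into {record name: sequence}."""
    seqs, name, chunks = {}, None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if name is not None:
                seqs[name] = "".join(chunks)
            name, chunks = line[1:].split()[0], []  # keep the ID, drop the description
        elif line:
            chunks.append(line)
    if name is not None:
        seqs[name] = "".join(chunks)
    return seqs


def gap_report(seq, start, end, flank=0):
    """Summarise N content of seq[start..end] (1-based, inclusive) plus flank bp each side."""
    region = seq[max(start - 1 - flank, 0): end + flank].upper()
    runs = [len(m.group()) for m in re.finditer(r"N+", region)]
    return {
        "length": len(region),
        "n_count": region.count("N"),
        "longest_n_run": max(runs, default=0),
    }
```

After loading the assembly with `read_fasta(open("assembly.fasta"))` (a hypothetical file name), a call such as `gap_report(seqs["scaffold_1"], 32659796, 32659945, flank=500)` reports whether the tRNA region or its surroundings contain gap sequence.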
