Hello everyone, I am hoping to get some feedback from people who have submitted an annotated genome and received a validation error report. You may get an overwhelming number of messages, for instance:
FATAL: CONTAINED_CDS: 196 coding regions are completely contained in another coding region.
CDSmRNAXrefLocationProblem The CDS is not contained within the cross-referenced mRNA
lcl|scaffold_1:CDS hypothetical protein (lcl|scaffold_1:20313911-20313935, 20314012-20314376, 20314445-20314529, 20315469-20315574, 20315647-20315938) XX645_000728
lcl|scaffold_1:CDS hypothetical protein (lcl|scaffold_1:24258921-24259046, 24259119-24259542, 24264324-24264645, 24264716-24264919, 24272690-24273239) XX645_000926
lcl|scaffold_1:CDS hypothetical protein (lcl|scaffold_1:24258921-24259046, 24259119-24259542, 24264324-24264645, 24264716-24264919, 24269914-24270472) XX645_000926
lcl|scaffold_1:CDS hypothetical protein (lcl|scaffold_1:31237566-31237614, 31237674-31239238, 31241972-31242463) XX645_001229...
How do people resolve this issue? For example, do you redo the CDS prediction, or BLAST the sequences to check the CDS boundaries against homologous genes? Do you have any suggestions for how to resolve this and resubmit the annotated genome?
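For anyone triaging a report like this before deciding how to fix it, here is a minimal sketch that pulls the locus tag and exon coordinates out of report lines shaped like the ones quoted above, and groups them by locus tag so repeated loci stand out. The function names (`parse_report_line`, `group_by_locus`, `outer_region`) are made up for illustration, the regular expression is only fitted to the excerpt in this post, and the optional leading `c` on a coordinate pair is an assumption about how complement-strand intervals might be written.

```python
import re
from collections import defaultdict

# Line shape assumed from the report excerpt in this post:
#   <seqid>:CDS <product> (<seqid>:<start>-<end>, <start>-<end>, ...) <locus_tag>
# Trailing dots after the locus tag (as in a truncated paste) are ignored.
LINE_RE = re.compile(
    r"^(?P<seqid>\S+?):CDS\s+.*?\((?P<locs>[^)]*)\)\s+(?P<tag>\S+?)\.*$"
)
# Assumption: a leading "c" may mark a complement-strand interval.
COORD_RE = re.compile(r"c?(\d+)-(\d+)")


def parse_report_line(line):
    """Return (seqid, locus_tag, [(start, end), ...]) or None if no match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    spans = []
    for piece in m.group("locs").split(","):
        # Drop an optional "<seqid>:" prefix in front of the coordinates.
        coords = piece.strip().rsplit(":", 1)[-1]
        cm = COORD_RE.fullmatch(coords)
        if cm:
            a, b = int(cm.group(1)), int(cm.group(2))
            spans.append((min(a, b), max(a, b)))
    return m.group("seqid"), m.group("tag"), spans


def group_by_locus(lines):
    """Map each locus tag to the list of (seqid, spans) reported for it."""
    by_tag = defaultdict(list)
    for line in lines:
        parsed = parse_report_line(line)
        if parsed is not None:
            seqid, tag, spans = parsed
            by_tag[tag].append((seqid, spans))
    return dict(by_tag)


def outer_region(seqid, spans):
    """Format the outermost extent of a CDS as 'seqid:start-end'."""
    if not spans:
        return seqid
    return f"{seqid}:{min(s for s, _ in spans)}-{max(e for _, e in spans)}"


# The four CDS lines quoted in the report excerpt above.
REPORT_LINES = [
    "lcl|scaffold_1:CDS hypothetical protein (lcl|scaffold_1:20313911-20313935, 20314012-20314376, 20314445-20314529, 20315469-20315574, 20315647-20315938) XX645_000728",
    "lcl|scaffold_1:CDS hypothetical protein (lcl|scaffold_1:24258921-24259046, 24259119-24259542, 24264324-24264645, 24264716-24264919, 24272690-24273239) XX645_000926",
    "lcl|scaffold_1:CDS hypothetical protein (lcl|scaffold_1:24258921-24259046, 24259119-24259542, 24264324-24264645, 24264716-24264919, 24269914-24270472) XX645_000926",
    "lcl|scaffold_1:CDS hypothetical protein (lcl|scaffold_1:31237566-31237614, 31237674-31239238, 31241972-31242463) XX645_001229...",
]

for tag, records in group_by_locus(REPORT_LINES).items():
    regions = sorted({outer_region(seqid, spans) for seqid, spans in records})
    print(tag, len(records), regions)
```

In the excerpt, XX645_000926 shows up twice with the same first four exons and a different last exon, so it is worth checking whether the flagged loci carry more than one CDS model before re-predicting anything. The printed regions can be pasted into a genome browser for a manual look.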
All those suggestions are very helpful. AGAT seems like a very useful tool for saving some time, but it is still better to check all the problem genes manually. Thanks a lot! In addition to CDS regions, do you have any suggestions for tRNA prediction? I have one particular tRNA longer than 150 bp; can you suggest a tool for checking the tRNA region?
Not particularly, as tRNA 'prediction' is what it is... Perhaps there is an assembly issue at that locus that artificially inflates the length of the region?
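To find every locus worth inspecting this way, a minimal sketch like the following lists tRNA features above a length cutoff. It assumes the annotation is available as tab-separated GFF3 with the feature type written as `tRNA` in column 3; the function name `long_trnas` and the example records are made up for illustration, and the 150 bp default simply reuses the figure mentioned in the question.

```python
def long_trnas(gff_lines, max_len=150):
    """Return (seqid, start, end, length, attributes) for tRNA features
    longer than max_len, read from an iterable of GFF3 lines."""
    hits = []
    for line in gff_lines:
        # Skip blank lines and comment/directive lines.
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # A GFF3 feature line has nine tab-separated columns.
        if len(cols) < 9 or cols[2] != "tRNA":
            continue
        start, end = int(cols[3]), int(cols[4])
        length = end - start + 1  # GFF3 coordinates are 1-based, inclusive
        if length > max_len:
            hits.append((cols[0], start, end, length, cols[8]))
    return hits


# Made-up records, only to show the expected input shape.
EXAMPLE_GFF = [
    "##gff-version 3",
    "scaffold_1\tpred\ttRNA\t1000\t1072\t.\t+\t.\tID=trna1",
    "scaffold_1\tpred\ttRNA\t5000\t5180\t.\t-\t.\tID=trna2",
]

for seqid, start, end, length, attrs in long_trnas(EXAMPLE_GFF):
    print(f"{seqid}:{start}-{end}\t{length} bp\t{attrs}")
```

With a real file, passing the open file handle as `gff_lines` should work the same way. The reported coordinates can then be checked against the assembly, for example for gaps or runs of N inside the feature.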