Issue with GeMoMa AnnotationFinalizer
I was running GeMoMa to predict genes/proteins and annotate my repeat-masked plant genome assembly. However, the last step, the AnnotationFinalizer module in GeMoMa, throws the following error:
Error:
starting AnnotationFinalizer
java.lang.NumberFormatException: For input string: "7180000819398"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Integer.parseInt(Integer.java:583)
at java.lang.Integer.parseInt(Integer.java:615)
at projects.gemoma.AnnotationFinalizer$SequenceIDComparator.extractInt(AnnotationFinalizer.java:410)
at projects.gemoma.AnnotationFinalizer$SequenceIDComparator.compare(AnnotationFinalizer.java:400)
at projects.gemoma.AnnotationFinalizer$SequenceIDComparator.compare(AnnotationFinalizer.java:1)
at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
at java.util.TimSort.sort(TimSort.java:234)
at java.util.Arrays.sort(Arrays.java:1438)
at projects.gemoma.AnnotationFinalizer.run(AnnotationFinalizer.java:488)
at projects.gemoma.GeMoMaPipeline$JAnnotationFinalizer.doJob(GeMoMaPipeline.java:1466)
at projects.gemoma.GeMoMaPipeline$FlaggedRunnable.run(GeMoMaPipeline.java:917)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Resolution Attempts
I checked whether the first column of `filtered_prediction.gff` contains an integer. It does not: the sequence ID is `jcf7180000819398`.

I also tried other tools. First, I used `gffread` to convert the GFF to GTF so that I could use `get_sequence_from_gtf.pl` from GeneMark to get the sequences. Second, I tried `getAnnoFasta.pl` from Augustus (this partially works, but the annotations are not available anywhere in the FASTA and there are no protein sequences). Third, I played around with `rtracklayer` (failed) and `bedtools` (gave a FASTA, but again unreliable).
Please help with some leads.
Thank you in advance.
Welcome to Biostars and thank you for the contribution! Please use the formatting bar (especially the `code` option) to present your post better. You can use backticks for inline code (\`text\` becomes `text`), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

I don't know this tool, but the problem is that it's trying to parse that value as a 32-bit integer, which has a maximum value of 2147483647. The number extracted from your sequence ID, 7180000819398, is larger than that, so `Integer.parseInt` throws the `NumberFormatException`.
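To make the overflow concrete, here is a minimal standalone Java sketch. It is not GeMoMa's code: the class name `ParseOverflowDemo` and the helpers `fitsInInt` and `extractLong` are hypothetical, and stripping the non-digit characters from the ID is only my assumption about what `extractInt` does, based on the string shown in the exception.

```java
public class ParseOverflowDemo {

    // Returns true if the digit string can be parsed as a signed 32-bit int.
    static boolean fitsInInt(String digits) {
        try {
            Integer.parseInt(digits);
            return true;
        } catch (NumberFormatException e) {
            // Thrown for values outside [-2147483648, 2147483647].
            return false;
        }
    }

    // Hypothetical stand-in for the tool's extractInt: drops every non-digit
    // character and parses the rest as a 64-bit long instead of an int.
    // Note: an ID without any digits would still throw NumberFormatException.
    static long extractLong(String seqId) {
        return Long.parseLong(seqId.replaceAll("\\D", ""));
    }

    public static void main(String[] args) {
        String seqId = "jcf7180000819398";
        String digits = seqId.replaceAll("\\D", "");

        System.out.println("Integer.MAX_VALUE = " + Integer.MAX_VALUE);
        System.out.println(digits + " fits in int: " + fitsInInt(digits));
        System.out.println("parsed as long: " + extractLong(seqId));
    }
}
```

If that assumption holds, any sequence ID whose numeric part is above 2147483647 would trigger the same exception, whereas IDs with shorter numeric parts would not. I can't confirm this from the tool's source, though.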
Thank you. I will do so from next time.