Question

GeMoMa AnnotationFinalizer: java.lang.NumberFormatException: For input string: "7180000819398"

6.1 years ago

Rohith B S • 0

Issue with GeMoMa AnnotationFinalizer

I was running GeMoMa to predict genes/proteins and annotate my repeat-masked plant genome assembly, but the last step, the AnnotationFinalizer module, throws the following error:

Error:

starting AnnotationFinalizer
java.lang.NumberFormatException: For input string: "7180000819398"
        at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.lang.Integer.parseInt(Integer.java:583)
        at java.lang.Integer.parseInt(Integer.java:615)
        at projects.gemoma.AnnotationFinalizer$SequenceIDComparator.extractInt(AnnotationFinalizer.java:410)
        at projects.gemoma.AnnotationFinalizer$SequenceIDComparator.compare(AnnotationFinalizer.java:400)
        at projects.gemoma.AnnotationFinalizer$SequenceIDComparator.compare(AnnotationFinalizer.java:1)
        at java.util.TimSort.countRunAndMakeAscending(TimSort.java:355)
        at java.util.TimSort.sort(TimSort.java:234)
        at java.util.Arrays.sort(Arrays.java:1438)
        at projects.gemoma.AnnotationFinalizer.run(AnnotationFinalizer.java:488)
        at projects.gemoma.GeMoMaPipeline$JAnnotationFinalizer.doJob(GeMoMaPipeline.java:1466)
        at projects.gemoma.GeMoMaPipeline$FlaggedRunnable.run(GeMoMaPipeline.java:917)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)
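
The exception can be reproduced outside GeMoMa. The sketch below uses hypothetical class and method names chosen for illustration; it calls the same `Integer.parseInt` that appears in the trace and, for comparison, `Long.parseLong`, which accepts 64-bit values:

```java
public class ParseIntOverflow {

    // Attempts a 32-bit parse, as Integer.parseInt does in the stack trace,
    // and reports the NumberFormatException message instead of crashing.
    static String tryParseInt(String s) {
        try {
            return "int: " + Integer.parseInt(s);
        } catch (NumberFormatException e) {
            return "NumberFormatException: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        String digits = "7180000819398"; // the value from the error message

        // Fails: the value is larger than Integer.MAX_VALUE (2147483647).
        System.out.println(tryParseInt(digits));

        // Succeeds: a 64-bit long can hold values up to 9223372036854775807.
        System.out.println("long: " + Long.parseLong(digits));
    }
}
```

On a standard JDK the first line should carry the same `For input string: "7180000819398"` message as the trace.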

Resolution Attempts

I checked whether the first column of filtered_prediction.gff contains an integer, but it does not: the sequence IDs look like jcf7180000819398.
First, I tried other tools: gffread to convert the GFF to GTF, so that I could extract the sequences with get_sequence_from_gtf.pl from GeneMark.
Secondly, I tried getAnnoFasta.pl from Augustus. It partially works, but the annotations do not appear in the FASTA output and no protein sequences are produced.
Thirdly, I experimented with rtracklayer (failed) and bedtools (produced a FASTA, but again unreliable).
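
The sequence ID from the first check, jcf7180000819398, contains the 13-digit number that appears in the exception. GeMoMa's actual `extractInt` code is not shown in this thread, so the following is only a hypothetical reconstruction, assuming the comparator takes the first run of digits in the ID:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContigNumber {

    // Matches one or more consecutive decimal digits.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: returns the first run of digits in a sequence ID,
    // or null when the ID contains no digits at all.
    static String extractDigits(String seqId) {
        Matcher m = DIGITS.matcher(seqId);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String seqId = "jcf7180000819398"; // first-column value from filtered_prediction.gff
        String digits = extractDigits(seqId);

        System.out.println(digits);          // 7180000819398
        System.out.println(digits.length()); // 13, whereas Integer.MAX_VALUE has only 10 digits
        System.out.println(Long.parseLong(digits) > Integer.MAX_VALUE); // true
    }
}
```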

Any leads would be appreciated.
Thank you in advance.



Welcome to Biostars and thank you for the contribution! Please use the formatting bar (especially the code option) to present your post better. You can use backticks for inline code (`text` becomes text), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.

Reply by Ram, 6.1 years ago

I don't know this tool, but the problem is that it's trying to parse that value as a 32-bit integer (Java `int`, via `Integer.parseInt`), which has a maximum value of 2147483647.

Reply by tpoterba, 6.1 years ago
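
The limit mentioned in that reply can be checked directly. This sketch (hypothetical class name) compares the contig number with `Integer.MAX_VALUE` and shows that `Math.toIntExact` rejects the narrowing rather than silently wrapping it to a wrong value:

```java
public class IntRange {

    // True when v lies in the signed 32-bit range that Integer.parseInt accepts.
    static boolean fitsInInt(long v) {
        return v >= Integer.MIN_VALUE && v <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long contigNumber = 7180000819398L;

        System.out.println(Integer.MAX_VALUE);        // 2147483647, i.e. 2^31 - 1
        System.out.println(fitsInInt(contigNumber));  // false
        System.out.println(Long.MAX_VALUE);           // 9223372036854775807, i.e. 2^63 - 1

        try {
            // Math.toIntExact rejects a lossy narrowing instead of silently wrapping.
            Math.toIntExact(contigNumber);
        } catch (ArithmeticException e) {
            System.out.println("ArithmeticException: " + e.getMessage());
        }
    }
}
```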

Thank you. I will do that from now on.

Reply by Rohith B S, 6.1 years ago

Answer, 2019-05-28

I learned from the developers that the tool sorts the input by the numeric part of the scaffold/contig names, and while doing so it converted that number to a 32-bit integer, which caused this error. They said this would be fixed in the next release.

The fix requires a GeMoMa version newer than 1.6.0.
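
For anyone who needs to sort such IDs in their own code, here is one possible overflow-safe numeric comparator. It is not GeMoMa's actual fix; it assumes IDs are ordered by their first digit run, parses that run as a `BigInteger` so no length limit applies, and falls back to plain string order for ties and for IDs without digits. All names are hypothetical:

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SequenceIdOrder {

    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Numbered IDs sort by the value of their first digit run, then by the full string;
    // IDs without any digits sort after all numbered IDs, in plain string order.
    static final Comparator<String> BY_NUMBER = (a, b) -> {
        BigInteger na = number(a);
        BigInteger nb = number(b);
        if (na != null && nb != null) {
            int c = na.compareTo(nb);
            if (c != 0) {
                return c;
            }
        } else if (na != null || nb != null) {
            return na != null ? -1 : 1;
        }
        return a.compareTo(b);
    };

    // First run of digits as an arbitrary-precision integer, or null if there is none.
    static BigInteger number(String id) {
        Matcher m = DIGITS.matcher(id);
        return m.find() ? new BigInteger(m.group()) : null;
    }

    public static void main(String[] args) {
        List<String> ids = new ArrayList<>(Arrays.asList(
                "jcf7180000819398", "jcf99", "chrUn", "jcf7180000000010"));
        ids.sort(BY_NUMBER);
        System.out.println(ids); // [jcf99, jcf7180000000010, jcf7180000819398, chrUn]
    }
}
```

`Long.parseLong` would be enough for this particular ID, but `BigInteger` removes the limit entirely at a small cost in speed.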

tpoterba, thank you for your help.