Question

Error when running oma-standalone v1.0.6: Error, (in RangeOfChunk_iterator) sequence 2 is too long

8.5 years ago

javier.herrero ▴ 10

Dear all

I am trying to run OMA standalone on a set of primate genomes, using DNA sequences instead of AA. I am running OMA v1.0.6.

Some/many of my jobs are failing with the following error message (I am using -d 3 to get more information):

only_run_allall := true
only_run_dbconv := false
# SetRandSeed: SetRand(261301471):
Starting database conversion and checks...
Process 23089 on node-u04a-022: job nr 7 of 500
job 7 [pid 23089]: conversion done; waited for 0 sec
[pid  23089]: Computing gorilla vs human (Part 1281 of 2384). Mem: 0.288GB
   [pid  23089]: 5.00% complete, time left for this part=0.08h, 11.2% of AllAll done. Mem: 0.288GB
Error, (in RangeOfChunk_iterator) sequence 2 is too long
        executing statement: iterate(FullInd2Tuple(index,GS[name2,TotEntries]))
        locals defined as: name1 = gorilla, name2 = human, chunk = 1281, 
totChunks = 2384, all = 2383042332, first = 1279485816, last = 1280485414, index
 = 1279547847
        RangeOfChunk_iterator called with arguments: RangeOfChunk(gorilla,human,
1281)

Is this because some of the sequences are too long? If that is the case, what is the limit? Otherwise, what could I do to fix this issue?

Thanks

oma orthologs • 2.1k views

Further to my previous question, I have been playing with a toy dataset. Indeed, artificially lengthening one of the sequences beyond 100,000 bp seems to trigger this error.

Is there any workaround or any other recommended solution apart from either truncating or completely skipping this sequence?

In case you haven't guessed it, this error is triggered by the titin gene.

Thanks again
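For anyone hitting the same error, here is a minimal sketch for finding which sequences in a FASTA file exceed a given length. It assumes plain FASTA input; the 100,000 threshold is only the approximate value observed above, and the function names are illustrative, not part of OMA.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA-formatted text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def find_long_sequences(handle: TextIO, max_len: int = 100_000) -> List[Tuple[str, int]]:
    """Return (header, length) for every record longer than max_len.

    max_len is an assumed threshold, not the exact constant used by OMA.
    """
    return [(h, len(s)) for h, s in read_fasta(handle) if len(s) > max_len]


if __name__ == "__main__":
    # Tiny in-memory example; replace with open("your_genome.fa") for real data.
    demo = io.StringIO(">short\nACGT\n>long\nACGTACGT\nACGT\n")
    for name, length in find_long_sequences(demo, max_len=10):
        print(name, length)
```

Running this over each genome file before starting the all-against-all step should show which entries would need attention.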

Reply by javier.herrero, 8.5 years ago

score 1 · Accepted Answer · 2016-06-05

8.5 years ago

Adrian Altenhoff ★ 1.1k

Hi Javier,

Yes, there is currently a hard limit of slightly over 100k AA. The exact value of the constant doesn't matter much. For the next release, I will increase it to 200k. Do you think that would be enough? In the meantime, you can either skip these very long sequences or truncate them, or I can send you an alternative binary directly.

Best wishes, Adrian
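As a rough sketch of the skip-or-truncate workaround, the following filters a list of (header, sequence) records before they are written back to FASTA. The `MAX_LEN` value and the function names are assumptions for illustration; the exact limit inside OMA is not stated here beyond "slightly over 100k".

```python
from typing import Iterable, Iterator, Tuple

MAX_LEN = 100_000  # assumed cutoff, chosen to stay under the reported limit


def limit_records(
    records: Iterable[Tuple[str, str]],
    max_len: int = MAX_LEN,
    truncate: bool = False,
) -> Iterator[Tuple[str, str]]:
    """Skip (default) or truncate records whose sequence exceeds max_len."""
    for header, seq in records:
        if len(seq) <= max_len:
            yield header, seq
        elif truncate:
            # Keep only the first max_len characters of the sequence.
            yield header, seq[:max_len]
        # Otherwise the over-long record is dropped.


def format_fasta(records: Iterable[Tuple[str, str]], width: int = 60) -> str:
    """Render records as FASTA text with sequence lines wrapped at `width`."""
    lines = []
    for header, seq in records:
        lines.append(">" + header)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    demo = [("gene1", "ACGT"), ("titin_like", "A" * 12)]
    print(format_fasta(limit_records(demo, max_len=10, truncate=True), width=6), end="")
```

Skipping removes the gene from the analysis entirely, while truncating keeps it but discards everything past the cutoff, so which option is preferable depends on whether the long genes matter for the question at hand.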

Hi Adrian

Thank you for confirming. 200k should do the trick in my case.

I was thinking that, in addition to increasing the limit, you could filter out long sequences at the DB building stage, just as you filter short ones. That way the error would not be triggered.

Regards, Javier

Reply by javier.herrero, 8.5 years ago

Yes, that is a good point. Thanks, I will include it!

Reply by Adrian Altenhoff, 8.5 years ago