JGI Seal ArrayIndexOutOfBoundsException
3.0 years ago
daewowo

I am using Seal for multi-genome filtering and get the following error:

Version 38.87

Set threads to 24
0.027 seconds.
Initial:
Memory: max=322122m, total=322122m, free=322038m, used=84m

Added 7745315649 kmers; time:   906.634 seconds.
Memory: max=322122m, total=322122m, free=114411m, used=207711m

Input is being processed as unpaired
Processing time:        0.407 seconds.
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 23438 out of bounds for length 23438
    at java.base/java.lang.invoke.VarHandle$1.apply(VarHandle.java:2187)
    at java.base/java.lang.invoke.VarHandle$1.apply(VarHandle.java:2184)
    at java.base/jdk.internal.util.Preconditions$1.apply(Preconditions.java:177)
    at java.base/jdk.internal.util.Preconditions$1.apply(Preconditions.java:174)
    at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:62)
    at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
    at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
    at java.base/java.lang.invoke.VarHandleLongs$Array.getVolatile(VarHandleLongs.java:768)
    at java.base/java.util.concurrent.atomic.AtomicLongArray.get(AtomicLongArray.java:95)
    at jgi.Seal.writeRefStats(Seal.java:959)
    at jgi.Seal.process2(Seal.java:736)
    at jgi.Seal.process(Seal.java:636)
    at jgi.Seal.main(Seal.java:70)

The command I am using to run is

~/apps/bbmap/seal.sh in=final.contigs.fa ref=GRCm39.fna,GRCh38.fna,Sscrofa11.1.fna,Vero.fna ambig=toss stats=toss_stats.txt refstats=toss_refstats.txt rpkm=toss_rpkm.txt nzo=t threads=24 -Xmx300g overwrite=true

I have run the same command successfully with 3 and 4 different genomes on other contig files. I also tried prealloc=t and got the same error. Any ideas what could be causing it? (I still have 60 GB of RAM spare.)
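A minimal sketch for narrowing down which reference triggers the exception: rerun the same command with each reference left out in turn. It only builds the argument lists, with the flags copied from the command above. The `drop_<ref>_` output prefix and the `seal` parameter are hypothetical. `seal.sh` is assumed to be on your PATH; otherwise pass the full path with `~` already expanded, because subprocess does not expand it.

```python
REFS = ["GRCm39.fna", "GRCh38.fna", "Sscrofa11.1.fna", "Vero.fna"]


def leave_one_out_commands(refs, contigs="final.contigs.fa", seal="seal.sh"):
    """Build one seal.sh argument list per reference left out.

    Flags are copied from the original command; only ref= and the output
    file names (hypothetical 'drop_<ref>_' prefix) change between runs.
    """
    commands = []
    for dropped in refs:
        kept = [r for r in refs if r != dropped]
        # Strip the final extension so outputs are named after the dropped ref.
        prefix = "drop_" + dropped.rsplit(".", 1)[0] + "_"
        commands.append([
            seal,
            f"in={contigs}",
            "ref=" + ",".join(kept),
            "ambig=toss",
            f"stats={prefix}stats.txt",
            f"refstats={prefix}refstats.txt",
            f"rpkm={prefix}rpkm.txt",
            "nzo=t",
            "threads=24",
            "-Xmx300g",
            "overwrite=true",
        ])
    return commands
```

Each list can be passed to `subprocess.run(cmd, check=True)`. If a run finishes without the exception, the reference it dropped is the likely culprit.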

Tags: alignment, filtering, Seal, genome

Does the error appear only with this particular file? One thing to check is that none of your FASTA files contain empty entries.
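Here is a minimal Python sketch of that check. It assumes standard FASTA, where each record starts with a '>' header line. A record counts as empty if no sequence characters follow its header before the next header or the end of the file. The function name is hypothetical.

```python
def find_empty_records(lines):
    """Return the headers of FASTA records that have no sequence.

    A record is empty when its header ('>' line) is followed directly by
    another header or by the end of the input, ignoring blank lines.
    """
    empty = []
    header = None
    seq_len = 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # Close the previous record before starting a new one.
            if header is not None and seq_len == 0:
                empty.append(header)
            header = line[1:]
            seq_len = 0
        elif line:
            seq_len += len(line)
    # The last record in the file also needs checking.
    if header is not None and seq_len == 0:
        empty.append(header)
    return empty

# Example: with open("Vero.fna") as fh: print(find_empty_records(fh))
```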

BTW, are you using this for binning reads or for estimating expression? For the latter you should ideally use salmon, since seal.sh does not use any statistical inference to do this.


I used 3 of the references instead of 4 and Seal worked fine. I then re-ran, swapping the third reference for the fourth file I had initially been using, and got the same error. I found 13 duplicates in the problem reference and removed them, then re-ran with the deduplicated reference and got this error:

java.lang.Exception: 
An input file appears to be misformatted:
The character with ASCII code 49 appeared where a base was expected: '1'
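ASCII code 49 is the character '1'. Below is a minimal sketch for locating such characters in a reference. It assumes the valid alphabet is the IUPAC nucleotide codes in upper or lower case; BBTools' exact accepted set may differ. One possible cause is a header line that has lost its leading '>', because its name and description would then be read as sequence.

```python
# Assumed valid alphabet: IUPAC nucleotide codes, either case.
IUPAC = set("ACGTUNRYKMSWBDHVacgtunrykmswbdhv")


def find_bad_characters(lines, limit=20):
    """Report (line_number, header, character) for non-IUPAC sequence characters.

    line_number is 1-based. Only the first bad character on each line is
    reported, and scanning stops after `limit` hits.
    """
    hits = []
    header = None
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            header = line[1:]
            continue
        for ch in line.strip():
            if ch not in IUPAC:
                hits.append((lineno, header, ch))
                if len(hits) >= limit:
                    return hits
                break  # one hit per line is enough to locate the problem
    return hits
```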

I then re-ran Seal with the 'tossjunk' parameter, but got the initial ArrayIndexOutOfBoundsException again.

The problem is a badly formatted reference genome.
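For the duplicate check, here is a sketch that counts sequence names. It assumes that "duplicates" means repeated names and that a name is the first whitespace-delimited token of the header line. The function name is hypothetical.

```python
from collections import Counter


def duplicate_names(lines):
    """Return sequence names that occur more than once, with their counts.

    The name is taken as the first whitespace-delimited token of each
    header (an assumption; a tool may compare the whole header instead).
    """
    counts = Counter()
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            counts[fields[0] if fields else ""] += 1
    return {name: n for name, n in counts.items() if n > 1}
```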

NB: I am using Seal for read binning (these contigs are often longer than 600 nt; otherwise I would use BBSplit). It is fast (an order of magnitude faster) and gives comparable or better results than serial alignment to one genome after another or Disambiguate methods.
