3.0 years ago
daewowo
I am using Seal for multi-genome filtering and get the following error:
Version 38.87
Set threads to 24
0.027 seconds.
Initial:
Memory: max=322122m, total=322122m, free=322038m, used=84m
Added 7745315649 kmers; time: 906.634 seconds.
Memory: max=322122m, total=322122m, free=114411m, used=207711m
Input is being processed as unpaired
Processing time: 0.407 seconds.
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: Index 23438 out of bounds for length 23438
at java.base/java.lang.invoke.VarHandle$1.apply(VarHandle.java:2187)
at java.base/java.lang.invoke.VarHandle$1.apply(VarHandle.java:2184)
at java.base/jdk.internal.util.Preconditions$1.apply(Preconditions.java:177)
at java.base/jdk.internal.util.Preconditions$1.apply(Preconditions.java:174)
at java.base/jdk.internal.util.Preconditions.outOfBounds(Preconditions.java:62)
at java.base/jdk.internal.util.Preconditions.outOfBoundsCheckIndex(Preconditions.java:70)
at java.base/jdk.internal.util.Preconditions.checkIndex(Preconditions.java:266)
at java.base/java.lang.invoke.VarHandleLongs$Array.getVolatile(VarHandleLongs.java:768)
at java.base/java.util.concurrent.atomic.AtomicLongArray.get(AtomicLongArray.java:95)
at jgi.Seal.writeRefStats(Seal.java:959)
at jgi.Seal.process2(Seal.java:736)
at jgi.Seal.process(Seal.java:636)
at jgi.Seal.main(Seal.java:70)
The command I am using is:
~/apps/bbmap/seal.sh in=final.contigs.fa ref=GRCm39.fna,GRCh38.fna,Sscrofa11.1.fna,Vero.fna ambig=toss stats=toss_stats.txt refstats=toss_refstats.txt rpkm=toss_rpkm.txt nzo=t threads=24 -Xmx300g overwrite=true
I have run the same command successfully with 3 and 4 different genomes on different contigs. I have tried prealloc=t and get the same error. Any ideas on what the error could be due to? (I still have 60 GB of RAM spare.)
So is the error specifically appearing for this particular file? One thing you can check is to make sure that you have no empty fasta entries.
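To illustrate the empty-entry check suggested above, here is a minimal Python sketch. It assumes "empty" means a header line with no sequence characters before the next header or the end of the file; the function name `find_empty_fasta_records` and the demo data are made up for illustration and are not part of BBTools.

```python
import io

def find_empty_fasta_records(handle):
    """Return the headers of FASTA records that contain no sequence."""
    empty = []
    header = None   # header of the record currently being read
    seq_len = 0     # number of sequence characters seen for that record
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None and seq_len == 0:
                empty.append(header)
            header = line[1:]
            seq_len = 0
        elif header is not None:
            seq_len += len(line)
    # Check the last record in the file as well.
    if header is not None and seq_len == 0:
        empty.append(header)
    return empty

# Hypothetical demo data; for a real file use open("reference.fna").
demo = io.StringIO(">chr1\nACGT\n>empty1\n>chr2\nGG\nTT\n>empty2\n")
print(find_empty_fasta_records(demo))  # ['empty1', 'empty2']
```

Running it on each reference in turn should show whether any of them contains records with no sequence.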
BTW are you using this for binning reads or estimation of expression? You should ideally use salmon, since seal.sh is not using any statistical inference to do this.
I used 3 of the references instead of 4 and Seal worked fine. I then re-ran, changing the 3rd reference to the fourth file I was initially using, and got the same error. I found 13 duplicates in the problem reference and removed these, then re-ran with the deduped reference and got an error
I then re-ran Seal with the 'tossjunk' parameter, but got the initial 'ArrayIndexOutOfBoundsException'.
The problem is a badly formatted reference genome.
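For anyone wanting to screen a reference for duplicates before running Seal, a rough Python sketch follows. It assumes that "duplicates" means repeated sequence names, and that the name is the first whitespace-delimited token of the header line; whether Seal compares the full header or only that token is not established in this thread. The function name `duplicate_fasta_names` is hypothetical.

```python
import io
from collections import Counter

def duplicate_fasta_names(handle):
    """Return {name: count} for sequence names that occur more than once."""
    counts = Counter()
    for line in handle:
        if line.startswith(">"):
            # Assumption: the name is the first token after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            counts[name] += 1
    return {name: n for name, n in counts.items() if n > 1}

# Hypothetical demo data; for a real file use open("reference.fna").
demo = io.StringIO(">seq1 first copy\nAC\n>seq2\nGG\n>seq1 second copy\nTT\n")
print(duplicate_fasta_names(demo))  # {'seq1': 2}
```

An empty dictionary means no repeated names were found under that assumption; it does not rule out other formatting problems in the file.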
NB: I am using Seal for read binning (these contigs are often longer than 600 nt; otherwise I would use BBSplit). It is fast (an order of magnitude faster) and gives comparable or better results than serial alignment to one genome after another or Disambiguate methods.