Hello,
I'm trying to generate to simulated PE reads from the wheat reference genome using BBTools randomreads.sh and I keep receiving the following error message within a few seconds:
Exception in thread "main" java.lang.ArithmeticException: / by zero
at align2.RandomReads3.fillRandomChrom(RandomReads3.java:1823)
at align2.RandomReads3.<init>(RandomReads3.java:606)
at align2.RandomReads3.<init>(RandomReads3.java:599)
at align2.RandomReads3.main(RandomReads3.java:409)
The commands I am using are:
randomreads.sh ref=refgenome.fa.gz out1=gen_R1.fq out2=gen_R2.fq length=100 reads=2000 paired=t
Interestingly, the error appears to only affect the wheat genome. I've tried with both the full reference and several subgenomic sequences and receive the same error. When I try the same commands with tobacco and coffee genomes the program works beautifully. I've inspected the head of the reference genome files from all genomes and I see no differences in their format.
Can anyone provide any insight into this interesting error?
Could the chromosomes be too long?
Ooh, excellent point. I'll go ahead and check the differences in sequence lengths between the files and get back to you. Any idea why long chromosomes might cause this error (just out of curiosity).
Regarding the sequence lengths, it appears you could be on to something. Both coffee's and tobacco's reference genomes' contigs are far smaller with tobaccos being on the scale of a few hundred thousand, coffees being on the scale of a few tens of millions and wheats being on the scale of hundreds of millions.
Allocate more memory to
randomreads.sh
with option-Xmx40g
and see if that fixes this. You may need to add tens of gigabytes depending on size of genome but I would say start with 30-40G.Thank you, I'll give this a go. I'm using a shared system so I might have to queue a while, but I will get back to you with the result.
Hi GenoMax, apologies for the delay, I performed a few iterations with increasing memory allocation. Unfortunately I still get the error within seconds of the script running despite allocating up to 120Gb. Do you have any other suggestions?
Where did you get your wheat genomes from? I may download a copy and give this a try.
I got them from ensemble. I'd love to hear what happens if you do get the chance.
I can confirm that I get the same error. I tried to build an index with
bbmap.sh
but that also generated a different error even after assigning 100G of RAM.I have written to @Brian (author of BBMap). If he responds, I will post an update. He may respond directly here as well.
Thank you so much GenoMax!
Please use the formatting bar (especially the
code
option) to present your post better. You can use backticks for inline code (`text` becomestext
), or select a chunk of text and use the highlighted button to format it as a code block. I've done it for you this time.Note: Blockquotes are used to quote text, not show commands/code.
Thank you, will do in future.
install
seqkit
the run:to get a sense of what the layout of your reference is across the chromosomes.
I suspect that some chromosomes might be too short and the random selection cannot operate correctly.
Thank you, I'll run it for all the genomes and see what I get but strangely it runs beautifully on the genomes with short, yet highly abundant contigs. The tobacco genome I use has 168247 relatively short contigs and the program doesn't produce the error but it does for wheats 22 relatively long contigs, and for wheats' well-assembled 1A chromosome.