5.9 years ago
mb2subi
Hi, I am trying to re-synchronize a set of fq paired-end reads using repair.sh but the following error arises:
Set INTERLEAVED to false
Started output stream.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.util.regex.Pattern.compile(Pattern.java:1718)
at java.util.regex.Pattern.<init>(Pattern.java:1351)
at java.util.regex.Pattern.compile(Pattern.java:1028)
at java.lang.String.split(String.java:2380)
at java.lang.String.split(String.java:2422)
at jgi.SplitPairsAndSingles.repair(SplitPairsAndSingles.java:723)
at jgi.SplitPairsAndSingles.process3_repair(SplitPairsAndSingles.java:549)
at jgi.SplitPairsAndSingles.process2(SplitPairsAndSingles.java:311)
at jgi.SplitPairsAndSingles.process(SplitPairsAndSingles.java:237)
at jgi.SplitPairsAndSingles.main(SplitPairsAndSingles.java:46)
The files are 40G each.
I tried to solve it by setting the -Xmx parameter to 2g, 10g, 20g, 80g and 300g, but none of these worked. With larger values the output files grew from 1.5G to 5.9G, but the software still crashed in the end.
I used another set of data (1g each) and it worked correctly.
With large files like that (I assume they are compressed) there is not much you can do except perhaps split the files up and run repair.sh on the pieces separately. Repair.sh will need large amounts of memory since it has to keep a lot of information available.
BTW: How did the files get out of sync in the first place? If that happened because they were trimmed independently, then I would go back and redo the trimming with the files treated as pairs.
The facility sent the files already out of sync.
Then ask them to provide original data files and do the trimming yourself (I assume that is what has caused the out-of-sync issue?).
The point is that it won't be easy to ask them. For this reason I'm first trying to resolve it myself; if that's not possible, the last option will be to ask them.
If this is your data then you have every right to get a copy of the original. That said, if 300G is the maximum memory you have available (and it did not work), you could try splitting the original files into 2+ pieces to see what maximum size works. You will have to do some bookkeeping to make sure you end up with all reads (and no duplicates).
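One possible way to do the bookkeeping mentioned above is to compare an order-independent fingerprint of the original file with the combined fingerprint of the pieces. The sketch below is only an illustration and not part of BBTools: the function names (`fingerprint`, `combine`) are hypothetical, it assumes plain 4-line FASTQ records (no wrapped sequences), and a matching fingerprint is strong evidence rather than proof that no read was lost or duplicated.

```python
import gzip
import zlib
from itertools import islice

MOD = 2 ** 64  # keep the running checksum bounded


def open_fastq(path):
    """Open a FASTQ file as text, using gzip when the name ends in .gz."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def fingerprint(path):
    """Return (record_count, checksum) for one FASTQ file.

    The checksum is the sum of per-record CRC32 values, so it does not
    depend on record order and uses constant memory.
    """
    count = 0
    total = 0
    with open_fastq(path) as fh:
        while True:
            record = list(islice(fh, 4))  # assumes 4 lines per record
            if not record:
                break
            if len(record) < 4:
                raise ValueError(f"truncated record at end of {path}")
            # Normalise line endings so a missing final newline does not matter.
            text = "\n".join(line.rstrip("\r\n") for line in record)
            count += 1
            total = (total + zlib.crc32(text.encode())) % MOD
    return count, total


def combine(fingerprints):
    """Merge the fingerprints of several pieces into one."""
    fingerprints = list(fingerprints)
    count = sum(c for c, _ in fingerprints)
    total = sum(t for _, t in fingerprints) % MOD
    return count, total
```

If `combine(fingerprint(p) for p in pieces)` equals `fingerprint(original)`, the pieces most likely contain every read exactly once; a lost read changes the count, and a lost read combined with a duplicated one changes the checksum.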
Thanks. By the way, isn't it possible to do something about the heap space?
How can I properly split the fq.gz files for the resynchronization?
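One way to approach the splitting is sketched below. Because the two files are out of sync, cutting them by position (first N reads, next N reads) would not keep mates in the same piece, so this sketch instead assigns each read to a bucket by hashing its name, which sends both mates to the same bucket number. This is a swapped-in technique, not something the thread or BBTools prescribes. The function names (`read_key`, `bucket_of`, `split_by_name`) and the output naming are hypothetical, and the code assumes 4-line FASTQ records whose mate names differ only by a trailing `/1` or `/2` or by the comment after the first whitespace.

```python
import gzip
import zlib
from itertools import islice


def open_fastq(path):
    """Open a FASTQ file as text, using gzip when the name ends in .gz."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def read_key(header):
    """Return the mate-independent read name from a FASTQ header line."""
    fields = header[1:].split()          # drop '@' and any comment field
    name = fields[0] if fields else ""
    if name.endswith("/1") or name.endswith("/2"):
        name = name[:-2]                 # drop old-style mate suffix
    return name


def bucket_of(header, n_buckets):
    """Pick a bucket from the read name; CRC32 is stable across runs."""
    return zlib.crc32(read_key(header).encode()) % n_buckets


def split_by_name(in_path, out_prefix, n_buckets):
    """Write every record to the bucket chosen by its read name.

    Returns (list_of_output_paths, list_of_record_counts).
    """
    paths = [f"{out_prefix}.part{i}.fq.gz" for i in range(n_buckets)]
    outs = [gzip.open(p, "wt", compresslevel=1) for p in paths]
    counts = [0] * n_buckets
    try:
        with open_fastq(in_path) as fin:
            while True:
                record = list(islice(fin, 4))  # assumes 4 lines per record
                if not record:
                    break
                if len(record) < 4 or not record[0].startswith("@"):
                    raise ValueError(f"malformed record in {in_path}")
                b = bucket_of(record[0], n_buckets)
                outs[b].writelines(record)
                counts[b] += 1
    finally:
        for fh in outs:
            fh.close()
    return paths, counts
```

Running `split_by_name` once on each input file with the same `n_buckets` gives matching pieces (part 0 of file 1 with part 0 of file 2, and so on). Each pair of pieces could then be repaired on its own and the repaired outputs concatenated; the number of buckets would need to be chosen so that one pair fits in the available memory.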