8 months ago
Assa Yeroslaviz
I'm using bbduk to count a specific pattern in my fastq file. The pattern is quite long and contains three barcodes with spacers.
The command looks like this:
$ bbduk.sh -Xmx100g in=10_ID_mRNA_S1_L002_R1_001.fastq outm=10_ID_mRNA_pattern_found.fastq literal=NNNNNNNNNNCAGCTACTGCNNNNNNNNNNCGAGTACCCTNNNNNNNNNN k=19 copyundefined mm=f
java -ea -Xmx100g -Xms100g -cp /fs/home/yeroslaviz/miniconda3/envs/bbmap/opt/bbmap-39.06-1/current/ jgi.BBDuk -Xmx100g in=10_ID_mRNA_S1_L002_R1_001.fastq outm=10_ID_mRNA_pattern_found.fastq literal=NNNNNNNNNNCAGCTACTGCNNNNNNNNNNCGAGTACCCTNNNNNNNNNN k=19 copyundefined mm=f
Executing jgi.BBDuk [-Xmx100g, in=10_ID_mRNA_S1_L002_R1_001.fastq, outm=10_ID_mRNA_pattern_found.fastq, literal=NNNNNNNNNNCAGCTACTGCNNNNNNNNNNCGAGTACCCTNNNNNNNNNN, k=19, copyundefined, mm=f]
Version 39.06
0.067 seconds.
Initial:
Memory: max=107374m, total=107374m, free=107272m, used=102m
...
But then I get this Java heap space error:
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
at java.base/java.util.Arrays.copyOf(Arrays.java:3481)
at java.base/java.util.ArrayList.toArray(ArrayList.java:370)
at java.base/java.util.ArrayList.addAll(ArrayList.java:753)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.makeReplicates(Tools.java:1306)
at shared.Tools.replicateAmbiguous(Tools.java:1261)
at jgi.BBDuk.spawnLoadThreads(BBDuk.java:1851)
at jgi.BBDuk.process2(BBDuk.java:1171)
at jgi.BBDuk.process(BBDuk.java:1121)
at jgi.BBDuk.main(BBDuk.java:81)
Any idea how to overcome it?
Thanks,
Assa
I don't think you can use N's in literal= since the match is literal, as the option says. Depending on where in the read this pattern is, you may be able to run three independent operations and add up the read numbers.
I assume this is related to the thread "UMI-Tools knee-method has great influence on the results of white list"?
Yes, it is.
As far as I understand it, you can use N here. I took it from this example Brian posted a few years ago. This is why the degenerate parameter is used here.
Am I wrong in thinking this way?
Is it possible that this pattern is just too long for bbduk to handle?
Thanks for that link. As noted in that example, you can try seal.sh instead of bbduk.sh.
With that many N's you will likely need a lot more RAM, which is what is happening with the original command line, i.e. 100G is not enough.
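To put a rough number on that, here is a back-of-the-envelope sketch in Python. It assumes that every N in the literal gets expanded into all four bases, which is what the replicateAmbiguous and makeReplicates frames in the stack trace suggest. The helper names (expand_ns, count_copies) are made up for illustration, and this is not BBDuk's actual code; the real memory layout (k-mers rather than full sequences) will differ, so treat the byte figure only as an order-of-magnitude estimate.

```python
from itertools import product

BASES = "ACGT"

def expand_ns(motif):
    """Return every unambiguous copy of motif, replacing each N with A, C, G and T.
    Hypothetical helper; only practical for motifs with very few N's."""
    choices = [BASES if base == "N" else base for base in motif]
    return ["".join(combo) for combo in product(*choices)]

def count_copies(motif):
    """Number of unambiguous copies, without materialising them."""
    return 4 ** motif.count("N")

# Tiny example: one N gives four copies.
small = expand_ns("ANT")

# The literal from the original command line.
literal = "NNNNNNNNNNCAGCTACTGCNNNNNNNNNNCGAGTACCCTNNNNNNNNNN"
n_count = literal.count("N")
copies = count_copies(literal)

# Assumption: one byte per base, one full-length sequence per copy.
bytes_needed = copies * len(literal)
heap_bytes = 100 * 1024**3  # the -Xmx100g heap

print(small)
print(f"N's in literal: {n_count}")
print(f"expanded copies: {copies:.3e}")
print(f"naive bytes needed / heap size: {bytes_needed / heap_bytes:.3e}")
```

Under these assumptions the 30 N's give 4^30 (about 1.15e18) copies, so no realistic amount of RAM would cover a full expansion. Shortening the degenerate stretches is the only way to bring that number down.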