Trying to count a pattern in my fastq file using bbduk
8 months ago
Assa Yeroslaviz ★ 1.9k

I'm using bbduk to count a specific pattern in my fastq file. The pattern is quite long and contains three barcodes separated by spacers.

The command looks like this:

$ bbduk.sh -Xmx100g in=10_ID_mRNA_S1_L002_R1_001.fastq outm=10_ID_mRNA_pattern_found.fastq literal=NNNNNNNNNNCAGCTACTGCNNNNNNNNNNCGAGTACCCTNNNNNNNNNN k=19 copyundefined mm=f

java -ea -Xmx100g -Xms100g -cp /fs/home/yeroslaviz/miniconda3/envs/bbmap/opt/bbmap-39.06-1/current/ jgi.BBDuk -Xmx100g in=10_ID_mRNA_S1_L002_R1_001.fastq outm=10_ID_mRNA_pattern_found.fastq literal=NNNNNNNNNNCAGCTACTGCNNNNNNNNNNCGAGTACCCTNNNNNNNNNN k=19 copyundefined mm=f
Executing jgi.BBDuk [-Xmx100g, in=10_ID_mRNA_S1_L002_R1_001.fastq, outm=10_ID_mRNA_pattern_found.fastq, literal=NNNNNNNNNNCAGCTACTGCNNNNNNNNNNCGAGTACCCTNNNNNNNNNN, k=19, copyundefined, mm=f]
Version 39.06

0.067 seconds.
Initial:
Memory: max=107374m, total=107374m, free=107272m, used=102m
...

But then I get this Java heap space error:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.Arrays.copyOf(Arrays.java:3481)
        at java.base/java.util.ArrayList.toArray(ArrayList.java:370)
        at java.base/java.util.ArrayList.addAll(ArrayList.java:753)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.makeReplicates(Tools.java:1306)
        at shared.Tools.replicateAmbiguous(Tools.java:1261)
        at jgi.BBDuk.spawnLoadThreads(BBDuk.java:1851)
        at jgi.BBDuk.process2(BBDuk.java:1171)
        at jgi.BBDuk.process(BBDuk.java:1121)
        at jgi.BBDuk.main(BBDuk.java:81)

Any idea how to overcome it?

Thanks,

Assa

java bbduk bbmap fastq • 448 views

I don't think you can use N's in literal=, since the match is literal, as the option name says. Depending on where in the read this pattern is, you may be able to run three independent operations and add up the read numbers.

I assume this is related to your other thread, "UMI-Tools knee-method has great influence on the results of white list"?
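As a cross-check that does not need the N's to be expanded at all, the pattern can be treated as fixed segments separated by fixed-length wildcards. Below is a minimal sketch in plain Python, not a BBTools feature: it assumes a standard four-line-per-record FASTQ, matches the forward strand only, and requires exact matches in the fixed parts. The function names and the demo records are made up for illustration.

```python
import re

PATTERN = "NNNNNNNNNNCAGCTACTGCNNNNNNNNNNCGAGTACCCTNNNNNNNNNN"

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Turn each run of N into a fixed-length wildcard over A, C, G, T."""
    regex = re.sub(r"N+", lambda m: "[ACGT]{%d}" % len(m.group()), pattern.upper())
    return re.compile(regex)

def count_matching_reads(fastq_lines, pattern: str) -> int:
    """Count reads whose sequence line contains the pattern (forward strand only)."""
    rx = pattern_to_regex(pattern)
    hits = 0
    for i, line in enumerate(fastq_lines):
        # In a 4-line-per-record FASTQ, the sequence is the 2nd line of each record.
        if i % 4 == 1 and rx.search(line.strip().upper()):
            hits += 1
    return hits

# Two made-up records, only to show the call: read1 carries the pattern, read2 does not.
demo = [
    "@read1",
    "ACGTACGTAC" + "CAGCTACTGC" + "TTTTTTTTTT" + "CGAGTACCCT" + "GGGGGGGGGG",
    "+",
    "I" * 50,
    "@read2",
    "A" * 50,
    "+",
    "I" * 50,
]
print(count_matching_reads(demo, PATTERN))
```

For a real file, an open file handle can be passed in place of the list, since the function only iterates over lines. Note that reads with an N base call inside a wildcard stretch will not match `[ACGT]`; widening the character class would be needed if those should count.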


Yes, it is.

As far as I understand it, you can use N here. I took it from this example Brian posted a few years ago.

This is why the degenerate parameter (copyundefined) is used here.

Am I wrong in thinking this way?

Is it possible that this pattern is just too long for bbduk to handle?


Thanks for that link. As noted in that example, you can try seal.sh instead of bbduk.sh.

"Is it possible that this pattern is just too long for bbduk to handle?"

With that many N's you will likely need a lot more RAM, which is what is happening with the original command line, i.e. 100G is not enough.
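To get a rough feel for the scale, here is a small Python sketch. It assumes that copyundefined replaces every N with each of A, C, G and T, which is what the repeated makeReplicates frames in the stack trace suggest; the one-byte-per-base figure is only a lower-bound guess and not how BBDuk actually stores sequences.

```python
# Estimate how many unambiguous sequences a degenerate pattern expands to,
# ASSUMING every N is replaced by each of A, C, G and T.
PATTERN = "NNNNNNNNNNCAGCTACTGCNNNNNNNNNNCGAGTACCCTNNNNNNNNNN"

def expansion_count(pattern: str) -> int:
    """Return 4 ** (number of N positions) for the given pattern."""
    return 4 ** pattern.upper().count("N")

n_count = PATTERN.count("N")
total = expansion_count(PATTERN)

# Lower-bound guess: one byte per base, ignoring all object overhead.
min_bytes = total * len(PATTERN)

print(f"{n_count} N positions -> {total:.3e} expanded sequences")
print(f"at least {min_bytes / 1e9:.3e} GB at one byte per base")
```

With 30 N positions this comes to 4^30, roughly 1.15 × 10^18 sequences, so under that assumption no realistic -Xmx value would be enough for the full pattern.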
