How can I input reads from a file descriptor rather than an actual text file? I have created four small test FASTQ files to identify the problem: two R1 files and two R2 files. If I use process substitution, Clumpify fails. If I first concatenate the R1 reads into one file on disk and the R2 reads into another, and use those files as input, Clumpify works. I am exploring this because I have a data set of 51 whole-genome DNA samples: 47 were sequenced in one lane and four were sequenced in two lanes. The command I used is
clumpify.sh -Xmx1g t=1 in1=<(cat ${R1[@]}) in2=<(cat ${R2[@]}) out1=$R1output out2=$R2output dedupe subs=1
This results in the error
Starting cris 0.
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 1
The only difference in the command that works is in1=merged1.fastq and in2=merged2.fastq
Done!
Time: 0.971 seconds.
Reads Processed: 4000 4.12k reads/sec
Bases Processed: 600k 0.62m bases/sec
I got this idea from the STAR RNA-seq aligner, which permits --readFilesIn <(gunzip -c reads.fastq.gz), for example.
On the university's computing cluster, version 37.98 of BBMap is installed. I could use the /scratch/ directory if all else fails.
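The fallback of merging each sample's lanes into real files before calling Clumpify could be sketched roughly as follows. The file names, the merge_lanes helper, and the use of mktemp are illustrative assumptions (on the cluster, the working directory would presumably live under /scratch/). The clumpify.sh options are the ones from the command in the question.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical helper: concatenate the input files (arguments 2..n)
# into the output file named by argument 1.
merge_lanes() {
    local out=$1
    shift
    cat -- "$@" > "$out"
}

# Scratch space for the merged copies. mktemp -d is an assumption here;
# a directory under /scratch/ would serve the same purpose.
workdir=$(mktemp -d)

# Tiny stand-in FASTQ files so the sketch runs on its own.
# Replace these with the real per-lane files.
printf '@read1/1\nACGT\n+\nIIII\n' > "$workdir/lane1_R1.fastq"
printf '@read2/1\nTTGA\n+\nIIII\n' > "$workdir/lane2_R1.fastq"
printf '@read1/2\nCCAT\n+\nIIII\n' > "$workdir/lane1_R2.fastq"
printf '@read2/2\nGGTA\n+\nIIII\n' > "$workdir/lane2_R2.fastq"

R1=("$workdir/lane1_R1.fastq" "$workdir/lane2_R1.fastq")
R2=("$workdir/lane1_R2.fastq" "$workdir/lane2_R2.fastq")

# Merge lanes in the same order for R1 and R2 so mates stay paired.
merge_lanes "$workdir/merged1.fastq" "${R1[@]}"
merge_lanes "$workdir/merged2.fastq" "${R2[@]}"

# Run Clumpify on the regular files, with the options from the question.
# Skipped when clumpify.sh is not on PATH.
if command -v clumpify.sh >/dev/null 2>&1; then
    clumpify.sh -Xmx1g t=1 \
        in1="$workdir/merged1.fastq" in2="$workdir/merged2.fastq" \
        out1="$workdir/dedup_R1.fastq" out2="$workdir/dedup_R2.fastq" \
        dedupe subs=1
fi
```

The merged files can be deleted once Clumpify has finished, so the extra disk usage is only temporary.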
Have you tried to do this using the java command directly instead of the shell wrapper (the .sh version you are using)?
After your suggestion, I did. However, the error remains the same: calling java directly again produces the same exception.
I suspect that clumpify is reopening the files internally, which won't work with a named pipe.
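The general mechanism behind that suspicion can be checked with a small experiment: data arriving through a pipe can only be consumed once, so anything that opens the path a second time sees an empty stream. This is only a sketch of pipe behaviour, assuming bash on a system that provides /dev/fd (Linux, for instance). It does not show what Clumpify actually does internally.

```shell
#!/usr/bin/env bash

# Attach the read end of a process substitution to file descriptor 3.
exec 3< <(printf 'line1\nline2\n')

# First pass: open the pipe through its /dev/fd path and count lines.
# The $(( )) wrapper only strips whitespace that some wc versions print.
first=$(( $(wc -l < /dev/fd/3) ))

# Second pass over the same path: the data has already been consumed.
second=$(( $(wc -l < /dev/fd/3) ))

echo "first pass:  $first lines"
echo "second pass: $second lines"

# Close the descriptor again.
exec 3<&-
```

On a typical Linux system the first pass should report 2 lines and the second 0. A regular file on disk does not have this limitation, which would be consistent with the merged files working.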
I don't really understand why you want to do this. There is no reason to use cat in that syntax. If ${R1[@]} simply contains file names, then use them directly as the value for in1, even if they are gzipped (BBMap accepts gzipped files as input as well).
It looks like OP is trying to merge several files on the fly and pass them to clumpify.
Try using the syntax "${R1[*]}" instead (including the quotes).
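For reference, this is how the two expansions differ in bash. Quoted [@] yields one word per array element, while quoted [*] yields a single word with the elements joined by the first character of IFS (a space by default). The file names and the count_args helper below are placeholders, and the snippet says nothing about whether Clumpify accepts several file names in a single in1= value.

```shell
#!/usr/bin/env bash

# Placeholder file names standing in for the per-lane R1 files.
R1=(lane1_R1.fastq lane2_R1.fastq)

# Hypothetical helper: print how many arguments it received.
count_args() { echo "$#"; }

# "${R1[@]}" expands to one word per element.
at_count=$(count_args "${R1[@]}")

# "${R1[*]}" expands to a single word joined by the first character of IFS.
star_count=$(count_args "${R1[*]}")
star_value="${R1[*]}"

# Changing IFS inside a subshell changes the joining character.
comma_value=$(IFS=,; echo "${R1[*]}")

echo "[@] gives $at_count words"
echo "[*] gives $star_count word: $star_value"
echo "[*] with IFS=, gives: $comma_value"
```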