The obisplit command from obitools takes a file of sequences and sorts the reads into separate files, naming each file after the value of a specified sequence attribute. The command
obisplit -p DATA_ -t color input.fastq
will produce the files DATA_red.fastq, DATA_green.fastq and DATA_blue.fastq.
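Going by that example, each output name is the -p prefix, then the attribute value, then the input's extension. The helper below is hypothetical (it is not part of obitools) and only prints the names you would expect, which can be handy for checking afterwards which files were actually written. That the extension is copied from the input file is an assumption inferred from the example, not from the obisplit documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not an obitools command): print the file names that
#   obisplit -p PREFIX -t ATTR INPUT
# would be expected to create, one per attribute value, ASSUMING the naming
# pattern PREFIX + value + "." + input extension seen in the example.
expected_obisplit_outputs() {
  local prefix=$1 input=$2
  shift 2
  local base=${input##*/}   # drop any leading directories
  local ext=${base##*.}     # extension of the input, e.g. "fastq"
  local value
  for value in "$@"; do
    printf '%s%s.%s\n' "$prefix" "$value" "$ext"
  done
}

# The three files expected from: obisplit -p DATA_ -t color input.fastq
expected_obisplit_outputs DATA_ input.fastq red green blue
```

The attribute values have to be supplied by hand here, since the helper does not read the sequences.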
However, when I try to run this in parallel using parallel, no files are created; the output is printed to the console instead.
What I do is split input.fastq into several files, e.g. input_01.fastq and input_02.fastq (using ngsutils), and then run them in parallel:
find . * | grep -P "^input_\\d+" | parallel -j+2 obisplit -p DATA_processed_color_{/.}_ -t color {/}
Here {/.} records which file (01 or 02) the data come from, and {/} is the input_* file name as captured by find.
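For reference, GNU parallel's {/} expands to the basename of each input line and {/.} to that basename with its extension removed. The sketch below reproduces those two expansions with bash parameter expansion and prints, without running it, the obisplit command the posted line would build for one chunk. The function names are made up for illustration; GNU parallel's own --dry-run option prints the real jobs in the same way.

```shell
#!/usr/bin/env bash
# Bash equivalents of two GNU parallel replacement strings
# (function names are hypothetical, for illustration only).

# {/}  : basename of the input line, e.g. ./input_01.fastq -> input_01.fastq
repl_basename() {
  printf '%s\n' "${1##*/}"
}

# {/.} : basename with its last extension removed, e.g. -> input_01
repl_basename_noext() {
  local base=${1##*/}
  printf '%s\n' "${base%.*}"
}

# Print (do not run) the obisplit job built for one chunk by the posted command.
show_obisplit_job() {
  local path=$1
  printf 'obisplit -p DATA_processed_color_%s_ -t color %s\n' \
    "$(repl_basename_noext "$path")" "$(repl_basename "$path")"
}

show_obisplit_job input_01.fastq
```

Note that {/.} yields input_01 rather than just 01, so the prefix becomes DATA_processed_color_input_01_. Also, because {/} strips directories, the job only finds its input when the chunks sit in the current directory, as they do here.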
How can I get parallel to write the output to files instead of the console?
I'm not sure about your parallel command and/or obisplit, but I would try:
This might not give exactly the output names you want, though.
I have tried several variations of what you suggested, but still no luck. What I care about is that each file gets the right designation as assigned by obisplit (e.g. DATA_processed_color_bookeeping_*red*); the rest is just bookkeeping, which I can handle later.
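For that later bookkeeping, once each chunk has produced files named like DATA_processed_color_input_01_red.fastq, the per-chunk files can be concatenated back into one file per colour. A minimal sketch, assuming that naming pattern, attribute values without underscores, and bash 4 or later (for the associative array). Concatenation is safe for plain FASTQ because each record is self-contained. The helper names and the DATA_<value>.fastq output names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical bookkeeping helpers: merge per-chunk obisplit outputs such as
#   DATA_processed_color_input_01_red.fastq
#   DATA_processed_color_input_02_red.fastq
# into DATA_red.fastq. ASSUMES attribute values contain no underscores.

# Attribute value = text between the last "_" and the ".fastq" extension.
color_of_output() {
  local name=${1##*/}
  name=${name%.fastq}
  printf '%s\n' "${name##*_}"
}

# Collect the distinct values found in directory $1, then write one merged
# file per value. Uses ">" so re-running overwrites instead of duplicating reads.
merge_by_color() {
  local dir=$1 f color
  local -A seen=()
  shopt -s nullglob
  for f in "$dir"/DATA_processed_color_*.fastq; do
    seen[$(color_of_output "$f")]=1
  done
  for color in "${!seen[@]}"; do
    cat "$dir"/DATA_processed_color_*_"$color".fastq > "$dir/DATA_${color}.fastq"
  done
  shopt -u nullglob
}
```

Usage: run merge_by_color . in the directory holding the chunk outputs, after all parallel jobs have finished.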