How do I use GNU parallel with two inputs?
19 months ago
fb143 ▴ 10

I am running the ShortBRED script below with GNU parallel. It should take two input files per sample and write a single .tsv file, but instead I get separate _1.tsv and _2.tsv files and directories. How do I get output like this:

SRR059331_dir

SRR059331.tsv

SRR059339_dir

SRR059339.tsv

A bash for loop produces the expected results shown above, but it is very slow. How do I convert it to run with GNU parallel? This is my bash script:

for i in test_files/*_1.fastq.gz; do
    # Sample name: strip the directory and the _1.fastq.gz suffix
    F=$(basename "$i" _1.fastq.gz)
    mkdir -p test_out/"$F"_dir
    python shortbred/shortbred_quantify.py --markers markers.faa --wgs "$F"_1.fastq.gz "$F"_2.fastq.gz --results test_out/"$F".tsv --tmp test_out/"$F"_dir --usearch ./shortbred/usearch
done

This is my GNU parallel script so far:

#!/bin/bash

#Create a sub-directory for ShortBRED output
mkdir test_out

time parallel -j 10 \
"python shortbred/shortbred_quantify.py \
--markers markers.faa \
--wgs {1} {2} \
--results test_out/{1/.}.tsv \
--tmp test_out/{1/.}_dir \
--usearch shortbred/usearch" ::: test_files/fastq/*_1.fastq.gz :::+ test_files/fastq/*_2.fastq.gz
parallel GNU • 2.3k views

Someone will probably provide an exact answer, but in the meantime see whether the answer here helps: GNU parallel command with several multiple arguments

As noted, always try --dry-run first to see the commands parallel will run.
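For instance, a dry run on a single made-up file name shows what the {1/.} replacement string in the question's command expands to. This is a minimal sketch, assuming GNU parallel is installed; the path is illustrative and need not exist, and -k only keeps the output in input order:

```shell
#!/bin/bash
# Hypothetical input path; --dry-run prints the command instead of running it.
r1=test_files/fastq/SRR059331_1.fastq.gz

# {1/.} is the basename of the first input with only its last extension removed.
preview=$(parallel -k --dry-run echo test_out/{1/.}.tsv ::: "$r1")
echo "$preview"
```

Since {1/.} strips only .gz, the preview should show test_out/SRR059331_1.fastq.tsv rather than test_out/SRR059331.tsv, which may explain the unexpected output names.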


Have you determined which part of the process is time-consuming? Spawning multiple instances of your Python process may not be as economical as, for example, simply increasing the number of threads that usearch uses.

19 months ago

See the parallel manual on input sources:

https://www.gnu.org/software/parallel/parallel_tutorial.html#input-sources

for example:

parallel --link echo {1} and {2} ::: A B C ::: D E F

will print:

A and D
B and E
C and F

I think parallel --link is deprecated. I tried different methods, but none of them worked. I ended up concatenating the inputs into a single file and passing it as a single argument. Thank you both for your help.


--link is fully supported. You may be thinking of :::+, which is newer and links an input source only to the source before it. --link links all input sources and is thus less flexible.

Compare:

parallel --link echo ::: a b c ::: d e f ::: g h i
parallel echo ::: a b c :::+ d e f ::: g h i
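To make the difference concrete, the two commands can be run with -k (keep output in input order) and their outputs captured. This is a small sketch, assuming GNU parallel is on the PATH; the variable names are only for illustration:

```shell
#!/bin/bash
# --link pairs up all three input sources position by position.
all_linked=$(parallel -k --link echo ::: a b c ::: d e f ::: g h i)

# :::+ links d e f to a b c only; g h i is still combined with every pair.
pair_linked=$(parallel -k echo ::: a b c :::+ d e f ::: g h i)

printf '%s\n' "--link:" "$all_linked" ":::+:" "$pair_linked"
```

The first command should print three lines (a d g, b e h, c f i), while the second should print nine, because each linked pair is combined with each of g, h and i.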

What makes you think that the feature is deprecated?

It is listed on the help page I linked above (and again below):

https://www.gnu.org/software/parallel/parallel_tutorial.html#input-sources

It is an essential and very handy feature, so it seems unlikely that it would be removed.

19 months ago
ole.tange ★ 4.5k

For readability I would always use a function:

doit() {
  python shortbred/shortbred_quantify.py \
    --markers markers.faa \
    --wgs "$1" "$2" \
    --results test_out/"$3".tsv \
    --tmp test_out/"$3"_dir \
    --usearch shortbred/usearch
}
export -f doit

time parallel -j10 --plus doit {} {/_1.fastq/_2.fastq} {/.} ::: test_files/fastq/*_1.fastq.gz
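Note that {/.} removes only the last extension, so $3 above would be SRR059331_1.fastq rather than SRR059331. If the bare sample name from the question is wanted, one possible variant derives both the mate file and the sample name inside the function with plain bash. This is a sketch, not part of the answer above: the function name quantify_pair and the DRY_RUN switch are made up for illustration, and the paths are those from the question.

```shell
#!/bin/bash
# Sketch: take only the _1 file; derive the _2 file and the sample name from it.
quantify_pair() {
  local r1=$1
  local r2=${r1%_1.fastq.gz}_2.fastq.gz      # mate file next to r1
  local sample
  sample=$(basename "$r1" _1.fastq.gz)       # e.g. SRR059331
  # DRY_RUN=1 (hypothetical switch) prints the command instead of running it.
  ${DRY_RUN:+echo} python shortbred/shortbred_quantify.py \
    --markers markers.faa \
    --wgs "$r1" "$r2" \
    --results test_out/"$sample".tsv \
    --tmp test_out/"$sample"_dir \
    --usearch shortbred/usearch
}
export -f quantify_pair

# Preview the command for one illustrative path.
DRY_RUN=1 quantify_pair test_files/fastq/SRR059331_1.fastq.gz
```

With real files this could then be run as parallel -j10 quantify_pair ::: test_files/fastq/*_1.fastq.gz, ideally after checking the commands with --dry-run. Keeping the name handling in bash avoids depending on which replacement strings a given parallel version supports.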