I have tried to pipe output from the find command into parallel but I can't seem to figure out why it is not working as I expected.
Basically, I have three types of files in a directory, e.g.:
foo_1_dup.bam
foo_1_dup.bam.bai
foo_1_dup_recal_report
I want to use find to get *dup* but exclude *bai, and pipe the output into parallel such that foo_1_dup.bam will be {1} and foo_1_dup_recal_report will be {2}.
I have many files such that each foo_[1..N]_dup.bam has its corresponding foo_[1..N]_dup_recal_report.
Here is the code I have tried:
find /dir/ \( -name \*dup\* ! -name \*bai\* \) | parallel --dryrun -N2 -j2 -k -v --progress --joblog recalibration_joblog --retries 2 --noswap "java -Xmx4g -jar GenomeAnalysisTK.jar -nct 4 -I {1} -BQSR {2} -R hg19.fasta -T PrintReads -o {1.}_recal.bam"
It works fine for the first round but then starts swapping {2} for {1}.
I have used find and piped output to parallel before and it worked fine but then I had only two types of files in the directory so I didn't have to exclude one. Could you please tell me what I am doing wrong or a better way to do it? Thanks in advance!
Please write a shell script if you want to repeat this kind of work. Shell scripts are easier to work with than a complicated one-liner. I will add a model script for your analysis as an answer; please write a script in your own style after that.
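A likely cause of the swapping is that find does not return files in any guaranteed order, while -N2 simply takes two consecutive input lines as {1} and {2}. Consecutive lines are therefore not always a .bam followed by its own report. Below is a minimal sketch of one way around this, not a definitive fix: search only for the .bam files and derive the report name from each one. The function name pair_inputs is hypothetical, and the sketch assumes every report sits next to its .bam and that file names contain no tabs or newlines.

```shell
#!/bin/sh
# Sketch: print one "bam<TAB>report" pair per line so that the pairing never
# depends on the order in which find lists the files.
# pair_inputs is a hypothetical helper name, not part of find or parallel.
pair_inputs() {
    dir=$1
    # '*_dup.bam' must match the whole file name, so foo_1_dup.bam.bai is
    # excluded without needing a separate "! -name" test.
    find "$dir" -type f -name '*_dup.bam' | sort | while IFS= read -r bam; do
        # foo_1_dup.bam -> foo_1_dup_recal_report
        report="${bam%.bam}_recal_report"
        if [ -f "$report" ]; then
            printf '%s\t%s\n' "$bam" "$report"
        else
            # Warn on stderr and skip a .bam that has no matching report.
            printf 'skipping %s: %s not found\n' "$bam" "$report" >&2
        fi
    done
}

# Possible usage (dry run first). --colsep splits each input line into
# columns, so {1} is the .bam and {2} is its report:
#
#   pair_inputs /dir/ | parallel --dryrun --colsep '\t' -j2 -k \
#       "java -Xmx4g -jar GenomeAnalysisTK.jar -nct 4 -I {1} -BQSR {2} -R hg19.fasta -T PrintReads -o {1.}_recal.bam"
```

Because each line already carries both file names, -N2 is no longer needed, and a .bam without a report is reported instead of shifting every later pair by one.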