How to use BBtools repair.sh on multiple files
7.1 years ago
dw845316 ▴ 20

Hello Friends,

I have 30 pairs of forward and reverse fastq files that I need to run BBtools repair.sh on. As of now I can only run one pair at a time. Is there a way to run repair.sh on every pair without doing each one individually?

bash repair.sh in1=DW2_S12_L001_R1_001.fastq_16S.fastq in2=DW2_S12_L001_R2_001.fastq_16S.fastq out1=fixed1.fastq out2=fixed2.fastq outs=singleone.fastq repair

Cheers,
Danny

software-error next-gen-sequencing • 5.6k views

There is nothing wrong with the way you run it. Since repairing one set of files (R1 and R2 of one sample) is independent of repairing another set (R1 and R2 of another sample), this work can be parallelized. Try GNU parallel on Linux:

$  parallel --plus echo {%R._001.fastq_16S.fastq}  ::: *R1_001.fastq_16S.fastq  | parallel 'repair.sh in1={}R1_001.fastq_16S.fastq in2={}R2_001.fastq_16S.fastq out1={}fixed_R1_001.fastq_16S.fastq out2={}fixed_R2_001.fastq_16S.fastq outs={}single.fastq repair'

(This assumes that all your R1 and R2 files have the same endings after the sample name, R1_001.fastq_16S.fastq and R2_001.fastq_16S.fastq, and that the parallel version is GNU parallel 20171022. Only the R1 files are listed, so that each pair is processed once rather than once per file.)
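The first stage of that pipeline only has to produce one prefix per sample. As a rough sketch in plain bash, assuming the R1_001.fastq_16S.fastq naming from the question (the function name sample_prefixes is made up for this example), the same prefixes can be derived with parameter expansion:

```shell
# Hypothetical helper: print one prefix per sample.
# Only names ending in the assumed R1 suffix are used, so an R1/R2 pair
# yields a single line instead of two identical ones.
sample_prefixes() {
    local suffix=R1_001.fastq_16S.fastq   # assumed naming, taken from the question
    local f
    for f in "$@"; do
        case $f in
            *"$suffix") printf '%s\n' "${f%"$suffix"}" ;;   # strip the suffix from the end
        esac
    done
}
```

Something like sample_prefixes *.fastq | parallel --dry-run 'repair.sh in1={}R1_001.fastq_16S.fastq ...' would then print the commands without running them, which is a cheap way to check the names before starting 30 jobs.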


My file names have the following format: HB-7R-25L0.clean.R1.fastq.gz, HB-7R-25L0.clean.R2.fastq.gz. I tried to use parallel as you suggested, but the dynamic replacement is not happening.

parallel --plus echo {%R..fastq.gz}  ::: *.fastq.gz

This echoes the entire filename and does not capture the sample name HB-7R-25L0.clean, so the sample name is not passed to the next step as desired. Instead, the next step gets in1=HB-7R-25L0.clean.R1.fastq.gz.R1.fastq.gz and in2=HB-7R-25L0.clean.R2.fastq.gz.R2.fastq.gz.


It would help if you could:

  1. Open a new post on this
  2. Post example input file names for R1 and R2
  3. State the parallel version in use

There is an issue with your command: parallel --plus echo {%R..fastq.gz} ::: *.fastq.gz (the pattern needs an extra dot before the R)

$ ls *.gz
HB-7R-25L0.clean.R1.fastq.gz  HB-7R-25L0.clean.R2.fastq.gz

$ parallel --plus echo {} {=s/R1/R2/=} {%.R1.fastq.gz} ::: *R1.fastq.gz
HB-7R-25L0.clean.R1.fastq.gz HB-7R-25L0.clean.R2.fastq.gz HB-7R-25L0.clean

$ parallel --plus echo {%.R..fastq.gz} ::: *.fastq.gz
HB-7R-25L0.clean
HB-7R-25L0.clean

$ parallel --plus echo {%.R..fastq.gz} ::: *R1.fastq.gz
HB-7R-25L0.clean

I would suggest matching a digit instead of any character (dot) after the R, as follows:

$ parallel --plus echo {%.R\[1-2\].fastq.gz} ::: *.fastq.gz
HB-7R-25L0.clean
HB-7R-25L0.clean

$ parallel --plus echo {%.R1.fastq.gz} ::: *R1.fastq.gz
HB-7R-25L0.clean

Is this what you are trying to do?

$ parallel --plus --dry-run  'repair.sh in1={} in2={=s/R1/R2/=} out1={%.R1.fastq.gz}.fixed.R1.fastq.gz out2={%.R1.fastq.gz}.fixed.R2.fastq.gz outs={%.R1.fastq.gz}_singletons.fastq repair' ::: *R1.fastq.gz

repair.sh in1=HB-7R-25L0.clean.R1.fastq.gz in2=HB-7R-25L0.clean.R2.fastq.gz out1=HB-7R-25L0.clean.fixed.R1.fastq.gz out2=HB-7R-25L0.clean.fixed.R2.fastq.gz outs=HB-7R-25L0.clean_singletons.fastq repair
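The same command line can also be assembled without parallel, which may help when checking the name handling in isolation. This is only a sketch under the .R1.fastq.gz / .R2.fastq.gz naming shown above; build_repair_cmd is a hypothetical name, and the function prints the command rather than running it:

```shell
# Hypothetical helper: print the repair.sh command for one R1 file.
# Assumes names of the form <sample>.R1.fastq.gz with a matching <sample>.R2.fastq.gz.
build_repair_cmd() {
    local r1=$1
    local sample=${r1%.R1.fastq.gz}       # strip the suffix from the end only
    local r2=${sample}.R2.fastq.gz        # mate name rebuilt from the sample name
    printf 'repair.sh in1=%s in2=%s out1=%s.fixed.R1.fastq.gz out2=%s.fixed.R2.fastq.gz outs=%s_singletons.fastq repair\n' \
        "$r1" "$r2" "$sample" "$sample" "$sample"
}
```

One difference worth knowing: {=s/R1/R2/=} is a Perl substitution that replaces the first R1 found anywhere in the file name, so a sample name that itself contains R1 would be altered. Rebuilding the mate name from the stripped sample name, as in this sketch, avoids that case.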

Thanks. The issue is not resolved. Opening a new post now with more details.


Hi @cpad0112. Please take a look at the new question: "Dynamic string replacement issue in parallel for repair.sh bbmap".


Yeah! :-) Now it is clear where the mistake is. I had been trying to feed the values from the first echo {} into the next step, and it was feeding the entire output.

7.1 years ago
GenoMax 147k

Use a for loop. Something like this should work:

for f in *_R1_001.fastq_16S.fastq; do i=${f%_R1_001.fastq_16S.fastq}; repair.sh in1="${i}_R1_001.fastq_16S.fastq" in2="${i}_R2_001.fastq_16S.fastq" out1="${i}_fixed_R1_001.fastq_16S.fastq" out2="${i}_fixed_R2_001.fastq_16S.fastq" outs="${i}_single.fastq" repair; done
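For 30 pairs it can be reassuring to preview the commands and to skip samples whose mate file is missing. The following is a minimal sketch of the loop as a bash function, assuming the file naming from the question; repair_all and the DRY_RUN variable are inventions of this example, not part of BBTools:

```shell
# Sketch: run repair.sh on every R1/R2 pair in the current directory.
# DRY_RUN=1 (hypothetical switch of this sketch) prints the commands instead of running them.
repair_all() {
    local suffix1=_R1_001.fastq_16S.fastq   # assumed R1 ending
    local suffix2=_R2_001.fastq_16S.fastq   # assumed R2 ending
    local r1 r2 sample
    local -a cmd
    for r1 in *"$suffix1"; do
        [ -e "$r1" ] || continue                    # the glob matched nothing
        sample=${r1%"$suffix1"}
        case $sample in *_fixed) continue ;; esac   # skip outputs of an earlier run
        r2=${sample}${suffix2}
        if [ ! -e "$r2" ]; then
            echo "skipping $r1: mate $r2 not found" >&2
            continue
        fi
        cmd=(repair.sh "in1=$r1" "in2=$r2"
             "out1=${sample}_fixed${suffix1}" "out2=${sample}_fixed${suffix2}"
             "outs=${sample}_single.fastq" repair)
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "${cmd[*]}"
        else
            "${cmd[@]}"
        fi
    done
}
```

Running DRY_RUN=1 repair_all in the directory with the fastq files prints one repair.sh line per complete pair; calling repair_all on its own runs them one after another.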