4.7 years ago
storm1907
Hello,
I have a problem with BGI RNA-Seq files. When running Trimmomatics, my script stops exactly at sample 41 in each cell's first lane (V300042149_L01_41_1.fq.gz) in the ILLUMINACLIP step. Requesting larger and faster resources from the server (more memory, ppn, or new nodes with big memory) does not make sense, so I decided to just skip that file.
#!/bin/bash -x
#PBS -N trimmomatics_job
#PBS -q batch
#PBS -l walltime=72:00:00
#PBS -l feature=summer
#PBS -l feature=largescratch
#PBS -l nodes=wn01:ppn=12,mem=60gb
#PBS -W x=naccesspolicy:UNIQUEUSER
#PBS -j oe
#PBS -A job
INPATH=/dir/dir/dir/subdir/subdir/subdir
OUTPATH=/dir/dir/dir/subdir/subdir/subdir
cd <path to Trimmomatics tool>
shopt -s nullglob
for dir in $INPATH{/,/*/} ;
do
for file in $dir/*1.fq.gz ;
do
bname=$(basename $file '1.fq.gz')
echo "file: "$file
echo $bname
input1=$dir/$bname"1.fq.gz"
input2=$dir/$bname"2.fq.gz"
output1=$OUTPATH/$bname"1.paired.fq.gz"
output2=$OUTPATH/$bname"1.unpaired.fq.gz"
output3=$OUTPATH/$bname"2.paired.fq.gz"
output4=$OUTPATH/$bname"2.unpaired.fq.gz"
echo $input1 $input2
find . -type f -name V300042149_L01_41_1.fq.gz -prune -o -exec trimmomatic-0.39.jar {} \;
java -jar trimmomatic-0.39.jar PE -threads 4 -phred33 \
"$input1" "$input2" "$output1" "$output2" "$output3" "$output4" \
ILLUMINACLIP:BGI_Adapters.fa:2:30:10 \
LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
done
done
I have no idea how to write this line to exclude the file:
find . -type f -name V300042149_L01_41_1.fq.gz -prune -o -exec trimmomatic-0.39.jar {} \;
thank you :)
Nested loops in shell are a bad, bad choice. Surely there must be a better way, maybe create a table of input args first and then run a command per line of that file?
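To make the table idea concrete, here is a rough sketch, not a tested pipeline. It assumes the layout from the question (read pairs named *1.fq.gz / *2.fq.gz directly under INPATH or one level below it). The paths, the function names make_table and run_table, and the DRY_RUN switch are placeholders invented for illustration. With DRY_RUN=1 the commands are only printed, so the table can be inspected before anything runs.

```shell
#!/bin/bash
# Sketch: write one tab-separated row of Trimmomatic arguments per read pair,
# then run (or just print) one command per row. All paths are placeholders.
INPATH=${INPATH:-/path/to/input}
OUTPATH=${OUTPATH:-/path/to/output}
TRIMMOMATIC_JAR=${TRIMMOMATIC_JAR:-/path/to/trimmomatic-0.39.jar}
DRY_RUN=${DRY_RUN:-1}   # hypothetical switch: 1 = print commands only

# Columns: input1, input2, output1, output2, output3, output4.
make_table() {
    find "$INPATH" -maxdepth 2 -type f -name '*1.fq.gz' | sort |
    while IFS= read -r r1; do
        dir=$(dirname "$r1")
        bname=$(basename "$r1" 1.fq.gz)
        printf '%s\t%s\t%s\t%s\t%s\t%s\n' \
            "$r1" "$dir/${bname}2.fq.gz" \
            "$OUTPATH/${bname}1.paired.fq.gz" "$OUTPATH/${bname}1.unpaired.fq.gz" \
            "$OUTPATH/${bname}2.paired.fq.gz" "$OUTPATH/${bname}2.unpaired.fq.gz"
    done
}

# Read the table given as $1 and handle one row at a time.
run_table() {
    runner=
    [ "$DRY_RUN" = 1 ] && runner=echo
    tab=$(printf '\t')
    while IFS=$tab read -r in1 in2 out1 out2 out3 out4; do
        # stdin is redirected so java cannot swallow the rest of the table
        $runner java -jar "$TRIMMOMATIC_JAR" PE -threads 4 -phred33 \
            "$in1" "$in2" "$out1" "$out2" "$out3" "$out4" \
            ILLUMINACLIP:BGI_Adapters.fa:2:30:10 \
            LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 < /dev/null
    done < "$1"
}

# Only runs when INPATH really exists, so the placeholder default does nothing.
if [ -d "$INPATH" ]; then
    make_table > trimmomatic_args.tsv
    run_table trimmomatic_args.tsv
fi
```

Once the table exists you can open it, delete the row for the problem sample by hand, and rerun only run_table. That is the main advantage over computing everything inside the loop.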
You should use something like find with -not -name "*_41_*" to exclude the 41 files, but definitely look at creating a tabular file with input1, input2, output1, output2, output3, output4 as columns, then execute the command per line in that file. You can use the find I suggested while creating this tabular file so the tabular file excludes the 41 files.
Or, your file could just contain the bname values and you could construct everything else on the fly. Just try and get rid of the loop - it is shell abuse.
Oh and by the way, it's trimmomatic, not "trimmomatics".
Another point: You'll almost never need to cd to a tool directory to run it. Only badly built tools work that way. Instead, call the tool from your working directory so any files dumped in the working directory won't clog up the tool's source directory.
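A minimal sketch of both points, with assumptions flagged: the paths are placeholders, and SKIP, list_forward_reads, list_bnames and the trimmomatic wrapper are names made up for this example. It excludes the exact file name from the question; a pattern such as "*_41_*" would drop sample 41 in every lane instead. The -not operator is the GNU/BSD spelling; POSIX find uses ! for the same thing.

```shell
#!/bin/bash
# Sketch: list forward-read files with find while skipping one file,
# and call the jar by its full path instead of cd'ing into the tool directory.
INPATH=${INPATH:-/path/to/input}                                   # placeholder
TRIMMOMATIC_JAR=${TRIMMOMATIC_JAR:-/path/to/trimmomatic-0.39.jar}  # placeholder
SKIP=${SKIP:-V300042149_L01_41_1.fq.gz}   # file name taken from the question

# Print every *1.fq.gz under INPATH except the file named in SKIP.
list_forward_reads() {
    find "$INPATH" -type f -name '*1.fq.gz' -not -name "$SKIP" | sort
}

# Print only the shared prefixes (the bname values), one per line.
list_bnames() {
    list_forward_reads | while IFS= read -r f; do
        basename "$f" 1.fq.gz
    done
}

# Wrapper so the tool can be called from any working directory.
trimmomatic() {
    java -jar "$TRIMMOMATIC_JAR" "$@"
}
```

With this in place, something like list_bnames > bnames.txt gives the short table, and trimmomatic PE ... can be run from wherever the output should go. One thing to watch: ILLUMINACLIP:BGI_Adapters.fa is a relative path, so once you stop cd'ing into the tool directory the adapter file needs a full path as well.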