Hello, I am having an issue with paired-end trimming. It looks like my script does not recognize the input files, and I have no idea what is wrong.
#!/bin/bash
#PBS -N trimmomatics_job
#PBS -q batch
#PBS -l walltime=72:00:00
#PBS -l nodes=1:ppn=40,mem=40gb
#PBS -W x=naccesspolicy:UNIQUEUSER
#PBS -j oe
#PBS -A job
module load java
INPATH1=/home/groups/dir/subdir
OUTPATH=/home/groups/dir/subdir
cd $INPATH1
for dir in */;
do
    for file1 in $dir/*.fq.gz;
    do
        bname1=$(basename $file1 '.fq.gz')
        sample1="$( cut -d'_' -f 1,2,3<<<"$bname1")"
        read1="$( cut -d'_' -f 4 <<<"$bname1")"
        for file2 in $dir/*.fq.gz;
        do
            bname2=$(basename $file2 '.fq.gz')
            sample2="$( cut -d'_' -f 1,2,3<<<"$bname2")"
            read2="$( cut -d'_' -f 4 <<<"$bname2")"
            if [ "$sample1" == "$sample2" ] && [ "$read1" != "$read2" ] \
                && [ "$read1" == 1 ] ;
            then
                echo "$sample2" "$sample2" "$read1" "$read2"
                input1=$INPATH1/$bname1.fq.gz
                input2=$INPATH2/$bname2.fq.gz
                output1=$OUTPATH/$bname1.paired.fq.gz
                output2=$OUTPATH/$bname1.unpaired.fq.gz
                output3=$OUTPATH/$bname2.paired.fq.gz
                output4=$OUTPATH/$bname2.unpaired.fq.gz
                echo "$input1" "$input2"
                cd /home/usr/tools/Trimmomatic-0.39
                java -jar trimmomatic-0.39.jar PE -phred33 \
                    "$input1" "$input2" "$output1" "$output2" "$output3" "$output4" \
                    ILLUMINACLIP:BGI_Adapters.fa:2:30:10 \
                    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
            fi
        done
    done
done
However, I keep getting an error message like this:
Exception in thread "main" java.io.FileNotFoundException: /dir/subdir/_L01_100_1.fq.gz (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:135)
at org.usadellab.trimmomatic.TrimmomaticPE.process(TrimmomaticPE.java:265)
at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:555)
at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:80)
The error is clear: Trimmomatic cannot find the input file (No such file or directory).
Looking at your script, you need to change the values of the two path variables (INPATH1 and OUTPATH) so that they match real folders on your server.
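A quick way to rule out a wrong folder is to test the path variables before the loop starts. The following is only a minimal sketch: check_dirs is a hypothetical helper, not part of the original script, and the two paths are the placeholders from the post.

```shell
#!/bin/bash
# Hypothetical helper: report whether each directory given as an argument exists.
check_dirs() {
    local d status=0
    for d in "$@"; do
        if [[ -d "$d" ]]; then
            echo "ok: $d"
        else
            echo "missing: $d" >&2
            status=1
        fi
    done
    return "$status"
}

# Placeholder paths copied from the post; substitute the real ones.
INPATH1=/home/groups/dir/subdir
OUTPATH=/home/groups/dir/subdir

check_dirs "$INPATH1" "$OUTPATH" || echo "fix the path variables before running Trimmomatic"
```

If either directory is reported as missing, the FileNotFoundException is expected regardless of how the file names are built.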
Well, I deliberately changed the folder names (data confidentiality and stuff like that :D)
I see. The first part suspiciously matched the error you showed so just wanted to be certain.
Next thing to check is your file names. Are they consistent? Do they end in .fq.gz and have _L01_100_1 in the name? That naming is a bit odd, since files generally should have _L001_001.fq.gz in their names. Put a number of echo commands in your script and see what is produced at each step to debug the issue.
These are BGI RNA-Seq files, for example V40002080_L01_109_1.fq.gz or V40002080_L01_109_2.fq.gz
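The echo debugging suggested above can be sketched with one of these example names. The snippet splits the name the same way the script does and prints every intermediate value. Two things are assumptions for illustration: the sample subdirectory name is made up, and INPATH2 is left unset on purpose, because the script in the question assigns INPATH1 and OUTPATH but never INPATH2.

```shell
#!/bin/bash
# Placeholder value copied from the post.
INPATH1=/home/groups/dir/subdir
# INPATH2 is never assigned in the original script, so leave it unset here too.
unset INPATH2

# Hypothetical subdirectory; the file name is one of the examples given above.
file1="sampleA/V40002080_L01_109_1.fq.gz"

bname1=$(basename "$file1" '.fq.gz')
sample1=$(cut -d'_' -f 1,2,3 <<<"$bname1")
read1=$(cut -d'_' -f 4 <<<"$bname1")

echo "bname1=$bname1"
echo "sample1=$sample1"
echo "read1=$read1"

# Build the paths exactly as the script does.
input1=$INPATH1/$bname1.fq.gz
input2=$INPATH2/$bname1.fq.gz
echo "input1=$input1"
echo "input2=$input2"
```

With these inputs the name splits as intended (sample V40002080_L01_109, read 1), but input1 no longer contains the sampleA subdirectory, because basename removed it, and input2 starts directly at / because the unset variable expands to an empty string. Either one would be enough to produce a file-not-found error.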
So some combination of dir/subdir and _L01_100_1.fq.gz is not being found, because the sample name is not getting reconstituted properly.
Would it not be better to submit multiple jobs inside a for loop instead of submitting a single job like this?
I was thinking about it, but how can I put two variables (input1 and input2) into one loop?
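One possible way to get both inputs from a single loop is to iterate over the read 1 files only and derive the read 2 path from each name, keeping the full path so the subdirectory is not lost. This is a sketch under assumptions: mate_of is a hypothetical helper, every pair is assumed to be named *_1.fq.gz and *_2.fq.gz as in the examples above, and the demo files are empty placeholders created in a temporary directory.

```shell
#!/bin/bash
# Hypothetical helper: given a read 1 path ending in _1.fq.gz, print the read 2 path.
mate_of() {
    printf '%s\n' "${1%_1.fq.gz}_2.fq.gz"
}

# Empty demo files in a temporary directory so the sketch runs anywhere.
demo=$(mktemp -d)
mkdir -p "$demo/sampleA"
touch "$demo/sampleA/V40002080_L01_109_1.fq.gz" "$demo/sampleA/V40002080_L01_109_2.fq.gz"
touch "$demo/sampleA/V40002080_L01_110_1.fq.gz"   # read 1 without a mate

pairs=()
for r1 in "$demo"/*/*_1.fq.gz; do
    r2=$(mate_of "$r1")
    if [[ -f "$r2" ]]; then
        pairs+=("$r1 $r2")
        echo "pair: $r1 $r2"
        # At this point one would run Trimmomatic on "$r1" and "$r2",
        # or hand the pair to the scheduler as its own job.
    else
        echo "skipping $r1: mate not found" >&2
    fi
done

rm -rf "$demo"
```

Because each iteration handles exactly one pair, the inner loop over file2 and the sample/read comparison are no longer needed, and the body of the loop is the natural place to submit one job per sample.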