problems with trimmomatics PE
1
0
Entering edit mode
4.3 years ago
storm1907 ▴ 30

Hello, having issue with Paired End Trimming. It looks like my script does not recognize input files, and I miss any idea, what is wrong.

#!/bin/bash
#PBS -N trimmomatics_job
#PBS -q batch
#PBS -l walltime=72:00:00
#PBS -l nodes=1:ppn=40,mem=40gb
#PBS -W x=naccesspolicy:UNIQUEUSER
#PBS -j oe
#PBS -A job

module load java
INPATH1=/home/groups/dir/subdir
OUTPATH=/home/groups/dir/subdir
cd $INPATH1

for dir in */;
do
        for file1 in $dir/*.fq.gz;
        do
                bname1=$(basename $file1 '.fq.gz')
                sample1="$( cut -d'_' -f 1,2,3<<<"$bname1")"
                read1="$( cut -d'_' -f 4 <<<"$bname1")"
                for file2 in $dir/*.fq.gz;
                do
                        bname2=$(basename $file2 '.fq.gz')
                        sample2="$( cut -d'_' -f 1,2,3<<<"$bname2")"
                        read2="$( cut -d'_' -f 4 <<<"$bname2")"
                        if [ "$sample1" == "$sample2" ] && [ "$read1" != "$read2" ] \ 
                        && [ "$read1" == 1 ] ;
                        then
                        echo "$sample2" "$sample2" "$read1" "$read2"
                        input1=$INPATH1/$bname1.fq.gz
                        input2=$INPATH2/$bname2.fq.gz
                        output1=$OUTPATH/$bname1.paired.fq.gz
                        output2=$OUTPATH/$bname1.unpaired.fq.gz
                        output3=$OUTPATH/$bname2.paired.fq.gz
                        output4=$OUTPATH/$bname2.unpaired.fq.gz
                        echo "$input1" "$input2"
                        cd /home/usr/tools/Trimmomatic-0.39
                        java -jar trimmomatic-0.39.jar PE -phred33 \
                        "$input1" "$input2" "$output1" "$output2" "$output3" "$output4" \
                        ILLUMINACLIP:BGI_Adapters.fa:2:30:10 \
                        LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
                        fi
                done
        done
done

however, I keep getting error message like this

Exception in thread "main" java.io.FileNotFoundException: /dir/subdir/_L01_100_1.fq.gz (No such file or directory)
        at java.io.FileInputStream.open0(Native Method)
        at java.io.FileInputStream.open(FileInputStream.java:195)
        at java.io.FileInputStream.<init>(FileInputStream.java:138)
        at org.usadellab.trimmomatic.fastq.FastqParser.parse(FastqParser.java:135)
        at org.usadellab.trimmomatic.TrimmomaticPE.process(TrimmomaticPE.java:265)
        at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:555)
        at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:80)
next-gen • 1.0k views
ADD COMMENT
0
Entering edit mode

Error is clear:

Exception in thread "main" java.io.FileNotFoundException: /dir/subdir/_L01_100_1.fq.gz (No such file or directory)

Looking at your script you do need to change the values for these two variables to match real folders you have on your server.

INPATH1=/home/groups/dir/subdir
OUTPATH=/home/groups/dir/subdir
ADD REPLY
0
Entering edit mode

Well, I especially changed the names of folders (data confidentiality and stuff like that :D )

ADD REPLY
0
Entering edit mode

I see. The first part suspiciously matched the error you showed so just wanted to be certain.

Next thing to check is your file names. Are they consistent? Do they end in .fq.gz and have _L01_100_1 in name? That naming is a bit odd since files generally should have _L001_001.fq.gz in their names. Put a number of echo commands in your script and see what is produced at each step to debug the issue.

ADD REPLY
0
Entering edit mode

these are BGI RNA Seq files, for example, V40002080_L01_109_1.fq.gz or V40002080_L01_109_2.fq.gz

ADD REPLY
0
Entering edit mode

So some combination of dir/subdir and _L01_100_1.fq.gz is not being found, because the sample name is not getting reconstituted properly.

ADD REPLY
0
Entering edit mode

Would it not be better to submit multiple jobs inside for loop instead of submitting a single job like this?

ADD REPLY
0
Entering edit mode

I was thinking about it, but how can i put two variables (input1 and 2) into one loop?

ADD REPLY
2
Entering edit mode
4.3 years ago
h.mon 35k

I suggest you use a combination of find --xargs or find | parallel instead of a nested for loop - see a simple example at A: Linux for loop for 2 inputs and 4 outputs for Trimmomatic fastq quality trimming . There are plenty of other posts with bash for loops as well, be sure to give them a read:

bash loop for alignment RNA-seq data

Trimmomatic job script to run on multiple pair end read file

And many others.

In addition, the most inner for loop ( for file2 in $dir/*.fq.gz; do) is totally unnecessary and only add complexity to the code: if the file names follow a consistent pattern, R2 file names can be obtained from R1 file names.

ADD COMMENT

Login before adding your answer.

Traffic: 1324 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6