Trimmomatic on Windows
5 months ago
Clíona • 0

I am trying to set up a .bat file to run Trimmomatic on multiple fastq.gz files; I am only interested in using ILLUMINACLIP. I keep getting an error with the adapter file path (see the example log file with the error message below). Any ideas?

TrimmomaticPE: Started with arguments:
 -phred33 C:\Users\Shared\CheeseStudy\Trimmomatic\RawCheesefastq\CSM020_v1_S1_L001_R1_001.fastq.gz C:\Users\Shared\CheeseStudy\Trimmomatic\RawCheesefastq\CSM020_v1_S1_L001.fastq_R2_001.fastq.gz C:\Users\Shared\CheeseStudy\Trimmomatic\TrimmoOutputs\CSM020_v1_S1_L001.fastq_R1_paired.fastq.gz C:\Users\Shared\CheeseStudy\Trimmomatic\TrimmoOutputs\CSM020_v1_S1_L001.fastq_R1_unpaired.fastq.gz C:\Users\Shared\CheeseStudy\Trimmomatic\TrimmoOutputs\CSM020_v1_S1_L001.fastq_R2_paired.fastq.gz C:\Users\Shared\CheeseStudy\Trimmomatic\TrimmoOutputs\CSM020_v1_S1_L001.fastq_R2_unpaired.fastq.gz ILLUMINACLIP:C:\Users\Shared\CheeseStudy\Trimmomatic\Trimmomatic-0.39\Trimmomatic-0.39\adapters\NexteraPE-PE.fa:2:30:10:5
Exception in thread "main" java.lang.NumberFormatException: For input string: "\Users\Shared\CheeseStudy\Trimmomatic\Trimmomatic-0.39\Trimmomatic-0.39\adapters\NexteraPE-PE.fa"
    at java.lang.NumberFormatException.forInputString(Unknown Source)
    at java.lang.Integer.parseInt(Unknown Source)
    at java.lang.Integer.parseInt(Unknown Source)
    at org.usadellab.trimmomatic.trim.IlluminaClippingTrimmer.makeIlluminaClippingTrimmer(IlluminaClippingTrimmer.java:54)
    at org.usadellab.trimmomatic.trim.TrimmerFactory.makeTrimmer(TrimmerFactory.java:32)
    at org.usadellab.trimmomatic.Trimmomatic.createTrimmers(Trimmomatic.java:59)
    at org.usadellab.trimmomatic.TrimmomaticPE.run(TrimmomaticPE.java:552)
    at org.usadellab.trimmomatic.Trimmomatic.main(Trimmomatic.java:80)
Tags: windows, 16S, trimmomatic
5 months ago
GenoMax 148k

Because you are using Windows, you may need to enclose the adapter file path in quotes (try both single and double quotes).

ILLUMINACLIP:"C:\Users\Shared\CheeseStudy\Trimmomatic\Trimmomatic-0.39\Trimmomatic-0.39\adapters\NexteraPE-PE.fa":2:30:10:5
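
A note on the likely cause: the string that failed to parse in the error above starts at `\Users`, i.e. right after `C:`. That is consistent with the ILLUMINACLIP arguments being split on every colon, so the drive letter ends up as the "adapter file" and the rest of the path lands where an integer is expected. If that is what happens, quoting alone may not help, whereas a relative path (no drive-letter colon) should. The Python sketch below only illustrates that assumed behaviour; it is not Trimmomatic's code, and `parse_illuminaclip` is a made-up name.

```python
def parse_illuminaclip(step):
    """Mimic a colon-delimited parse of an ILLUMINACLIP step (assumed behaviour)."""
    name, _, args = step.partition(":")   # "ILLUMINACLIP" and everything after it
    fields = args.split(":")              # every colon is treated as a separator
    adapter_file = fields[0]
    seed_mismatches = int(fields[1])      # raises ValueError if this is not a number
    return name, adapter_file, seed_mismatches


# Absolute Windows path: the colon after the drive letter adds an extra field.
absolute = r"ILLUMINACLIP:C:\Users\Shared\CheeseStudy\Trimmomatic\Trimmomatic-0.39\Trimmomatic-0.39\adapters\NexteraPE-PE.fa:2:30:10:5"
# Relative path: no drive letter, so the fields line up as intended.
relative = r"ILLUMINACLIP:Trimmomatic-0.39\Trimmomatic-0.39\adapters\NexteraPE-PE.fa:2:30:10:5"

try:
    parse_illuminaclip(absolute)
except ValueError as err:
    print("absolute path fails:", err)

print(parse_illuminaclip(relative))
```

With a relative adapter path, the command has to be run from the folder the path is relative to.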

Thank you.

I am able to run Trimmomatic for one file using the following in the Windows command prompt (cmd), taking one sample, CSM020, as an example:

java -jar C:\Users\Shared\CheeseStudy\Trimmomatic\Trimmomatic-0.39\Trimmomatic-0.39\trimmomatic-0.39.jar PE -phred33 -trimlog C:\Users\Shared\CheeseStudy\Trimmomatic\TrimmoLog\CSM020_v1Log -basein C:\Users\Shared\CheeseStudy\Trimmomatic\RawCheesefastq\CSM020_v1_S1_L001_R1_001.fastq.gz -baseout C:\Users\Shared\CheeseStudy\Trimmomatic\TrimmoOutputs\CSM020_v1.fq.gz ILLUMINACLIP:Trimmomatic-0.39\Trimmomatic-0.39\adapters\NexteraPE-PEUpdated.fa:2:30:10:5

I have a number of fastq files: 69 participant IDs (e.g., CSM020), two timepoints for each participant (v1 or v2), and R1 and R2 files for each timepoint, so each participant ID has 4 fastq files. Here is an example of what the fastqs look like for one participant:

"C:\Users\Shared\CheeseStudy\Trimmomatic\RawCheesefastq\CSM020_v1_S1_L001_R1_001.fastq.gz"
"C:\Users\Shared\CheeseStudy\Trimmomatic\RawCheesefastq\CSM020_v1_S1_L001_R2_001.fastq.gz"
"C:\Users\Shared\CheeseStudy\Trimmomatic\RawCheesefastq\CSM020_v2_S72_L001_R1_001.fastq.gz"
"C:\Users\Shared\CheeseStudy\Trimmomatic\RawCheesefastq\CSM020_v2_S72_L001_R2_001.fastq.gz"

I am trying to set up a .bat file to run the above Trimmomatic settings on all of the fastq files in the directory "C:\Users\Shared\CheeseStudy\Trimmomatic\RawCheesefastq". I am struggling to get it to identify the R1 and R2 file pairs correctly while also handling the v1 and v2 timepoints. I have been trying with code like this (writing the .bat file in Notepad and running it in cmd), but it constructs the R2 file name incorrectly and of course fails. Any ideas?

@echo off
setlocal EnableDelayedExpansion

rem Define base path
set "BASE_PATH=C:\Users\Shared\CheeseStudy\Trimmomatic"

rem Define paths relative to the base path
set "TRIMMOMATIC_PATH=%BASE_PATH%\Trimmomatic-0.39\Trimmomatic-0.39\trimmomatic-0.39.jar"
set "ADAPTERS_PATH=Trimmomatic-0.39\Trimmomatic-0.39\adapters\NexteraPE-PEUpdated.fa"
set "INPUT_DIR=%BASE_PATH%\RawCheesefastq"
set "OUTPUT_DIR=%BASE_PATH%\TrimmoOutputs"
set "LOG_DIR=%BASE_PATH%\TrimmoLog"

rem Create output and log directories if they don't exist
if not exist "%OUTPUT_DIR%" mkdir "%OUTPUT_DIR%"
if not exist "%LOG_DIR%" mkdir "%LOG_DIR%"

rem Change to the base directory to use relative paths
pushd "%BASE_PATH%"

rem Loop through R1 files in the input directory
for %%f in ("%INPUT_DIR%\*R1_001.fastq.gz") do (
    rem Extract base name by removing _R1_001.fastq.gz
    set "FILENAME=%%~nf"
    set "BASE=!FILENAME:_R1_001=!"

    rem Construct the corresponding R2 file path
    set "FILE_R2=%INPUT_DIR%\!BASE!_R2_001.fastq.gz"

    rem Define output file names
    set "OUTPUT_R1_PAIRED=%OUTPUT_DIR%\!BASE!_R1_paired.fastq.gz"
    set "OUTPUT_R1_UNPAIRED=%OUTPUT_DIR%\!BASE!_R1_unpaired.fastq.gz"
    set "OUTPUT_R2_PAIRED=%OUTPUT_DIR%\!BASE!_R2_paired.fastq.gz"
    set "OUTPUT_R2_UNPAIRED=%OUTPUT_DIR%\!BASE!_R2_unpaired.fastq.gz"
    set "LOG_FILE=%LOG_DIR%\!BASE!.log"

    rem Debug: Echo the file paths
    echo Processing R1 file: %%f
    echo Looking for R2 file: !FILE_R2!

    rem Check if the R2 file exists
    if exist "!FILE_R2!" (
        rem Run Trimmomatic
        java -jar "%TRIMMOMATIC_PATH%" PE -phred33 "%%f" "!FILE_R2!" "!OUTPUT_R1_PAIRED!" "!OUTPUT_R1_UNPAIRED!" "!OUTPUT_R2_PAIRED!" "!OUTPUT_R2_UNPAIRED!" ILLUMINACLIP:"%ADAPTERS_PATH%":2:30:10:5 > "!LOG_FILE!"

        rem Log completion
        echo Done with %%f
    ) else (
        echo File not found: !FILE_R2!
        echo Skipping this pair.
        echo Skipping this pair. >> "%LOG_DIR%\missing_files.log"
    )
)

rem Return to the original directory
popd

endlocal

You are brave, writing .bat files to process NGS data on Windows. :-)

I suggest using ChatGPT to see if it can catch the problem you are having with recreating the R2 file name.
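
For what it is worth, the R2 name in the first log (`CSM020_v1_S1_L001.fastq_R2_001.fastq.gz`) points at the likely culprit. `%%~nf` strips only the last extension (`.gz`), so removing `_R1_001` leaves a stray `.fastq` in the base name. In the .bat file, removing `_R1_001.fastq` instead (for example `set "BASE=!FILENAME:_R1_001.fastq=!"`) should avoid that. The Python sketch below mirrors the same string handling to show the difference; the function names are made up for illustration.

```python
import os


def r2_name_buggy(r1_name):
    """Reproduce the .bat logic: drop the last extension, then remove '_R1_001'."""
    stem = os.path.splitext(r1_name)[0]      # like %%~nf: removes only ".gz"
    base = stem.replace("_R1_001", "")       # ".fastq" is still attached here
    return base + "_R2_001.fastq.gz"


def r2_name_fixed(r1_name):
    """Strip the whole '_R1_001.fastq.gz' suffix before appending the R2 suffix."""
    suffix = "_R1_001.fastq.gz"
    if not r1_name.endswith(suffix):
        raise ValueError(f"not an R1 file name: {r1_name}")
    return r1_name[: -len(suffix)] + "_R2_001.fastq.gz"


r1 = "CSM020_v1_S1_L001_R1_001.fastq.gz"
print(r2_name_buggy(r1))   # CSM020_v1_S1_L001.fastq_R2_001.fastq.gz
print(r2_name_fixed(r1))   # CSM020_v1_S1_L001_R2_001.fastq.gz
```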

5 months ago
ATpoint 86k

Bioinformatics, and especially the preprocessing of NGS data, is done in Unix environments, not Windows. There is no point in debugging Windows errors, as tool developers rarely had Windows in mind. It's wasted energy. Use a Linux machine or install WSL2 on Windows and your problems are gone.

5 months ago
Clíona • 0

Thanks. I asked the robot, and the script now seems to be working through the files; the outputs look correct so far. Here is the code I used, in case anyone else runs into this in the future:

@echo off
setlocal enabledelayedexpansion

rem Quotes go around the whole assignment so they are not stored in the value
set "TRIMMOMATIC_JAR=C:\Users\Shared\CheeseStudy\Trimmomatic\Trimmomatic-0.39\Trimmomatic-0.39\trimmomatic-0.39.jar"
rem Relative path: resolved against the directory the script is run from
set "ADAPTERS=Trimmomatic-0.39\Trimmomatic-0.39\adapters\NexteraPE-PEUpdated.fa"
set "LOG_DIR=C:\Users\Shared\CheeseStudy\Trimmomatic\TrimmoLog"
set "INPUT_DIR=C:\Users\Shared\CheeseStudy\Trimmomatic\RawCheesefastq"
set "OUTPUT_DIR=C:\Users\Shared\CheeseStudy\Trimmomatic\TrimmoOutputs"

rem Initialize an empty list to store participant IDs
set "PARTICIPANTS="

rem Loop through files to extract unique participant IDs
for %%F in ("%INPUT_DIR%\*_v*_S*_L001_R1_001.fastq.gz") do (
    set "filename=%%~nF"
    for /f "tokens=1 delims=_" %%A in ("!filename!") do (
        if "!PARTICIPANTS!" == "" (
            set "PARTICIPANTS=%%A"
        ) else (
            set "found=0"
            for %%B in (!PARTICIPANTS!) do (
                if "%%A" == "%%B" set "found=1"
            )
            if !found! == 0 (
                set "PARTICIPANTS=!PARTICIPANTS! %%A"
            )
        )
    )
)

rem Debugging output to check participant IDs
echo Participants: %PARTICIPANTS%

rem Loop through each participant
for %%P in (%PARTICIPANTS%) do (
    rem Loop through each timepoint
    for %%T in (v1 v2) do (
        rem Clear the previous pair so a missing timepoint cannot reuse it
        set "R1_FILE="
        set "R2_FILE="
        for %%F in ("%INPUT_DIR%\%%P_%%T_S*_L001_R1_001.fastq.gz") do set "R1_FILE=%%F"
        for %%F in ("%INPUT_DIR%\%%P_%%T_S*_L001_R2_001.fastq.gz") do set "R2_FILE=%%F"
        set "BASEOUT=%OUTPUT_DIR%\%%P_%%T"
        set "LOGFILE=%LOG_DIR%\%%P_%%T_Log"

        rem Debugging output to check file paths
        echo R1_FILE: !R1_FILE!
        echo R2_FILE: !R2_FILE!
        echo BASEOUT: !BASEOUT!
        echo LOGFILE: !LOGFILE!

        if not defined R1_FILE (
            echo Skipping %%P_%%T, no R1 file found
        ) else if not defined R2_FILE (
            echo Skipping %%P_%%T, no R2 file found
        ) else (
            rem Execute Trimmomatic command
            java -jar "%TRIMMOMATIC_JAR%" PE -phred33 -trimlog "!LOGFILE!" "!R1_FILE!" "!R2_FILE!" "!BASEOUT!_1P.fq.gz" "!BASEOUT!_1U.fq.gz" "!BASEOUT!_2P.fq.gz" "!BASEOUT!_2U.fq.gz" ILLUMINACLIP:"%ADAPTERS%":2:30:10:5
        )
    )
)
pause
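
For anyone who would rather avoid batch syntax, the same pairing logic can be sketched in Python, which also runs on Windows. This is only a rough outline under the file naming shown in this thread (`<participant>_<timepoint>_S*_L001_R1_001.fastq.gz`). The function names `find_pairs` and `build_command` are made up, and the Trimmomatic arguments are copied from the commands above rather than checked against the manual.

```python
from pathlib import Path

R1_SUFFIX = "_R1_001.fastq.gz"
R2_SUFFIX = "_R2_001.fastq.gz"


def find_pairs(input_dir):
    """Yield (label, r1, r2) for every R1 file that has a matching R2 file."""
    for r1 in sorted(Path(input_dir).glob("*" + R1_SUFFIX)):
        prefix = r1.name[: -len(R1_SUFFIX)]          # e.g. CSM020_v1_S1_L001
        r2 = r1.with_name(prefix + R2_SUFFIX)
        if not r2.exists():
            print(f"Skipping {r1.name}: no matching R2 file")
            continue
        label = "_".join(prefix.split("_")[:2])      # e.g. CSM020_v1
        yield label, r1, r2


def build_command(jar, adapters, out_dir, log_dir, label, r1, r2):
    """Build the argument list for one paired-end Trimmomatic run."""
    out = Path(out_dir)
    return [
        "java", "-jar", str(jar), "PE", "-phred33",
        "-trimlog", str(Path(log_dir) / f"{label}_Log"),
        str(r1), str(r2),
        str(out / f"{label}_1P.fq.gz"), str(out / f"{label}_1U.fq.gz"),
        str(out / f"{label}_2P.fq.gz"), str(out / f"{label}_2U.fq.gz"),
        # Keep the adapter path relative so it contains no drive-letter colon.
        f"ILLUMINACLIP:{adapters}:2:30:10:5",
    ]
```

Each command could then be run with `subprocess.run(cmd, check=True)` from the folder that the relative adapter path refers to.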