Using trimmomatic on multiple paired-end read files
2
0
Entering edit mode
7.9 years ago

I need help to write a for loop to run Trimmomatic tool for quality trimming of paired end fastq files. I need to write a for loop so that I can run an executable for all multiple files.

Input PE files looks like - C1_R1.fastq

C1_R2.fastq

C2_R1.fastq

C2_R2.fastq

C3_R1.fastq

C3_R2.fastq

T1_R1.fastq

T1_R2.fastq

T2_R1.fastq

T2_R2.fastq

T3_R1.fastq

T3_R2.fastq

To run trimmomatic for the paired reads corresponding to C1_R1.fastq and C1_R2.fastq, the following command works:

java -jar ~/Trimmomatic-0.36/trimmomatic-0.36.jar PE -phred33 C1_R1.fastq C1_R2.fastq C1_R1_paired.fastq C1_R1_unpaired.fastq C1_R2_paired.fastq C1_R2_unpaired.fastq LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:35

I have tried using this thread, but I am unable to understand.

Any help please! Thanks!

ngs Assembly genome next-gen sequencing • 14k views
ADD COMMENT
1
Entering edit mode

Hi bioinformaticssrm2011,

The second command I posted on seqanswers (with an echo in front) should show you the command how it would be executed. That way you can figure out what is wrong.

Shouldn't you use this command in paired end mode?

java -jar <path to trimmomatic.jar> PE [-threads <threads] [-phred33 | -phred64] [-trimlog <logFile>] >] [-basein inputBase> | <input 1> <input 2>] [-baseout <outputBase> | <unpaired output 1> <paired output 2> <unpaired output 2> <step 1> ...
ADD REPLY
0
Entering edit mode

I got an error, if you see that thread

ADD REPLY
0
Entering edit mode

I've seen that error. Now read what I wrote about using the echo statement and figuring out what's wrong.

ADD REPLY
0
Entering edit mode

About echo, I am new to this, so unable to understand it. Sorry

ADD REPLY
0
Entering edit mode

If you don't know the echo command you need to start with following a command line tutorial, that will make everything less painful.

ADD REPLY
0
Entering edit mode

this is my python script

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import subprocess

with open('/data2/masw_data/Nanda/5A_and_5D/bbmap_config.txt', 'r') as f:
    for line in f:
        line = line.strip().split()
        ref, in1, in2, outm = line
        proc = subprocess.Popen(['/data1/masw/bbmap/bbmap.sh', ref, in1, in2, outm, 'threads=20', 'minid=0.70', 'nodisk'], shell=False)
        proc.wait()
f.writelines()
ADD REPLY
0
Entering edit mode

shengweima : Did you post in the wrong thread or were you presenting an example?

ADD REPLY
2
Entering edit mode
5.7 years ago

Automation of trimmomatic


What is Trimmomatic?

Trimmomatic is a fast, multithreaded command line tool that can be used to trim and crop Illumina (FASTQ) data as well as to remove adapters. These adapters can pose a real problem depending on the library preparation and downstream application.

For details, read the manual here


✂️ About the script

The script can be used for running trimmomatic automatically for N no.of samples regardless of the file extension. Read below to understand the functionalities, usage and reasons for failure.

The script is accessible here

🚩 Why should one use this?

  • This script directly works with compressed/uncompressed fastq files exploiting the functionalities of trimmomatic.
  • Obviously, it will save a lot of time as trimming is one of the routine tasks and it ought to be automated.
  • It is robust and intuitive and errors while execution (if any) are self-explanatory as explained below.

🔧 Using the script:

Consider you have a directory with a mixture of fastq files with different file extensions

    |-- all_R1.fq
    |-- all_R1.fq.gz
    |-- all_R2.fq
    |-- all_R2.fq.gz
    |-- demo_R1.fq
    |-- demo_R2.fq
    |-- make_2_R1.fastq
    |-- make_2_R1.fastq.gz
    |-- make_2_R2.fastq
    |-- make_2_R2.fastq.gz
  • Example (1): Running script only for .fq.gz files:

    $ sh auto_trimmomatic.sh *.fq.gz

  • Example (2): Running script only for .fq files:

    $ sh auto_trimmomatic.sh *.fq

  • Example (3): Running script only for .fastq files:

    $ sh auto_trimmomatic.sh *.fastq

  • Example (4): Running script only for .fastq.gz files:

    $ sh auto_trimmomatic.sh *.fastq.gz

  • Example (5): Running script only for all files in the directory:

    $ sh auto_trimmomatic.sh *

ℹ️ Invoking help

$ sh auto_trimmomatic.sh --help OR

$ sh auto_trimmomatic.sh -h

⚠️ Error handling

  • No parameters passed to script
[mypc]$ sh auto_trimmomatic.sh 
Error: No parameter(s) provided
Usage: sh auto_trim [*.extension]
       extension: <fq> or <fastq> or <fq.gz> or <fastq.gz> or <*>
       example: sh auto_trim.sh *.fq.gz or sh auto_trim.sh *
 
Help:  sh auto_trimmomatic.sh -h or --help
  • No files in the directory with user provided extension
[mypc]$ sh auto_trimmomatic.sh *.fq

FileNotFoundError: No such file with extension *.fq found!
Supported extensions are: <.fq> or <.fastq> or <.fq.gz> or <.fastq.gz>
  • Fastq file names do not have R1 - R2 naming conventions. Say if you have files like these - demo_1.fq, demo_2.fq, the script will fail:
[mypc]$ sh auto_trimmomatic.sh *.fq

Filename Error: Paired end file names should contain _R1 _R2
Example: test_R1.fq.gz, test_R2.fq.gz
ℹ️ Rename the fastq files as demo_R1.fq, demo_R2.fq. This checkpoint was essential to maintain integrity of the script.

ADD COMMENT
1
Entering edit mode
7.9 years ago
agata88 ▴ 870

You need to specify the directories to your input/output files :

"inputdirectory" directory "processed" directory (as output) "log" directory for log information "program" directory with Trimmomatic jar file

#! python
import os
import sys
import subprocess 

for fileR1 in os.listdir(inputdirectory): 
    dividing = fileR1.split(".")
    if ("R1" in fileR1) :
        fileR2 = fileR1.replace('R1', 'R2')
        if os.path.isfile(inputdirectory + fileR2) :
            dividing1 = fileR2.split(".")
            log1 = dividing[0]
            output1 = dividing[0]
            output2 = dividing1[0]
            subprocess.call("java -jar " + program + "trimmomatic-0.35.jar PE -threads 12 -phred33 " + inputdirectory + fileR1 + " " + inputdirectory + fileR2 + " " + processed + output1 +"_trimmed.fastq.gz " +  "output_forward_unpaired.fq.gz "  + processed + output2 + "_trimmed.fastq.gz " + "output_reverse_unpaired.fq.gz " + " LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:35" +" >" + log + log1 + "_trimmomatic.txt" + " 2>&1", shell=True)

Hope it helps :) Best,

Agata

ADD COMMENT
1
Entering edit mode

hi, When I was trying to use above script, I got an error.

    #! python
    import os
    import sys
    import subprocess 
    inputdirectory = "/home/shashank/Desktop/patel_chest_NSC/inputdirectory"
   processed = "/home/shashank/Desktop/patel_chest_NSC/processed"
   log="/home/shashank/Desktop/patel_chest_NSC/log"
   for fileR1 in os.listdir(inputdirectory): 
        dividing = fileR1.split(".")
        if ("R1" in fileR1) :
            fileR2 = fileR1.replace('R1', 'R2')
            if os.path.isfile(inputdirectory + plikR2) :
                dividing1 = fileR2.split(".")
                log1 = dividing[0]
                output1 = dividing[0]
                output2 = dividing1[0]
                subprocess.call("java -jar " + program + "~/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 12 -phred33 " + inputdirectory + fileR1 + " " + inputdirectory + fileR2 + " " + processed + output1 +"_trimmed.fastq.gz " +  "output_forward_unpaired.fq.gz "  + processed + output2 + "_trimmed.fastq.gz " + "output_reverse_unpaired.fq.gz " + " LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:35" +" >" + log + log1 + "_trimmomatic.txt" + " 2>&1", shell=True)

Error is-

   Traceback (most recent call last):
  File "python.py", line 10, in <module>
    if os.path.isfile(inputdirectory + plikR2) :
NameError: name 'plikR2' is not defined
ADD REPLY
0
Entering edit mode

Please replace 'plikR2' to 'fileR2'.

ADD REPLY
0
Entering edit mode

Hi, thanks for your help. But I got an error

File "python.py", line 8 if ("R1" in fileR1): ^ IndentationError: unindent does not match any outer indentation level

than I checked the indent and tried to solve it, and later I got another error-

Traceback (most recent call last): File "python.py", line 6, in <module> for fileR1 in os.listdir(input): TypeError: coercing to Unicode: need string or buffer, builtin_function_or_method found

ADD REPLY
1
Entering edit mode

input shouldn't be used as a variable name in python3. Make it inputfile or something like that.

ADD REPLY
0
Entering edit mode

Thats helpful.

but another error pop up-

Traceback (most recent call last): File "python.py", line 6, in <module> for fileR1 in os.listdir(inputfile): NameError: name 'inputfile' is not defined

ADD REPLY
0
Entering edit mode

That's because you didn't define the variable.

ADD REPLY
0
Entering edit mode

Yes, that is correct it shouldn’t be 'input' , I forgot about that when I was changing names... sorry ... going to change that right now.

ADD REPLY
0
Entering edit mode

Changed, now it should be ok, also the indentation was repaired.

ADD REPLY
0
Entering edit mode

Hi, Thanks for you python script. I am trying to run this script but I have it not running or giving me any output. and also, I have no error come at any step. when I run it the result is nothing. Can you help me, please! Thanks

ADD REPLY
0
Entering edit mode

Hi, check the indentations. maybe during copy/paste something was shifted, best, Agata

ADD REPLY

Login before adding your answer.

Traffic: 3983 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6