Tool: Shell script to automate Trimmomatic for multiple samples
6.0 years ago

A couple of months ago, I developed this shell script to automate Trimmomatic for multiple paired-end FASTQ files. I have seen several posts on this topic on Biostars, and I believe it is routine work at many institutions.

Availability:

Hosted on GitHub

Feedback is highly appreciated

  • Reach me on GitHub

    or

  • Post comments here on Biostars


Automation of trimmomatic


What is Trimmomatic?

Trimmomatic is a fast, multithreaded command line tool that can be used to trim and crop Illumina (FASTQ) data as well as to remove adapters. These adapters can pose a real problem depending on the library preparation and downstream application.

For details, read the manual here


✂️ About the script

The script runs Trimmomatic automatically on any number of paired-end samples, regardless of the file extension. Read on to understand its functionality, usage, and reasons for failure.

The script is accessible here

🚩 Why should one use this?

  • The script works directly with compressed and uncompressed FASTQ files, relying on Trimmomatic's own support for both.
  • It saves a lot of time: trimming is a routine task and ought to be automated.
  • It is robust and intuitive, and any errors during execution are self-explanatory, as described below.

🔧 Using the script:

Suppose you have a directory containing a mixture of FASTQ files with different file extensions:

    |-- all_R1.fq
    |-- all_R1.fq.gz
    |-- all_R2.fq
    |-- all_R2.fq.gz
    |-- demo_R1.fq
    |-- demo_R2.fq
    |-- make_2_R1.fastq
    |-- make_2_R1.fastq.gz
    |-- make_2_R2.fastq
    |-- make_2_R2.fastq.gz

  • Example (1): Running the script only for .fq.gz files:

    $ sh auto_trimmomatic.sh *.fq.gz

  • Example (2): Running the script only for .fq files:

    $ sh auto_trimmomatic.sh *.fq

  • Example (3): Running the script only for .fastq files:

    $ sh auto_trimmomatic.sh *.fastq

  • Example (4): Running the script only for .fastq.gz files:

    $ sh auto_trimmomatic.sh *.fastq.gz

  • Example (5): Running the script for all files in the directory:

    $ sh auto_trimmomatic.sh *
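
The post does not show the script body or the trimming parameters it uses, so the following is only a minimal sketch of what a per-pair loop of this kind might look like. It is a dry run that prints the Trimmomatic command for each R1/R2 pair instead of executing it. The function names, the jar location, the adapter file, the output naming, and the trimming steps (ILLUMINACLIP with 2:30:10 and MINLEN:36, as in the Trimmomatic manual's paired-end example) are all assumptions, not taken from auto_trimmomatic.sh.

```shell
#!/bin/sh
# Hypothetical settings (assumptions); adjust to your installation.
TRIMMOMATIC_JAR=${TRIMMOMATIC_JAR:-trimmomatic.jar}   # assumed jar location
ADAPTERS=${ADAPTERS:-TruSeq3-PE.fa}                   # assumed adapter FASTA

# Print the paired-end Trimmomatic command for one R1 file (dry run only).
# PE mode takes: input R1, input R2, then paired/unpaired outputs for R1 and R2.
print_trim_cmd() {
    r1=$1
    base=${r1%_R1.*}          # sample name, e.g. "demo"
    ext=${r1#"${base}_R1"}    # extension, e.g. ".fq.gz"
    r2=${base}_R2${ext}
    printf 'java -jar %s PE %s %s %s %s %s %s ILLUMINACLIP:%s:2:30:10 MINLEN:36\n' \
        "$TRIMMOMATIC_JAR" "$r1" "$r2" \
        "${base}_R1_paired${ext}" "${base}_R1_unpaired${ext}" \
        "${base}_R2_paired${ext}" "${base}_R2_unpaired${ext}" \
        "$ADAPTERS"
}

# Walk over the arguments, pick the R1 files, and print one command per pair.
print_all_trim_cmds() {
    for f in "$@"; do
        case "$f" in
            *_R1.*) print_trim_cmd "$f" ;;
        esac
    done
}
```

Calling `print_all_trim_cmds *.fq.gz` in the directory above would print one command for the all_R1.fq.gz / all_R2.fq.gz pair. Printing first and running later makes it easy to inspect the commands before committing compute time.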

ℹ️ Invoking help

    $ sh auto_trimmomatic.sh --help

    or

    $ sh auto_trimmomatic.sh -h

⚠️ Error handling

  • No parameters passed to the script:

    [mypc]$ sh auto_trimmomatic.sh
    Error: No parameter(s) provided
    Usage: sh auto_trim [*.extension]
           extension: <fq> or <fastq> or <fq.gz> or <fastq.gz> or <*>
           example: sh auto_trim.sh *.fq.gz or sh auto_trim.sh *

    Help:  sh auto_trimmomatic.sh -h or --help

  • No files in the directory with the user-provided extension:

    [mypc]$ sh auto_trimmomatic.sh *.fq

    FileNotFoundError: No such file with extension *.fq found!
    Supported extensions are: <.fq> or <.fastq> or <.fq.gz> or <.fastq.gz>

  • FASTQ file names do not follow the R1/R2 naming convention. For example, if your files are named demo_1.fq and demo_2.fq, the script will fail:

    [mypc]$ sh auto_trimmomatic.sh *.fq

    Filename Error: Paired end file names should contain _R1 _R2
    Example: test_R1.fq.gz, test_R2.fq.gz

ℹ️ Rename the FASTQ files to demo_R1.fq and demo_R2.fq. This check is essential to maintain the integrity of the script.
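
For readers curious how checks like the ones above can be written, here is a minimal POSIX sh sketch. It is not the code from auto_trimmomatic.sh; the helper names `has_supported_ext`, `has_pair_tag`, and `mate_of` are hypothetical and only illustrate one way to test the extension, test the _R1/_R2 tag, and derive the expected mate file name.

```shell
#!/bin/sh
# Hypothetical helpers (assumptions); names are not taken from auto_trimmomatic.sh.

# Succeed if the file name ends in one of the four supported extensions.
has_supported_ext() {
    case "$1" in
        *.fq|*.fastq|*.fq.gz|*.fastq.gz) return 0 ;;
        *) return 1 ;;
    esac
}

# Succeed if the name contains an _R1 or _R2 tag followed by an extension.
has_pair_tag() {
    case "$1" in
        *_R1.*|*_R2.*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print the expected R2 mate for an R1 file name; fail if there is no _R1 tag.
mate_of() {
    case "$1" in
        *_R1.*) ;;
        *) return 1 ;;
    esac
    base=${1%_R1.*}           # everything before the last "_R1."
    ext=${1#"${base}_R1"}     # everything after the tag, e.g. ".fastq.gz"
    printf '%s_R2%s\n' "$base" "$ext"
}
```

With these helpers, `mate_of make_2_R1.fastq.gz` prints `make_2_R2.fastq.gz`, while `has_pair_tag demo_1.fq` fails, which is the situation that triggers the Filename Error shown above.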

bash shell trimmomatic • 6.5k views

Thanks for sharing. IMHO, you should have a look at solutions like Nextflow or Snakemake.


Hi Pierre Lindenbaum

I completely agree with you on that. I should start learning. Thanks for the suggestion.


Thanks for sharing the params.


Thanks cpad0112

6.0 years ago
ATpoint 87k

Suggestion for improvement: Check up front at the beginning of a script that all necessary tools are in PATH and/or defined in variables, e.g.:

    ## $TRIMMOMATIC could be a path like $HOME/software/trimmomatic.jar
    TOOLS=(samtools bedtools bowtie2 "$TRIMMOMATIC")

    ## Remove results from an earlier run so stale entries do not trigger the error:
    rm -f missing_tools.txt

    ## Check if a tool can be found; if not, its name is written to <missing_tools.txt>.
    ## A .jar is usually not executable, so an existing file path is accepted as well.
    function PathCheck {
      if ! command -v "$1" > /dev/null 2>&1 && [[ ! -e "$1" ]]; then
        echo "$1" >> missing_tools.txt
      fi
    }

    ## Check all tools:
    for i in "${TOOLS[@]}"; do
      PathCheck "$i"
    done

    ## If any of the specified tools is missing, throw an error and exit
    ## (-s is true only if the file exists and is not empty):
    if [[ -s missing_tools.txt ]]; then
      echo '[ERROR] Missing tools -- see missing_tools.txt for details' >&2
      exit 1
    fi
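
Note that the snippet above relies on bash features (arrays and `[[ ]]`), whereas the original script is invoked with `sh`. On systems where `sh` is not bash, the same idea can be sketched in POSIX sh as below. The function name `check_tools` is hypothetical, and accepting an existing file path in addition to a command on PATH is an assumption made so that a .jar path also passes the check.

```shell
#!/bin/sh
# Print the name of every tool that cannot be found, one per line.
# Returns 0 if all tools were found and 1 if at least one is missing.
# A tool counts as found if it is a command on PATH or an existing file path.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1 && [ ! -e "$tool" ]; then
            printf '%s\n' "$tool"
            missing=1
        fi
    done
    return "$missing"
}

# Possible usage at the top of a script (commented out here):
# check_tools java samtools "$TRIMMOMATIC" > missing_tools.txt || {
#     echo '[ERROR] Missing tools -- see missing_tools.txt for details' >&2
#     exit 1
# }
```

Returning a status instead of only writing a file lets the caller decide whether to abort, log, or continue.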

Thanks for the awesome suggestion ATpoint. That's worth implementing!
