Bash vs WDL for running GATK

I am currently working on a small variant calling project. I have been running GATK using the standard tool commands in a Linux shell, and I recently came across a blog post on Workflow Description Language (WDL) scripting for GATK, which runs on Cromwell.

What would be the primary differences between running GATK (4.0.11) one way or the other? I read that WDL describes an "analysis pipeline", but I don't have much of an idea of what that means in terms of bioinformatics.

GATK Linux Pipeline WDL Cromwell
vdauwera ★ 1.2k

WDL is intended to help you automate the work by chaining commands together, so you don't have to run them manually; the result is called a workflow or pipeline. There are other languages and systems besides WDL that let you do this. The main advantage of WDL is that it is what the GATK team itself uses, so you can find pre-written workflows on GitHub for all the major use cases supported by GATK (see https://github.com/gatk-workflows/ ). You can use the GATK WDL scripts right out of the box, or modify them to suit your own project, so you don't have to do as much work yourself. WDL is also quite user-friendly, which makes it suitable for someone who has not written workflows before.
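To give a concrete idea of what that looks like, here is a minimal sketch of a WDL workflow wrapping a single GATK step. The workflow name, file names, resource values and docker tag are placeholders for illustration, not taken from the official gatk-workflows scripts:

    version 1.0

    workflow SingleSampleHC {
      input {
        File bam              # analysis-ready BAM for one sample
        File bam_index        # declared so it is localized next to the BAM
        File ref_fasta
        File ref_fasta_index
        File ref_dict
        String sample_name
      }

      call HaplotypeCaller {
        input:
          bam = bam,
          bam_index = bam_index,
          ref_fasta = ref_fasta,
          ref_fasta_index = ref_fasta_index,
          ref_dict = ref_dict,
          sample_name = sample_name
      }

      output {
        File gvcf = HaplotypeCaller.gvcf
      }
    }

    task HaplotypeCaller {
      input {
        File bam
        File bam_index
        File ref_fasta
        File ref_fasta_index
        File ref_dict
        String sample_name
      }

      # The command block is the same GATK invocation you would type in a shell.
      command <<<
        gatk HaplotypeCaller \
          -R ~{ref_fasta} \
          -I ~{bam} \
          -O ~{sample_name}.g.vcf.gz \
          -ERC GVCF
      >>>

      output {
        File gvcf = "~{sample_name}.g.vcf.gz"
      }

      runtime {
        docker: "broadinstitute/gatk:4.0.11.0"   # placeholder image tag
        memory: "8 GB"
        cpu: 2
      }
    }

You run it with Cromwell by pointing it at the WDL file plus a small JSON file listing the inputs; Cromwell localizes the files, runs the command block (inside the docker image on container-aware backends), and collects the declared outputs.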

Thank you! Pardon me if the question sounds dumb, but what would be the difference compared with just piping commands together in a bash shell using the standard tool-specific commands? What would be the primary advantage of a WDL script / Cromwell engine in that respect? Does it streamline my processes more efficiently and run faster?

I'm essentially looking for a way to streamline and combine as many steps as possible, as quickly as possible, since I have 100+ paired samples to align and call.

What would be the primary advantage of a WDL script / Cromwell engine in that respect?

Parallelization, not re-making files that already exist when you restart after a failure, etc.
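To make that concrete for a project with 100+ samples: in WDL you wrap the per-sample step in a scatter block, and Cromwell runs each iteration as an independent job (locally, on a cluster, or in the cloud, depending on the backend you configure). With Cromwell's call caching switched on, re-running the workflow after a failure reuses the results of samples that already finished instead of recomputing them. A rough sketch, with placeholder names and resources:

    version 1.0

    workflow ManySamplesHC {
      input {
        Array[File] bams            # one analysis-ready BAM per sample
        Array[File] bam_indexes     # matching .bai files, same order
        Array[String] sample_names  # same order as bams
        File ref_fasta
        File ref_fasta_index
        File ref_dict
      }

      # Each iteration of the scatter becomes its own job, so samples run in
      # parallel up to whatever limit the backend allows.
      scatter (i in range(length(bams))) {
        call HaplotypeCaller {
          input:
            bam = bams[i],
            bam_index = bam_indexes[i],
            sample_name = sample_names[i],
            ref_fasta = ref_fasta,
            ref_fasta_index = ref_fasta_index,
            ref_dict = ref_dict
        }
      }

      output {
        Array[File] gvcfs = HaplotypeCaller.gvcf   # gathered across all samples
      }
    }

    task HaplotypeCaller {
      input {
        File bam
        File bam_index
        File ref_fasta
        File ref_fasta_index
        File ref_dict
        String sample_name
      }
      command <<<
        gatk HaplotypeCaller \
          -R ~{ref_fasta} \
          -I ~{bam} \
          -O ~{sample_name}.g.vcf.gz \
          -ERC GVCF
      >>>
      output {
        File gvcf = "~{sample_name}.g.vcf.gz"
      }
      runtime {
        docker: "broadinstitute/gatk:4.0.11.0"   # placeholder image tag
        memory: "8 GB"
        cpu: 2
      }
    }

Call caching has to be enabled in the Cromwell configuration; once it is, a re-run of the same workflow with the same inputs skips the calls whose results already exist.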


This review of bioinformatic pipeline frameworks explains the motivation well: https://academic.oup.com/bib/article/18/3/530/2562749

Bioinformatic analyses invariably involve shepherding files through a series of transformations, called a pipeline or a workflow. Typically, these transformations are done by third-party executable command line software written for Unix-compatible operating systems. The advent of next-generation sequencing (NGS), in which millions of short DNA sequences are used as the source input for interpreting a range of biological phenomena, has intensified the need for robust pipelines. NGS analyses tend to involve steps such as sequence alignment and genomic annotation that are both time-intensive and parameter-heavy.
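To connect that description to WDL: chaining steps simply means wiring the output file of one task into the input of the next; the engine works out the execution order and moves the files between steps. A simplified, hypothetical two-step chain (duplicate marking followed by calling), again with placeholder names and resources:

    version 1.0

    workflow MarkDupsAndCall {
      input {
        File bam
        File ref_fasta
        File ref_fasta_index
        File ref_dict
        String sample_name
      }

      call MarkDuplicates {
        input:
          bam = bam,
          sample_name = sample_name
      }

      call HaplotypeCaller {
        input:
          bam = MarkDuplicates.md_bam,            # output of the previous task
          bam_index = MarkDuplicates.md_bam_index,
          ref_fasta = ref_fasta,
          ref_fasta_index = ref_fasta_index,
          ref_dict = ref_dict,
          sample_name = sample_name
      }

      output {
        File gvcf = HaplotypeCaller.gvcf
        File duplication_metrics = MarkDuplicates.metrics
      }
    }

    task MarkDuplicates {
      input {
        File bam
        String sample_name
      }
      command <<<
        gatk MarkDuplicates \
          -I ~{bam} \
          -O ~{sample_name}.md.bam \
          -M ~{sample_name}.duplicate_metrics.txt \
          --CREATE_INDEX true
      >>>
      output {
        File md_bam = "~{sample_name}.md.bam"
        File md_bam_index = "~{sample_name}.md.bai"
        File metrics = "~{sample_name}.duplicate_metrics.txt"
      }
      runtime {
        docker: "broadinstitute/gatk:4.0.11.0"   # placeholder image tag
        memory: "8 GB"
      }
    }

    task HaplotypeCaller {
      input {
        File bam
        File bam_index
        File ref_fasta
        File ref_fasta_index
        File ref_dict
        String sample_name
      }
      command <<<
        gatk HaplotypeCaller \
          -R ~{ref_fasta} \
          -I ~{bam} \
          -O ~{sample_name}.g.vcf.gz \
          -ERC GVCF
      >>>
      output {
        File gvcf = "~{sample_name}.g.vcf.gz"
      }
      runtime {
        docker: "broadinstitute/gatk:4.0.11.0"   # placeholder image tag
        memory: "8 GB"
      }
    }

Because the calls are linked through their files rather than through the order you wrote them in, the engine can also decide what is safe to run in parallel and what must wait for an upstream step.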

