Difference Between "Pipeline" And "Workflow"?
12.8 years ago
Pascal ★ 1.5k

A quick and basic question today. I often see in the literature (in particular in the context of NGS) the words "pipeline" and "workflow" used interchangeably. Is there a real difference between them?

next-gen sequencing • 51k views

1st world problems :D


Looks like the consensus will be: no consensus

8.3 years ago
stevegt ▴ 80

From IT and C/S usage:

Pipeline

A pipeline is a series of processes, usually linear, which filter or transform data. The processes are generally assumed to be running concurrently. The data flow diagram of a pipeline does not normally branch or loop. The first process takes raw data as input, does something to it, then sends its results to the second process, and so on, eventually ending with the final result being produced by the last process in the pipeline. Pipelines are normally quick, with a flow taking seconds to hours for end-to-end processing of a single set of data.

Examples of pipelines in the real world include chaining two or more processes together on the command line using the '|' (pipe) symbol, with results in stdout or redirected to a file, or a simple software build process driven by 'make'.
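To make the linear shape concrete, here is a minimal sketch in Python that chains three stages the way '|' chains commands in a shell. The stage names (read_records, to_upper, keep_long) and the sample data are hypothetical, made up for illustration. Generators only approximate the behaviour described above: each stage pulls records from the previous one as they become available, but everything runs in one process rather than as truly concurrent processes.

```python
from typing import Iterable, Iterator, List


def read_records(lines: Iterable[str]) -> Iterator[str]:
    """First stage: strip whitespace and drop empty lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def to_upper(records: Iterable[str]) -> Iterator[str]:
    """Second stage: transform each record."""
    for record in records:
        yield record.upper()


def keep_long(records: Iterable[str], min_len: int = 4) -> Iterator[str]:
    """Last stage: filter out records shorter than min_len."""
    for record in records:
        if len(record) >= min_len:
            yield record


def run_pipeline(lines: Iterable[str]) -> List[str]:
    """Chain the stages linearly: no branches, no loops between stages."""
    return list(keep_long(to_upper(read_records(lines))))


raw = ["acgt\n", "\n", "ac\n", "ttgaca\n"]
print(run_pipeline(raw))
```

Each stage has exactly one input and one output, which is what keeps the data flow diagram a straight line.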

Workflow

A workflow is a set of processes, usually non-linear, often human rather than machine, which filter or transform data, often triggering external events. The processes are not assumed to be running concurrently. The data flow diagram of a workflow can branch or loop. There may be no clearly defined "first" process -- data may enter the workflow from multiple sources. Any process may take raw data as input, do something to it, then send its results to another process. There may be no single "final result" from a single process; rather, multiple processes might deliver results to multiple recipients. Workflows can be complex and long-lived; a single flow may take days, months, or even years to execute.

Examples of workflows in the real world include document, bug, or order processing, or iterative processing of very large data sets, particularly if humans are in the loop.
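By contrast, a workflow is more naturally described as a graph than as a chain. The sketch below uses Python's standard graphlib module (Python 3.9+) to model a small branching flow with two entry points and two end points. The step names are hypothetical and the steps do no real work; the sketch covers only branching and multiple sources and sinks, not loops, long-running state, or humans in the loop, which a topological sort cannot express.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each step maps to the set of steps it depends on (hypothetical names).
dependencies: Dict[str, Set[str]] = {
    "align": {"sample_a", "sample_b"},  # data enters from two sources
    "qc_report": {"align"},             # the flow branches after "align"
    "call_variants": {"align"},
    "notify_lab": {"qc_report"},        # stands in for an external event
}


def all_steps(deps: Dict[str, Set[str]]) -> Set[str]:
    """Every step mentioned either as a key or as a dependency."""
    return set(deps) | {d for ds in deps.values() for d in ds}


def entry_points(deps: Dict[str, Set[str]]) -> List[str]:
    """Steps with no dependencies: there may be more than one 'first' step."""
    return sorted(s for s in all_steps(deps) if not deps.get(s))


def final_steps(deps: Dict[str, Set[str]]) -> List[str]:
    """Steps nothing depends on: there may be more than one 'final result'."""
    depended_on = {d for ds in deps.values() for d in ds}
    return sorted(all_steps(deps) - depended_on)


def execution_order(deps: Dict[str, Set[str]]) -> List[str]:
    """One valid order in which the steps could run."""
    return list(TopologicalSorter(deps).static_order())


print(entry_points(dependencies))
print(final_steps(dependencies))
print(execution_order(dependencies))
```

A linear pipeline is the special case where every step has exactly one dependency and one dependent, so the graph collapses to a single chain.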

Mixing of terms

These terms have become mixed in recent years, in part because pipelines can be implemented as a very simple subset of workflows. In previous decades, workflow software was large, complex, commercial, and involved high licensing fees, while pipelines were a thing you did on the fly or in a shell script. The terminology has become more blurred as simpler "workflow" software packages have emerged; some of these are really just complicated versions of distributed 'make', and don't support humans in the loop. They really should have been called "data flow" rather than workflow packages. Likewise, there have been more efforts to support branching, looping, and suspended flows in "pipeline" libraries for various languages, and we've seen more pipelines spread over multiple machines, with data transport via HTTP, other TCP protocols, or shared networked filesystems.


I believe there is a small typo in the second line of the workflow paragraph. It should state: The data flow diagram of a workflow can branch or loop. Thank you

12.8 years ago

A pipeline could just be a bunch of commands embedded in a build script.

When I hear workflow I think exclusively of a heavyweight platform like Taverna that is designed to make it easy for end users to use modular units to construct analyses. Of course, Pipeline Pilot also falls into this category, so it appears I might be the only one who makes this assumption.

http://en.wikipedia.org/wiki/Bioinformatics_workflow_management_systems


One could also say that a workflow is a high-level concept that can include manual or even wet lab operations.

12.8 years ago

I would tend to think there is little difference but I do use these terms in slightly different ways.

I use 'pipeline' to refer to an established (often large) workflow (e.g. the Ensembl pipeline) that may have flow control built-in.

I use the term 'workflow' for a series of computational steps, usually programmed to run in one go, though sometimes the mere conceptual notion of such a series is enough to call it a workflow.

12.8 years ago
User 1686 ▴ 10

I suspect that in practice there's not a lot to it, and the difference in usage may be to do with the background of the speaker. For example, in my usage a workflow is a more formal, strict and computational term than pipeline. If I had to justify that, certain (non-bioinformatic) software systems have workflows, meaning that documents and data move automatically from stage to stage, which is not far from Galaxy's series of analysis steps. But they're foggy terms.
