Question

When exit code is not enough

2

10.0 years ago

Rad ▴ 810

There are a lot of tools out there that are very useful on the command line and very widely used in bioinformatics, but they quickly become annoying when we write a pipeline that cares about I/O connections and each task's exit status.

I am writing a pipeline using samtools, and samtools turns out to be a little bit annoying when it comes to I/O management: sometimes it generates an output file without you ever explicitly naming that file. Other tools don't even ask for an output file, and others require users to provide literal paths, which adds more workarounds and can be a bit frustrating. Here is an example using samtools. I am wrapping a call to samtools sort on a file that does not exist. The command fails, but the exit code is still zero, which is misleading if we care about reporting the status of the entire pipeline: a zero status means that the sort went OK, and this will trigger the downstream tasks, which is wrong.

I am cross-posting to CodersCrowd as well, with code you can run in the browser: http://coderscrowd.com/app/codes/view/288. You can see that both the status coming back from the Docker image and the one coming back from the Python interpreter itself (which is basically the samtools exit status) are zero, and they shouldn't be.

samtools exit-code python • 4.1k views

ADD COMMENT • link updated 2.8 years ago by Ram 44k • written 10.0 years ago by Rad ▴ 810

1

In the particular case of samtools, I would think that if you're using python anyway then it might be more convenient to just use pysam. Then the errors can be more easily caught in the base python script (at least in theory).

ADD REPLY • link 10.0 years ago by Devon Ryan 104k

0

I agree that predictable exit codes are convenient, but there are so many other ways to determine whether or not it's safe to proceed in your pipeline for a given step. In this case, since you're using samtools sort, why not just check the size of the expected output file? You can set the name of the sorted bam to be whatever you want (samtools automatically appends .bam to whatever name you choose).

import os

filesize = os.path.getsize(myfile)
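
The size check above could be wrapped up roughly as follows. This is only a sketch: the helper name `output_looks_valid` and the `min_bytes` threshold are illustrative assumptions, not anything samtools provides, and a non-empty file is a weak signal that does not prove the BAM is complete.

```python
import os

def output_looks_valid(path, min_bytes=1):
    """Return True if `path` is a regular file of at least `min_bytes` bytes.

    Hypothetical helper: a cheap sanity check to run after a pipeline step,
    in addition to (not instead of) looking at the exit status.
    """
    return os.path.isfile(path) and os.path.getsize(path) >= min_bytes
```

A pipeline step could call this on the expected output name right after the tool returns and refuse to start downstream tasks when it returns False.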

ADD REPLY • link 10.0 years ago by Dan D 7.4k

0

That's possible when we know the expected output file, but sometimes it is not possible to predict, for instance if that branch of the pipeline is itself subject to a condition.

ADD REPLY • link 10.0 years ago by Rad ▴ 810

0

Not being able to predict a filename is a problem with the coding/construction of the pipeline, not something truly intractable.

ADD REPLY • link 10.0 years ago by Dan D 7.4k

0

This is a very common situation, one that does not really have a solution other than working around the problems in various inconvenient ways. As for the causes: there is very little incentive and few rewards for writing code that behaves as it should.

ADD REPLY • link updated 2.8 years ago by Ram 44k • written 10.0 years ago by Istvan Albert 102k

0

I agree with you Istvan, this is basically what we have to do on a daily basis (working around the problems), but these are the basics of software engineering (being able to distinguish stderr from stdout, managing exit codes, etc.), and for reproducibility's sake one should be able to just look at the log to get a sense of what's going on in the pipeline. The incentive for getting the right exit code from a program is simply doing this the right way, or I should say as right as possible :)
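
To illustrate the point about keeping stdout, stderr and the exit code apart in the log, here is a minimal sketch using only the Python standard library. The function name `run_and_capture` and the logger name "pipeline" are assumptions made for this example; it requires Python 3.7+ for `capture_output`.

```python
import logging
import subprocess

def run_and_capture(cmd, logger=None):
    """Run `cmd` (a list of arguments) and log exit status and stderr separately.

    Hypothetical helper: returns (returncode, stdout, stderr) so the caller
    decides what counts as a failure.
    """
    logger = logger or logging.getLogger("pipeline")
    # stdout and stderr are captured as two separate text streams
    result = subprocess.run(cmd, capture_output=True, text=True)
    logger.info("command=%r exit=%d", cmd, result.returncode)
    for line in result.stderr.splitlines():
        logger.warning("stderr: %s", line)
    return result.returncode, result.stdout, result.stderr
```

With something like this, a tool that prints an error to stderr but still exits with zero at least leaves a visible trace in the log next to the exit status it reported.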

ADD REPLY • link 10.0 years ago by Rad ▴ 810

5

10.0 years ago

Alex Reynolds 36k

I am wrapping a call to samtools on a file that does not exist

Though I agree that a binary should be written to test conditions and throw appropriate exit codes as much as possible, you're asking samtools to do something here that it is generally not asked to do, i.e. check if a file exists.

Perhaps adopt a defensive coding position and assume that it only works if all the inputs are known to be valid, by instead using the Python os library to first check if the input file exists, and throw an exception, if not.
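
A minimal sketch of that defensive check, assuming a hypothetical wrapper called `run_checked` (the name and signature are illustrative only):

```python
import os
import subprocess

def run_checked(cmd, input_path):
    """Run `cmd` only if `input_path` exists.

    Hypothetical wrapper: raises FileNotFoundError before the external
    process is started, and CalledProcessError if it exits non-zero.
    """
    if not os.path.isfile(input_path):
        raise FileNotFoundError(f"Input file not found: {input_path}")
    # check=True makes subprocess.run raise on a non-zero exit status
    return subprocess.run(cmd, check=True)
```

For the command discussed in this thread, a call might look like `run_checked(["samtools", "sort", "S222.bam", "S22.sorted"], "S222.bam")`, which would fail in Python before samtools is ever started.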

If you write your program to instead read data via standard input and write data to standard output — which is almost always a better design approach for bioinformatics tools and pipelines — then the Python interpreter can test if standard input is available to consume and throw an exception or do something else, if not present.

In other words, in the case of:

$ foo.py < some_data_stream

You can do something like the following to test if data are available:

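import os, sys, stat

def check_stdin():
    # Inspect file descriptor 0 (standard input)
    mode = os.fstat(0).st_mode
    # Input is available if stdin is a pipe (FIFO) or a redirected regular file
    if stat.S_ISFIFO(mode) or stat.S_ISREG(mode):
        return os.EX_OK
    sys.stderr.write("Error: Please redirect or pipe in data\n")
    return os.EX_NOINPUT

if __name__ == "__main__":
    # Use the function's return value as the script's exit status
    sys.exit(check_stdin())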

By feeding in standard input, you can directly test whether any data is being piped or redirected in, before passing that data along to a downstream process. Further, in this example, the function returns a standard exit code (os.EX_NOINPUT, from the sysexits conventions) that can be evaluated and used to exit the script or trigger other behavior.

In any case, I would strongly advise against using Python for I/O-heavy tasks. It is very, very slow at this job, but if you must use Python, then use as much of it as possible to validate inputs, before firing up an external process.

ADD COMMENT • link 10.0 years ago by Alex Reynolds 36k

0

10.0 years ago

Rad ▴ 810

Strangely, samtools view returns the right status when there is a problem; an example is posted at CodersCrowd.

ADD COMMENT • link 10.0 years ago by Rad ▴ 810

Ram · Accepted Answer · 2014-12-03

Thank you all for your contributions.

To all those interested in this kind of problem, the overall message to retain from this discussion is: when there is a problem in the source, it should be fixed at the source. Whatever wrappers, code, or tricks you end up with will most likely not be reproducible at all.

I ended up cross-posting to the samtools mailing list as well, and here is the final word:

You are testing an old version of samtools.  In the particular case of samtools sort's exit status
$ samtools-1.0 sort S222.bam S22.sorted; echo "exit code $?"
[E::hts_open] fail to open file 'S222.bam'
[bam_sort_core] fail to open file S222.bam
exit code 1

So as you can see, it was a minor bug that has been fixed in the latest version of samtools.

Thank you all for contributing to this discussion