Hi,
I found this article, and I think it is a nice one for anyone developing command line tools.
Source: Minimum standards for bioinformatics command line tools
- Print something if no parameters are supplied
Unless your tool is a filter which works by manipulating stdin to stdout, you should always print out something (some help text, ideally) if the user runs your tool without all the required parameters. Just exiting quietly isn't helping anyone.
% biotool
Please use the --help option to get usage information.
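A minimal Python sketch of this behaviour. The `main(argv)` entry point and the exit status of 2 are assumptions made for illustration; the article does not prescribe either.

```python
import sys

HINT = "Please use the --help option to get usage information."


def main(argv):
    """Return an exit status for the given arguments (program name excluded)."""
    if not argv:
        # Nothing was supplied: say so instead of exiting quietly.
        print(HINT, file=sys.stderr)
        return 2  # non-zero, so calling scripts can detect the misuse
    # ... the real work of the (hypothetical) tool would start here ...
    return 0
```

A real script would typically finish with `sys.exit(main(sys.argv[1:]))`.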
- Always have a "-h" or "--help" switch
The Unix tradition is for all commands to have a "-h" or "--help" switch, which when invoked, prints usage information about the command. Most languages come with a getopt() type library, so there is no excuse for not supporting this.
% biotool -h
Usage: biotool [options] <file.fq>
Options:
--rc reverse complement
--trim nn trim <nn> bases from 3' end first
--mask remove vector sequence contaminant
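In Python, the standard `argparse` module plays the role of the getopt() library and adds "-h" and "--help" automatically. The sketch below mirrors the usage text above; the `build_parser` name and the description string are assumptions for illustration only.

```python
import argparse


def build_parser():
    """Build a parser resembling the biotool usage text (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="biotool",
        description="Hypothetical FASTQ clean-up tool.",
    )
    parser.add_argument("fastq", metavar="<file.fq>", help="input FASTQ file")
    parser.add_argument("--rc", action="store_true", help="reverse complement")
    parser.add_argument("--trim", type=int, metavar="nn",
                        help="trim <nn> bases from 3' end first")
    parser.add_argument("--mask", action="store_true",
                        help="remove vector sequence contaminant")
    return parser
```

Calling `build_parser().parse_args()` then handles `biotool -h` without any extra code.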
- Have a "-v" or "--version" switch
Many bioinformatics tools today are used as part of larger pipelines, or put into the Galaxy toolshed. Because compatibility depends on the version of your tool being used, you should have a simple, machine-parseable way to identify which version of the tool is installed.
% biotool --version
biotool 1.3a
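One way to get a machine-parseable version line in Python is the built-in `version` action of `argparse`. This is only a sketch: the `VERSION` constant and both function names are assumptions.

```python
import argparse

VERSION = "1.3a"  # assumed to be defined in one place and reused everywhere


def version_string(prog="biotool"):
    """Return a single 'name version' line that pipelines can parse."""
    return f"{prog} {VERSION}"


def build_parser():
    parser = argparse.ArgumentParser(prog="biotool")
    # action="version" prints the string and exits with status 0.
    parser.add_argument("-v", "--version", action="version",
                        version=version_string())
    return parser
```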
- Use stderr for messages and errors
If you need to print an error message, or are just printing out progress or log information, try to use stderr rather than stdout. Try to reserve stdout for use as your output channel, so that it can be used in Unix pipes to avoid temporary files.
% biotool reads.fq | fq2fa > clean.fa
biotool: processing reads.fq
fq2fa: converted 423421 reads
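A small Python sketch of the same split, with results going to stdout and messages going to stderr. The helper names `format_log`, `log` and `emit` are made up for this example.

```python
import sys


def format_log(prog, message):
    """Prefix a message with the tool name, as in the transcript above."""
    return f"{prog}: {message}"


def log(message, prog="biotool"):
    # Progress and errors go to stderr so they never mix with piped data.
    print(format_log(prog, message), file=sys.stderr)


def emit(record):
    # Actual results go to stdout, where the next tool in the pipe reads them.
    sys.stdout.write(record)
```

With this split, `biotool reads.fq | fq2fa` only passes the sequence data down the pipe, while the progress lines still show up on the terminal.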
- Validate your parameters
If you have command line options, do some validation or sanity checking on them before letting them through to your critical code. Many getopt() libraries support basic validation, but ultimately it is not that difficult to have a preamble with some "if not XXX { print ERROR ; exit }" clauses.
% biotool --trim -3 reads.fq
Error: --trim must be an integer > 0
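With Python's `argparse`, this kind of check can live in a small type function. This is a sketch, and `positive_int` is a name chosen here, not something from the article.

```python
import argparse


def positive_int(text):
    """Convert text to an int and reject anything that is not > 0."""
    try:
        value = int(text)
    except ValueError:
        raise argparse.ArgumentTypeError("must be an integer > 0")
    if value <= 0:
        raise argparse.ArgumentTypeError("must be an integer > 0")
    return value


def build_parser():
    parser = argparse.ArgumentParser(prog="biotool")
    # argparse calls positive_int on the raw string and reports any error.
    parser.add_argument("--trim", type=positive_int, metavar="nn")
    return parser
```

argparse then rejects a value such as `--trim -3` before any of the critical code runs.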
- Don't hard-code any paths
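Often the tool you write depends on some other files, such as config files or database/model files. The easiest, but wrong and annoying, thing to do is just put the absolute path from your own machine into the code, so the tool breaks as soon as anyone else runs it: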
% biotool --mask reads.fq
Error: can't load /home/steven/work/biotool/data/vector.seq
# ARRRGGGGHHH!
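One common alternative is to locate data files relative to the script itself, optionally with an environment variable as an override. The sketch below assumes a `data/` folder next to the script and a `BIOTOOL_DATA` variable; both are hypothetical, as is the `data_file` helper.

```python
import os
from pathlib import Path


def data_file(script_path, name, env_var="BIOTOOL_DATA"):
    """Return the path of a bundled data file without hard-coding a location."""
    override = os.environ.get(env_var)
    if override:
        # The user explicitly pointed us at another data directory.
        base = Path(override)
    else:
        # Default: a data/ folder sitting next to the installed script.
        base = Path(script_path).resolve().parent / "data"
    return base / name
```

Inside the tool this would be called as `data_file(__file__, "vector.seq")`.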
- Don't pollute the command-line name space
You've come up with a new tool called "BioTool". The command you want everyone to invoke is called "biotool", but it is just a master script which runs lots of other tools. Unfortunately you used lots of generic names like "fasta2fastq", "convert", "filter" .. and so on, and you've put them all in the same folder as the main "biotool" script. So when I install BioTool, my PATH gets filled with rubbish. Please don't do this.
% ls -1 /opt/BioTool/
biotool
convert # whoops, clashes with ImageMagick!
load-hash.py # hello Titus :-)
filter
diff # whoops, clashes with standard Unix tool!
test.sh # <face-palm>
The first solution is to prefix all your sub-tools and helper scripts with "biotool". The second solution, if they are scripts only, is to not make them executable (so they can't be run from PATH) and invoke them via the interpreter (perl, python, ...) explicitly from biotool. The third solution is to put them all in a separate folder (e.g. auxiliary/, scripts/ ...) and explicitly call them (but take note of the "Don't hard-code any paths" point above).
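The second and third solutions can be combined. Here is a rough Python sketch, assuming the helpers are Python scripts kept in a `scripts/` folder next to the main script; the folder name and both function names are assumptions.

```python
import subprocess
import sys
from pathlib import Path


def helper_command(script_path, helper, args):
    """Build the command line for a helper script stored in scripts/."""
    helper_path = Path(script_path).resolve().parent / "scripts" / helper
    # Run the helper through the current interpreter, so it needs neither
    # the executable bit nor an entry in PATH.
    return [sys.executable, str(helper_path), *args]


def run_helper(script_path, helper, args):
    # check=True raises CalledProcessError if the helper exits non-zero.
    return subprocess.run(helper_command(script_path, helper, args), check=True)
```

The main script would call something like `run_helper(__file__, "convert.py", ["reads.fq"])`, and nothing but `biotool` itself needs to be on the user's PATH.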
- Don't distribute bare JAR files
If your tool is written in Java and is distributed as a JAR file, please write a simple shell wrapper script to make it simple to invoke. The three lines below are all you need (in the simple case) and you will make your users much happier.
#!/bin/bash
PREFIX=$(dirname "$0")
java -Xmx500m -jar "$PREFIX/BioTool.jar" "$@"
- Check that your dependencies are installed
I've installed BioTool, and I start running it, and all looks good. Then 2 hours later it spits out an error like "error: can't run sff2CA". This could all be avoided if biotool checked for all the external tools it needs before it commenced, which would save your users from associating your software with pain.
% biotool --stitch R1.fq R2.fq
This is biotool 1.3a
Loaded config
Checking for 'bwa': found /usr/bin/bwa
Checking for 'samtools': ERROR - could not find 'samtools'
Exiting.
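In Python, such an up-front check can be sketched with `shutil.which` from the standard library. The `find_missing` helper is a name invented here, and the message format simply copies the transcript above.

```python
import shutil
import sys


def find_missing(tools, path=None):
    """Report where each external tool lives and return the ones not found."""
    missing = []
    for tool in tools:
        location = shutil.which(tool, path=path)  # None if not on the search path
        if location is None:
            print(f"Checking for '{tool}': ERROR - could not find '{tool}'",
                  file=sys.stderr)
            missing.append(tool)
        else:
            print(f"Checking for '{tool}': found {location}", file=sys.stderr)
    return missing
```

Calling `find_missing(["bwa", "samtools"])` at start-up and exiting if the result is non-empty gives the error in the first few seconds rather than two hours in.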
- Be strict if you are still a Perl tragic like me
If you're old like me and Perl is still your native tongue, at least play it a little bit safer by starting all your scripts with the following lines:
#!/usr/bin/env perl
use strict;
use warnings;
use autodie; # supersedes the older Fatal module
Update
Also keep this article in mind when it comes to writing documentation:
Top considerations for creating bioinformatics software documentation
Very nice article, thank you! Let's hope that many people will adopt these guidelines.
This is great, thanks!