Note that the GATK programming framework (engine, infrastructure and utility tools) remains fully open source under the MIT license. Developers are free to write their own tools on top of the GATK and distribute them without any restrictions. Only a subset of the analysis tools is actually covered by the license restrictions.
To clarify or reword: the most interesting bits of GATK are absolutely not open source, which by most accepted definitions requires free access for all uses (see http://www.fsf.org/about/what-is-free-software or http://opensource.org/faq for more). When you have to brag in your post "That’s right, free as in beer", you're accepting that you are not actually releasing open source software that would be free as in speech.
While the "framework" may be open source, the variant calling routines and other methods that are of the greatest interest to genetics practitioners are still encumbered by the restrictive licensing agreements.
That's correct: the key analysis tools are released under a proprietary license, as acknowledged in the announcement and the ensuing discussion thread in the forum. My comment on the framework was intended for software developers, in keeping with the focus of the OP, rather than for genetics practitioners, who typically have different concerns. What constitutes the "most interesting bits of GATK" depends largely on whether you are a developer or a user.
That license is way too long for me to understand, but I have to say it does not look like an open source license. It looks like some sort of mixture where some things are under MIT, some are under a Broad license and some parts are fully closed.
From the past history of other software packages, I know that even trivially simple and clear licenses have endless corner cases that make the legality of various applications extremely murky.
The GATK license is indeed not an open source license; however, it only covers a subset of the full package (i.e. not the framework). We are aware of the potential for confusion -- nobody likes reading long licenses full of legalese! So we are considering offering a separate package of the framework source, starting with release 2.4 (when the updated license comes into effect). Developers would be able to download this package with the clear understanding that everything in it is fully open under the MIT license -- no more murkiness there at least.
GATK is a pretty big toolset, so it's hard to replace all of it in one fell swoop. But as far as variant-calling goes, you can use BBMap's "callvariants.sh" to get better results (from my tests) in a tiny fraction of the time. The BBMap package is fully open-source.
It's also easy to use: "callvariants.sh in=mapped.sam out=vars.vcf ref=ref.fa ploidy=2". I wrote it partly because the alternatives were just too slow to seriously consider as part of a high-performance pipeline, and partly because they yielded incorrect results. I needed a fast program that gave correct variant calls and there just weren't any, so I wrote one myself.
Thanks for following up with details and clarifications. Much appreciated!
Glad to help.
FYI, we have just released v2.4 and as suggested earlier, we have created a separate github repo with the framework source, to make things entirely non-murky for developers.
https://github.com/broadgsa/gatk