Hello,
I would like to call germline variants with GATK4, i.e. map with bwa mem, mark duplicates, et cetera. I haven't done that in a while, but back then it took just a few commands. Now that I'm checking the pipeline commands, what I find is this: https://github.com/gatk-workflows/broad-prod-wgs-germline-snps-indels/blob/master/PairedEndSingleSampleWf.wdl I'm not even sure whether this is the right file to look at. And I do not even know what a ".wdl" file is, nor am I interested in learning yet another language. Why do I see 1500 lines of code when I was expecting just a few steps / GATK subcommand calls, like HaplotypeCaller, filter, et cetera? What happened to those kinds of tutorials?
Best regards
WDL is really, really, really verbose. Most statements only pass variables into a task. The 'real' commands start with
command <<<
and end with
>>>
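If you only want to read the actual commands and skip the rest, something like this rough sketch could pull them out of the file. It assumes every task uses the heredoc form shown above (a task written as `command { ... }` would be missed, and a `>>>` inside the shell code would cut a block short). The names `extract_commands` and `SAMPLE_WDL` are made up for this example, and the sample task is a toy one, not copied from the Broad workflow.

```python
import re
import textwrap

# Matches heredoc-style command sections: command <<< ... >>>
COMMAND_BLOCK = re.compile(r"command\s*<<<(.*?)>>>", re.DOTALL)


def extract_commands(wdl_text):
    """Return the dedented body of every `command <<< ... >>>` section."""
    return [
        textwrap.dedent(body).strip()
        for body in COMMAND_BLOCK.findall(wdl_text)
    ]


# Toy task for illustration only (hypothetical, not from the real workflow).
SAMPLE_WDL = r"""
task MarkDuplicates {
  File input_bam
  String output_name
  command <<<
    gatk MarkDuplicates \
      -I ${input_bam} \
      -O ${output_name}.bam \
      -M ${output_name}.metrics
  >>>
  runtime {
    memory: "7 GB"
  }
}
"""

for command in extract_commands(SAMPLE_WDL):
    print(command)
    print("-" * 40)
```

To run it on the real file, read the .wdl into a string and pass that to `extract_commands` instead of `SAMPLE_WDL`. The `${...}` placeholders stay in the output, so you still have to look up what each variable is set to.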
The pipeline has not changed too much since GATK3 (released 2014). The docs still exist, just not as easy to find as before.
This is not an obvious resource, but there is a lot of good info in their workshop presentations: https://drive.google.com/drive/folders/1y7q0gJ-ohNDhKG85UTRTwW1Jkq4HJ5M3
Thank you for your helpful replies. But I must say, GATK is very disappointing. Many many links lead to 404 pages and for example they seem to have renamed ApplyRecalibration to ApplyVQSR without a mention anywhere. Even on the ApplyVQSR page itself you can ctrl+f both ApplyRecalibration and ApplyVQSR. And on the best practices page for germline calling they still say that ApplyRecalibration is the "tool involved" when that tool is not even present in their tool index (I've also checked the tool index of earlier GATK4 versions). It's kind of as if the organisation of the docs was run by amateurs? I'm sorry to say that, but this is such a mess. Instead of trying to force their WDL language on everyone (which instead will lead to people using other tools I guess), they should focus on crawling their own pages for 404 errors and have a system in place that propagates changes properly throughout their docs (just search for "ApplyRecalibration" in your HTML files at least ... not that hard, find -exec grep is there for you). But thanks again to you guys for the very helpful answers/comments!
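To show how little the "find -exec grep" check would take, here is a minimal sketch of the same idea in Python. It assumes the docs exist as a local tree of .html files, which is a guess about how they are stored. The function name `find_stale_mentions` and the two demo pages are invented for this example.

```python
import tempfile
from pathlib import Path


def find_stale_mentions(root, stale_name="ApplyRecalibration"):
    """Yield (path, line_number, line) for each line in *.html files
    under root that still contains stale_name."""
    for path in sorted(Path(root).rglob("*.html")):
        with path.open(encoding="utf-8", errors="replace") as handle:
            for lineno, line in enumerate(handle, start=1):
                if stale_name in line:
                    yield path, lineno, line.strip()


# Demo on a throwaway directory with two made-up pages.
with tempfile.TemporaryDirectory() as root:
    Path(root, "best_practices.html").write_text(
        "<p>Tool involved: ApplyRecalibration</p>\n", encoding="utf-8"
    )
    Path(root, "tool_index.html").write_text(
        "<p>ApplyVQSR</p>\n", encoding="utf-8"
    )
    hits = [(p.name, n, text) for p, n, text in find_stale_mentions(root)]

print(hits)
```

Only the page that still names the old tool is reported, which is exactly the list someone would need in order to fix the docs.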
They recently moved their documentation and forums to a new platform, so a lot of the older pages got lost.
GATK is written by the Broad Institute, and Cromwell, which takes WDL files as input, is also developed by the Broad Institute. The Broad Institute is pushing Cromwell as a workflow engine that builds on the other tools it produces (IMO).
You can refer to this GATK best practices variant calling workflow for RNA-seq here: https://digibio.blogspot.com/2015/10/rna-seq-and-gatk-best-practices.html