A friend of mine shared this paper on new tech that promises to go from reads to VCF in ~2 hours thanks to massive MASSIVE parallelization. I find it a little hard to believe.
What does the community here think of it?
Paper: http://genomebiology.com/2015/16/1/6/abstract (provisional PDF available)
Churchill: an ultra-fast, deterministic, highly scalable and balanced parallelization strategy for the discovery of human genetic variation in clinical and population-scale genomics
Abstract (provisional)
While advances in genome sequencing technology make population-scale genomics a possibility, current approaches for analysis of this data rely upon parallelization strategies that have limited scalability, complex implementation and lack reproducibility. Churchill, a balanced regional parallelization strategy, overcomes these challenges, fully automating the multiple steps required to go from raw sequencing reads to variant discovery. Through implementation of novel deterministic parallelization techniques, Churchill allows computationally efficient analysis of a high-depth whole genome sample in less than two hours. The method is highly scalable, enabling full analysis of the 1000 Genomes raw sequence dataset in a week using cloud resources. http://churchill.nchri.org/.
It appears that this is not a requirement in GB:
Anyway, I agree that, unfortunately, this paper provides almost no detailed description of the algorithm, and there is a huge COI related to GenomeNext.
I for one don't see anything wrong with commercialization. That is perfectly fine to pursue.
It is the willful obfuscation that, in my opinion, runs counter to the spirit of science and should render a paper inappropriate to be published as research.
I completely agree. Unfortunately, having commercialization in mind frequently leads to being overly cautious with source code, algorithms, etc. I have some examples from my own practice, where I had to convince people that releasing software as open source is OK as a proof of principle. One can always try to make a commercial "out of the box" alternative that is highly optimized and easy to use. I think the dilemma of how to demonstrate the superiority of a method while leaving room for commercially viable software is quite common.
Sorry for reviving this nine-month-old thread, but I have been working with Churchill for a while now and have yet to run it to completion with sample data, either on a single computer or in a cluster environment. I've tried a lot of different things, but nothing has worked. There is little discussion of successful Churchill usage anywhere on the internet, and the development team at the Research Institute at Nationwide Children's Hospital in Ohio has yet to respond to my email. At this point, I would really like to attempt to reconstruct Churchill, perhaps in a different language.
So basically what I'm asking is: is there anyone who would be so kind as to help me white-box this project? I imagine it will involve examining which commands are executed, in what order, and what the output is for each step. Of course, examining the output is difficult in my situation, considering I cannot get Churchill to run to completion. I am most curious about how this step is accomplished:
Thanks
As I mentioned above in my mini review, I don't believe that this tool works at all. It is an example of what is wrong with bioinformatics: complete nonsense can be published and claimed to work like magic. Move on and try something else.
@Istvan I'm slowly realizing that... Thank you for your input. This product is absolute garbage. The team that created it should be ashamed to have their name on something of such poor quality. I'm going to attempt to switch to bcbio-nextgen. It seems to have incredible documentation and a much larger community supporting it.
Cheers