Some thoughts:
1) DeepVariant does not work on somatic calls - only germline.
2) Yes, it beat GATK, but only barely (don't quote me on the numbers, but it was something like 98% vs 98.5%)
3) The method is insane, in that they actually create millions of images, encoding read information as colors and alpha, and then use their image-processing neural network to do pattern recognition for calling.
4) It is quite computationally expensive to run, not to mention training the NN.
5) It absolutely requires new training data for each platform that you're going to run it on. Chemistry changed slightly? Got a new type of instrument? Doing targeted regions instead of WGS? You'll need a new gold standard run and you'll need to retrain the algorithm from scratch. They used the Genome in a Bottle dataset. That's limited to ~80% of the genome, and their TPs are only calls validated on at least two sequencing technologies.
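To make point 3 a bit more concrete, here is a minimal sketch of what "encoding read information as colors and alpha" could look like. The channel layout (red for base identity, green for base quality, blue for strand, alpha for match/mismatch), the numeric values, and the function names `encode_base` and `encode_pileup` are all my own assumptions for illustration. This is not DeepVariant's actual encoding.

```python
# Toy pileup-to-image encoding. The channel layout and all constants are
# assumptions for illustration, not DeepVariant's actual scheme.

BASE_TO_RED = {"A": 250, "G": 180, "T": 100, "C": 30}


def encode_base(base, qual, is_reverse, ref_base):
    """Return one RGBA pixel (each value 0-255) for a single aligned base."""
    red = BASE_TO_RED.get(base, 0)            # base identity; unknown bases map to 0
    green = min(qual, 40) * 255 // 40         # base quality, capped at Q40
    blue = 70 if is_reverse else 240          # strand
    alpha = 50 if base == ref_base else 255   # mismatches are fully opaque
    return (red, green, blue, alpha)


def encode_pileup(ref, reads):
    """Build a grid of RGBA pixels: one row per read, one column per position.

    ref   -- reference sequence for the window
    reads -- list of (start_offset, sequence, qualities, is_reverse)
    Row 0 is the reference itself; each following row is one read.
    Positions a read does not cover stay transparent black.
    """
    width = len(ref)
    empty = (0, 0, 0, 0)
    image = [[encode_base(b, 40, False, b) for b in ref]]
    for start, seq, quals, is_reverse in reads:
        row = [empty] * width
        for i, (base, qual) in enumerate(zip(seq, quals)):
            col = start + i
            if 0 <= col < width:
                row[col] = encode_base(base, qual, is_reverse, ref[col])
        image.append(row)
    return image


ref = "ACGTAC"
reads = [
    (0, "ACGTA", [40, 40, 30, 20, 40], False),
    (1, "CGAAC", [35, 35, 40, 40, 10], True),  # T>A mismatch at column 3
]
image = encode_pileup(ref, reads)
```

A grid like `image` is what would then be handed to an image classifier, one grid per candidate site, which is where the "millions of images" come from.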
Don't get me wrong - it's cool to see someone enter the space with a really crazy orthogonal method, but it's not a panacea, and the hype about AI solving all of our variant calling problems is pretty clearly overblown. That doesn't mean that this won't be useful in the future, just that it's not there yet.
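On the Genome in a Bottle caveat in point 5: accuracy numbers like the ones in point 2 are only computed inside the confident regions, so anything a caller does in the rest of the genome is invisible to the benchmark. A rough sketch of that kind of scoring is below. It assumes a single chromosome, variants as `(pos, ref, alt)` tuples, half-open `(start, end)` regions, and exact-match comparison; the names `in_regions` and `benchmark` are hypothetical, and real comparison tools also have to reconcile different representations of the same variant.

```python
def in_regions(pos, regions):
    """True if pos falls inside any half-open (start, end) interval."""
    return any(start <= pos < end for start, end in regions)


def benchmark(calls, truth, confident_regions):
    """Score calls against a truth set, counting only confident regions.

    calls, truth -- iterables of (pos, ref, alt) tuples on one chromosome
    Variants outside the confident regions are dropped from both sets,
    so they count neither for nor against the caller.
    """
    calls = {v for v in calls if in_regions(v[0], confident_regions)}
    truth = {v for v in truth if in_regions(v[0], confident_regions)}
    tp = len(calls & truth)   # called and in the truth set
    fp = len(calls - truth)   # called but not in the truth set
    fn = len(truth - calls)   # in the truth set but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}


confident_regions = [(100, 200), (300, 400)]
truth = [(120, "A", "G"), (150, "C", "T"), (350, "G", "A"), (250, "T", "C")]
calls = [(120, "A", "G"), (150, "C", "T"), (360, "T", "G"), (250, "T", "C"),
         (500, "G", "C")]
result = benchmark(calls, truth, confident_regions)
```

In this made-up example the calls at positions 250 and 500 never enter the score at all, which is the point: the headline number says nothing about the part of the genome outside the confident regions.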
One (more) step towards "Ok Google .. analyze this dataset, predict the downstream consequences".
Haha, sounds familiar, but I was expecting this from Google. It's finally out and, to be honest, it seems pretty impressive, with open-source availability as well.
We implemented the DeepVariant pipeline with Docker and Nextflow here
Lifebit integrates the pipeline with example parameters.
Would love your feedback on this.
Thanks!