I installed DeepVariant 1.6.0 on a Dell PowerEdge R910 with 128 GB of RAM and 36 cores.
This machine has Windows 10 LTSC installed on it. It also runs VirtualBox 6.0 with an Ubuntu 20.04 LTS guest (Controller ID: Vboxguestaddtions_6.06).
DeepVariant was installed on Ubuntu 20.04 LTS using Docker:
https://github.com/google/deepvariant
sudo apt -y update
sudo apt-get -y install docker.io
sudo docker pull google/deepvariant:1.6.0
However, when I run DeepVariant, I keep getting the following message:
**The TensorFlow library was compiled to use AVX instructions, but these aren't available on your machine.
/opt/deepvariant/bin/run_deepvariant: line 2: 7 Aborted (core dumped) python3 -u /opt/deepvariant/bin/run_deepvariant.py "$@"**
How do I fix this problem?
It also keeps saying that it cannot find the genome reference FASTA file, even though I have specified the correct directory for the reference file. Does one first have to prepare the genome reference FASTA file to work with DeepVariant? For instance, DRAGMAP, Isaac aligner, etc. require the reference FASTA file to be prepared before they can use it.
See if you need to enable AVX instructions (or fix your virtualbox install): https://stackoverflow.com/questions/65780506/how-to-enable-avx-avx2-in-virtualbox-6-1-16-with-ubuntu-20-04-64bit
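A quick way to see whether AVX is exposed inside the Ubuntu guest is to look at the CPU flags the kernel reports. The sketch below assumes a Linux guest with `/proc/cpuinfo`; `has_avx` is a hypothetical helper name, not part of DeepVariant or VirtualBox.

```shell
#!/usr/bin/env bash
# Report whether the CPU flags visible to this OS include AVX.
# has_avx is a hypothetical helper; the optional argument lets you
# point it at a saved copy of cpuinfo instead of /proc/cpuinfo.
has_avx() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    # Take the first "flags" line, then match "avx" as a whole word,
    # so that "avx2" or "avx512f" alone do not count as "avx".
    grep -m1 '^flags' "$cpuinfo" 2>/dev/null | grep -qw 'avx'
}

if has_avx; then
    echo "AVX is visible to this OS"
else
    echo "AVX is NOT visible to this OS"
fi
```

If the guest reports no AVX, it is worth checking the specification of the host CPU as well, since a VirtualBox setting cannot add an instruction set that the physical CPU lacks.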
I was finally able to get DeepVariant to work on a Dell PowerEdge R820 that has CPUs with AVX support. My only issue is that DeepVariant 1.6.0 took 3 days (72 hours) to process a human exome BAM file and create the VCF and gVCF files. How can I reduce the time DeepVariant takes to process data?
Can I, for instance, ask it not to output a gVCF file?
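For reference, here is a sketch of a `run_deepvariant` invocation for an exome that leaves out `--output_gvcf` (it is an optional flag) and sets `--num_shards`, which defaults to 1 and so may be one reason for a very long runtime. The directory and file names (`input`, `output`, `reference.fasta`, `sample.bam`, `exome_targets.bed`) are placeholders, not taken from the original run. The command is only printed here so it can be checked before running.

```shell
#!/usr/bin/env bash
# Placeholder locations; change these to match your own data.
INPUT_DIR="${INPUT_DIR:-$PWD/input}"
OUTPUT_DIR="${OUTPUT_DIR:-$PWD/output}"
# Use one shard per available core; fall back to 1 if nproc is missing.
SHARDS="${SHARDS:-$(nproc 2>/dev/null || echo 1)}"

# Build the command as an array so it can be inspected before running.
# Files are only visible inside the container under the -v mount points,
# and reference.fasta needs its index (reference.fasta.fai, e.g. made
# with "samtools faidx") in the same directory.
dv_cmd=(
    sudo docker run
    -v "${INPUT_DIR}:/input"
    -v "${OUTPUT_DIR}:/output"
    google/deepvariant:1.6.0
    /opt/deepvariant/bin/run_deepvariant
    --model_type=WES
    --ref=/input/reference.fasta
    --reads=/input/sample.bam
    --regions=/input/exome_targets.bed
    --output_vcf=/output/sample.vcf.gz
    --num_shards="${SHARDS}"
)

# Print the command; to actually run it, use: "${dv_cmd[@]}"
printf '%s ' "${dv_cmd[@]}"
echo
```

Paths given to `--ref` and `--reads` must be the paths as seen inside the container, which is also a common cause of "file not found" errors for the reference when running through Docker.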
Sounds like it's slow. There are faster programs; is there a reason you want to use Deepvariant?
I read several recent articles benchmarking variant callers, and they all seem to claim that DeepVariant was the most accurate of them, which is why I am exploring a switch to DeepVariant.
I have used Strelka 2.0, and I am very impressed with it. In fact, with the same human exome BAM file, it outputs gVCF and VCF in about 35 minutes on the same server, whereas DeepVariant took 3 days for the same file.
Have you tried BBTools' CallVariants? I'd be very surprised if there was a faster variant caller. I've never tested Strelka, but CallVariants is much more accurate than GATK or FreeBayes.