I am trying to run loftee as a part of VEP with the following command:
/vep/ensembl-tools-release-95/vep -i SF0637658_WES_CIDR.vcf --format vcf --json --everything --allele_number --no_stats --cache --offline --minimal --verbose --assembly GRCh38 --dir_cache /vep/vep_cache --fasta /vep/homo_sapiens/95_GRCh38/hg38.fa --plugin LoF,loftee_path:/vep/loftee_grch38,gerp_bigwig:/vep/loftee_data_grch38/gerp_conservation_scores.homo_sapiens.GRCh38.bw,human_ancestor_fa:/vep/loftee_data_grch38/human_ancestor.fa.gz,filter_position:0.05,min_intron_size:15,conservation_file:/vep/loftee_data_grch38/phylocsf_gerp.sql,run_splice_predictions:0 --dir_plugins /vep/loftee_grch38 -o vep_output
The error that I am getting seems to be all about conservation_file:
DBD::SQLite::db prepare failed: no such table: phylocsf_summary at /var/lib/spark/vep/loftee_grch38/LoF.pm line 553, <anonio> line 15156.
I tried supplying phylocsf_gerp.sql.gz, phylocsf_gerp.sql, and loftee.sql as the conservation file, but none of them works.
How could it be fixed?
I have also installed PhyloCSF and placed it on the PATH. Not sure whether it is needed. The version of VEP that I am using is 99. I checked out the correct grch38 loftee branch, and not just master. My OS is the following:
(py37) -bash-4.2$ lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.7 (Maipo)
Release: 7.7
Codename: Maipo
I suppose it needs access to some database. Can you verify whether the script has access to the DB server? If that's the case, check whether the tables can be created (otherwise it might help to "initialise" the process beforehand by creating empty tables).
The thing is that this database (from my understanding) is provided by the loftee team:
https://github.com/konradjk/loftee
Look for conservation_file in the grch38 branch.
Here is the link: https://personal.broadinstitute.org/konradk/loftee_data/GRCh38/loftee.sql.gz
I do not know what other database I should have added for grch38. Moreover, if I feed it the gzipped loftee.sql file, I get an error:
So it is definitely a problem with this loftee.sql database.
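Since the error comes from DBD::SQLite, the plugin presumably opens the conservation file directly as an SQLite database, so a gzipped download would need to be decompressed first. A minimal sketch in Python, assuming the download is a plain gzip archive; gunzip_file is a hypothetical helper name, not part of VEP or loftee:

```python
import gzip
import shutil
from pathlib import Path


def gunzip_file(src, dest=None):
    """Decompress a .gz file and return the path of the decompressed copy.

    Hypothetical helper: by default the output is written next to the
    input with the trailing ".gz" removed (loftee.sql.gz -> loftee.sql).
    """
    src = Path(src)
    if dest is None:
        if src.suffix != ".gz":
            raise ValueError(f"{src} does not end in .gz; pass dest explicitly")
        dest = src.with_suffix("")
    dest = Path(dest)
    # Stream the data in chunks so a large database is never held in memory.
    with gzip.open(src, "rb") as fin, open(dest, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dest
```

The returned path is what would then be passed as conservation_file. The original .gz file is left in place.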
What does it say at the top of those SQL file(s)? Can you post the first 10 or so commands that are present in those files?
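One way to answer this without opening the file in an editor is to check what kind of file it is and which tables it contains. SQLite also reports "no such table" when the file it opened is empty, so this check helps tell a wrong or missing path apart from a bad database. A rough sketch in Python, assuming the file is either gzip-compressed or a binary SQLite database; sniff_file_type and list_tables are hypothetical helper names:

```python
import sqlite3
from pathlib import Path

SQLITE_MAGIC = b"SQLite format 3\x00"  # 16-byte header of an SQLite 3 file
GZIP_MAGIC = b"\x1f\x8b"               # first two bytes of a gzip stream


def sniff_file_type(path):
    """Return 'gzip', 'sqlite' or 'other' based on the file's first bytes."""
    with open(path, "rb") as fh:
        head = fh.read(16)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head == SQLITE_MAGIC:
        return "sqlite"
    return "other"


def list_tables(db_path):
    """Return the table names of an SQLite database, sorted by name."""
    # Open read-only so a wrong path fails instead of creating an empty database.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    con = sqlite3.connect(uri, uri=True)
    try:
        rows = con.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        con.close()
    return [name for (name,) in rows]
```

For example, with a hypothetical path such as /vep/loftee_data_grch38/loftee.sql, sniff_file_type(path) should return 'sqlite' and "phylocsf_summary" in list_tables(path) should be true if the file is the database the plugin expects. A result of 'gzip' would mean the file still needs to be decompressed.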
At the top, it creates the phylocsf_summary table with a CREATE TABLE command. I solved the problem by reordering the arguments to --plugin LoF, ... which is very disappointing because it should not work this way: the behaviour depends on the order of the arguments, and the error message is completely uninformative. I could have spent weeks on it because of that. Horrible. My Hail pipeline still fails for an unknown reason, but at least VEP runs in standalone mode.