I am running a luigi pipeline, and within it VEP fails with the following output, which is completely uninformative to me:
Java stack trace:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 474 in stage 7.0 failed 4 times, most recent failure: Lost task 474.3 in stage 7.0 (TID 3007, 137.187.60.63, executor 1): is.hail.utils.HailException: VEP command '/vep/ensembl-tools-release-95/vep --format vcf --json --everything --allele_number --no_stats --offline --minimal --verbose --assembly GRCh38 --dir_cache /vep/vep_cache --fasta /vep/homo_sapiens/95_GRCh38/hg38.fa --plugin LoF,loftee_path:/vep/loftee_grch38,gerp_bigwig:/vep/loftee_data_grch38/gerp_conservation_scores.homo_sapiens.GRCh38.bw,human_ancestor_fa:/vep/loftee_data_grch38/human_ancestor.fa.gz,filter_position:0.05,min_intron_size:15,conservation_file:/vep/loftee_data_grch38/loftee.sql,run_splice_predictions:0 --dir_plugins /vep/loftee_grch38 -o STDOUT' failed with non-zero exit status 2
VEP Error output:
at is.hail.utils.ErrorHandling$class.fatal(ErrorHandling.scala:9)
at is.hail.utils.package$.fatal(package.scala:74)
at is.hail.methods.VEP$.waitFor(VEP.scala:76)
at is.hail.methods.VEP$$anonfun$7$$anonfun$apply$4.apply(VEP.scala:214)
at is.hail.methods.VEP$$anonfun$7$$anonfun$apply$4.apply(VEP.scala:157)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:435)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:441)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:409)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at is.hail.io.RichContextRDDRegionValue$$anonfun$boundary$extension$1$$anon$1.hasNext(RichContextRDDRegionValue.scala:185)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
at scala.collection.Iterator$$anon$18.hasNext(Iterator.scala:762)
at scala.collection.Iterator$$anon$16.hasNext(Iterator.scala:598)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:439)
...
...
I was able to debug VEP up to a point by running the command directly:
/vep/ensembl-tools-release-95/vep --format vcf --json --everything --allele_number --no_stats --offline --minimal --verbose --assembly GRCh38 --dir_cache /vep/vep_cache --fasta /vep/homo_sapiens/95_GRCh38/hg38.fa --plugin LoF,loftee_path:/vep/loftee_grch38,gerp_bigwig:/vep/loftee_data_grch38/gerp_conservation_scores.homo_sapiens.GRCh38.bw,human_ancestor_fa:/vep/loftee_data_grch38/human_ancestor.fa.gz,filter_position:0.05,min_intron_size:15,conservation_file:/vep/loftee_data_grch38/loftee.sql,run_splice_predictions:0 --dir_plugins /vep/loftee_grch38 -o STDOUT
I can see its output, but now the command just hangs and never returns, maybe because of long computations; I am not sure. I tried including --verbose in the command, as can be seen above, but in luigi the output is still the same. I also looked at the stdout and stderr files in the spark work folder, but these files contain no VEP error output.
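One possible reason for the hang, offered as an assumption rather than a diagnosis: the command above has no -i/--input_file option, and VEP reads variants from standard input in that case, so it may simply be waiting for input. A minimal Python sketch for reproducing the call outside spark, feeding a VCF file on stdin and capturing stderr, could look like this. The helper name run_with_stdin and the timeout value are hypothetical, not part of luigi, Hail, or VEP.

```python
import subprocess


def run_with_stdin(cmd, input_path, timeout=600):
    """Run cmd with the file at input_path piped to its stdin.

    Returns (exit_code, stdout_text, stderr_text) so that the error
    output is visible even when the exit status is non-zero.
    Raises subprocess.TimeoutExpired if the command does not finish
    within `timeout` seconds (hypothetical default).
    """
    with open(input_path, "rb") as handle:
        result = subprocess.run(
            cmd,
            stdin=handle,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            timeout=timeout,
        )
    return (
        result.returncode,
        result.stdout.decode(errors="replace"),
        result.stderr.decode(errors="replace"),
    )


# Hypothetical usage: vep_cmd would be the failing command from the
# stack trace, split into a list of arguments, for example
#   vep_cmd = ["/vep/ensembl-tools-release-95/vep", "--format", "vcf", ...]
#   code, out, err = run_with_stdin(vep_cmd, "examples/homo_sapiens_GRCh38.vcf")
#   print(code)
#   print(err)
```

If the command fails the same way here, the captured stderr text should contain whatever VEP printed before exiting with status 2.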
The luigi command that I use to run the pipeline is the following:
LUIGI_CONFIG_PATH=luigi_pipeline/configs/GRCh38.cfg nohup python -u submit.py --cpu-limit 4 --num-executors 3 --driver-memory 2g --executor-memory=4g --hail-version 0.2 --run-locally luigi_pipeline/seqr_loading.py SeqrMTToESTask --local-scheduler --spark-home $SPARK_HOME --project-guid batch1 &
It is running on top of spark. Is there a way either to see the VEP error output, or to run VEP separately and correctly, so that I can quickly see the output?
If I test VEP on its own, it seems to work just fine:
./vep -i examples/homo_sapiens_GRCh38.vcf --cache --dir_cache /vep/vep_cache
This generates output files that look right.