I have been using VEP (Variant Effect Predictor) from Ensembl to annotate VCFs produced by GATK's HaplotypeCaller and Pindel. VEP fails on some of the VCFs with the following error:
> -------------------- EXCEPTION --------------------
> MSG:
> ERROR: Forked process(es) died: read-through of cross-process communication detected
> STACK Bio::EnsEMBL::VEP::Runner::_forked_buffer_to_output vep/version95/modules/Bio/EnsEMBL/VEP/Runner.pm:554
> STACK Bio::EnsEMBL::VEP::Runner::next_output_line vep/version95/modules/Bio/EnsEMBL/VEP/Runner.pm:360
> STACK Bio::EnsEMBL::VEP::Runner::run vep/version95/modules/Bio/EnsEMBL/VEP/Runner.pm:202
> STACK toplevel vep/version95/vep:225
> Date (localtime) = Thu May 9 13:25:54 2019
> Ensembl API version = 95
> ---------------------------------------------------
It took me weeks to identify the actual cause of this error, as I could not find a solution on the forums. I tried adjusting the --buffer_size and --fork parameters, as suggested on several forums, but without success. The cause turned out to be the size of the REF and ALT alleles of some variants: when I excluded the records with a REF or ALT allele longer than 1000 bases, VEP produced results without any error.
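For anyone who wants to reproduce the workaround, here is a minimal Python sketch of that kind of length filter. It is not the exact procedure used above. It assumes an uncompressed, tab-delimited VCF with REF in column 4 and comma-separated ALT alleles in column 5, and the function names and the `max_len` parameter are illustrative. Symbolic alleles such as `<DEL>` are measured by the length of the text, not by the event they describe.

```python
def max_allele_length(vcf_line):
    """Return the length of the longest REF or ALT allele on a VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref = fields[3]
    alts = fields[4].split(",")  # multi-allelic sites list ALTs comma-separated
    return max(len(allele) for allele in [ref, *alts])


def split_by_allele_length(lines, max_len=1000):
    """Split VCF lines into (kept, excluded) by longest allele.

    Header lines (starting with '#') are copied to both outputs so that
    each result is still a valid VCF. Records whose longest allele is at
    most max_len go to 'kept'; all other records go to 'excluded'.
    """
    kept, excluded = [], []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            excluded.append(line)
        elif max_allele_length(line) <= max_len:
            kept.append(line)
        else:
            excluded.append(line)
    return kept, excluded
```

Passing an open file handle as `lines` and writing `kept` to a new file gives a VCF that can go to VEP, while `excluded` keeps the very long records together for separate inspection.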
The offline VEP command I used is:
vep --buffer_size 1000 --offline -i dataset_22336.dat -o dataset_22337.dat --cache --dir vep/database/ --force_overwrite --merged --cache_version 95 --assembly GRCh38 --fasta Homo_sapiens.GRCh38.dna.primary_assembly.fa --fork 32 --everything --vcf
What would be a possible way to run VEP on records whose REF/ALT alleles are between 0.5 and 2 million bases long? Any help would be much appreciated.
Thanks in advance. Tagging @Emily_Ensembl
Do you really want to annotate a variant of this length?
Pierre Lindenbaum, could you please suggest an optimal maximum length to use, so that I can exclude insignificant variants?
Thanks.