I am trying to annotate my whole-exome sequencing VCF file using VEP. My goal is to later add expression data from RNA-seq to the VCF, which will then be used as input for pVACseq.
Before a GTF can be used in VEP, it must be sorted, bgzipped, and indexed with tabix, like so:
grep -v "#" mm10.gtf | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > mm10.gtf.gz
tabix -p gff mm10.gtf.gz
From what I understand, the GTF file is then ready to be used as input in VEP:
./vep --input_file input.vcf --output_file output.vep.vcf --format vcf --vcf --symbol --terms SO --tsl --fasta mm10.fa --gtf mm10.gtf.gz --offline --cache --species mus_musculus --merged --dir_plugins ~/VEP_plugins --plugin Wildtype --plugin Frameshift --pick --transcript_version
When I run the above command, I get the following error:
MSG: ERROR: Cannot use the format gtf without Bio::DB::HTS::Tabix module installed
I found an issue on Ensembl's GitHub that seemed to mirror mine. I have Bio::DB::HTS::Tabix installed in two places: first, in the root; second, the ensembl_vep package came with its own Bio::DB::HTS::Tabix when I installed it. My PERL5LIB was empty. I added the path for the ensembl_vep Tabix to PERL5LIB, yet I got the same error as above. I then removed Bio::DB::HTS::Tabix from the ensembl_vep directory, but still got the same error.
Any ideas on next steps?
Did you try removing your local Bio::DB::HTS::Tabix and keeping the Ensembl version to see if that works?
I am working off a server, so I'm not sure how feasible that is. Is there a way to exclude the local Tabix module in a conda environment?
I'm not sure. You may want to contact your sysadmin about this error.
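Before removing anything, it may help to confirm which Perl interpreter VEP actually runs under and whether that interpreter can load the module at all. The snippet below is only a rough sketch: `check_perl_module` is a hypothetical helper name, not part of VEP, and it assumes bash and that the `perl` you pass in is the same one VEP uses (the first line of the `vep` script, e.g. `head -n 1 ./vep`, shows which interpreter it asks for).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of VEP): report whether a given Perl
# interpreter can load a module, and from which file it was loaded.
check_perl_module() {
    local perl_bin="$1" module="$2"
    local file
    # Convert Some::Module into Some/Module.pm, the key Perl uses in %INC.
    file="$(printf '%s' "$module" | sed 's|::|/|g').pm"
    # Try to load the module; print its path on success, MISSING otherwise.
    "$perl_bin" -e 'my $f = shift; if (eval { require $f; 1 }) { print "OK $INC{$f}\n" } else { print "MISSING\n"; exit 1 }' "$file"
}
```

Run from the same shell (and the same conda environment, if any) that you launch VEP from, `check_perl_module "$(command -v perl)" Bio::DB::HTS::Tabix` prints either `OK` followed by the path of the copy that was loaded, or `MISSING`. If it prints `MISSING` even with PERL5LIB set, one possibility is that the installed copy was built against a different Perl than the one on your PATH, but that is a guess to check, not a diagnosis.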