Question

VEP: What is the best way to start an analysis?

0

6.1 years ago

mostafarafiepour ▴ 180

Hi all,

I have not worked with the VEP software before, but I need some of its output. Unfortunately, I could not work out how to run the analysis from the documentation. What is the best way to get started?

Best regards,

Mostafa

SNP vep Ensembl • 4.4k views

ADD COMMENT • link updated 6.1 years ago by sadri.amirhossein ▴ 10 • written 6.1 years ago by mostafarafiepour ▴ 180

1

Why not install locally and try out examples?

ADD REPLY • link 6.1 years ago by zx8754 12k

0

Many thanks for your reply.

I have installed it, but I do not know exactly what the first step is. I guess I should first annotate my VCF file using the script below. Is my guess right?

grep -v "#" data.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > data.gff.gz
tabix -p gff data.gff.gz
./vep -i input.vcf -gff data.gff.gz -fasta genome.fa.gz

ADD REPLY • link 6.1 years ago by mostafarafiepour ▴ 180

0

What zx8754 said: have you tried the "Quick start" on the right of https://www.ensembl.org/info/docs/tools/vep/script/index.html ?

ADD REPLY • link 6.1 years ago by Pierre Lindenbaum 164k

Ram · Answer 1 · 2018-10-15

2

6.1 years ago

Emily 24k

The basic commands are in the documentation.

./vep --cache -i input.txt -o output.txt

Is it working when you run that with the example files that ship with the VEP?

ADD COMMENT • link 6.1 years ago by Emily 24k

0

Unfortunately, no.

The error is related to the cache files.

Another question: my organism is buffalo, and there is no data for it in the cache folder. Can I use another organism's cache files instead?

ADD REPLY • link 6.1 years ago by mostafarafiepour ▴ 180

0

When you run the command with the example files, what is your error?

There is no buffalo genome in Ensembl, so you will need to work with your own data. But we should fix the installation before we worry about that.

ADD REPLY • link 6.1 years ago by Emily 24k

0

Hi Emily,

Unfortunately, I have been struggling with VEP for days. You asked whether VEP works correctly for me. I think I installed it correctly. Please see below:

[Screenshot of console output]

Is the installation done correctly?

ADD REPLY • link 6.1 years ago by mostafarafiepour ▴ 180

1

It's impossible to read what's on the console. Can you please copy-paste the text and not a screenshot of the console?

ADD REPLY • link 6.1 years ago by Ram 44k

0

Yes, sure.

which: no tabix in (/opt/vep/ensembl-vep:/opth/hadoop/hadoop-2.7.3/bin:/opth/hadoop/hadoop-2.7.3/sbin:/opt/Mathematica/11.0/SystemFiles/Libraries/Linux-x86-64/:/opt/Mathematica/11.0/Executables:/opt/intel/composer_xe_2015.0.090/bin/intel64:/opt/torque/bin:/opt/torque/sbin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/torque/sbin/:/opt/torque/bin/:/opt/maui/bin/:/opt/maui/sbin/:/opt/gold/bin:/opt/torque/sbin/:/opt/torque/bin/:/opt/mireap/viennarna/share/perl5/:/opt/maui/bin/:/opt/maui/sbin/:/opt/boost/boost-installed:/opt/MATLAB/MATLAB_Production_Server/R2013a/toolbox/distcomp/bin/:/opt/cuda/bin:/home/m.rafiepour222/bin)
#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#

Versions:
  ensembl              : 94.5c08d90
  ensembl-funcgen      : 94.08b0c13
  ensembl-io           : 94.8d53275
  ensembl-variation    : 94.066b102
  ensembl-vep          : 94.4

Help: dev@ensembl.org , helpdesk@ensembl.org
Twitter: @ensembl

http://www.ensembl.org/info/docs/tools/vep/script/index.html

Usage:
./vep [--cache|--offline|--database] [arguments]

Basic options
=============

--help                 Display this message and quit

-i | --input_file      Input file
-o | --output_file     Output file
--force_overwrite      Force overwriting of output file
--species [species]    Species to use [default: "human"]

--everything           Shortcut switch to turn on commonly used options. See web
                       documentation for details [default: off]
--fork [num_forks]     Use forking to improve script runtime

For full option documentation see:
http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html

ADD REPLY • link 6.1 years ago by mostafarafiepour ▴ 180

0

The error is on the first line:

which: no tabix

Install tabix (it ships with HTSlib, which also provides bgzip) and try again?
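
Before retrying, it can help to confirm which helper tools are actually visible on the PATH. The following is only a minimal sketch; the function name check_tools is hypothetical, and the two tool names are the ones used in the GFF preparation commands earlier in this thread.

```shell
#!/bin/sh
# Report which of the named command-line tools are missing from PATH.
# Prints one "missing: <tool>" line per absent tool and returns 1 if any is absent.
# check_tools is a hypothetical helper name, not part of VEP.
check_tools() {
    status=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            status=1
        fi
    done
    return "$status"
}

# bgzip and tabix are the two helpers used when preparing a GFF for VEP.
check_tools bgzip tabix || echo "Install the missing tools before running VEP with a GFF."
```

If nothing is printed, both tools were found by the current shell. On a cluster that uses environment modules, the result depends on which modules are loaded in that session.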

ADD REPLY • link 6.1 years ago by Ram 44k

0

OK. Could you send me the installation link?

ADD REPLY • link 6.1 years ago by mostafarafiepour ▴ 180

0

I'm glad to see you've solved it. These are issues where you can show (and have shown) that you've invested your effort. Remember, asking for a download link is like using the forum as Google, which is not encouraged.

ADD REPLY • link 6.1 years ago by Ram 44k

0

Many thanks for your guidance.

As I said above, my organism is buffalo and there is no data for it in the VEP cache folder. I could not find anything in the VEP documentation on how to create a cache, so I would like to know how to generate one.

ADD REPLY • link 6.1 years ago by mostafarafiepour ▴ 180

0

Emily is the better person to tackle that. Like she said, installation needed to be solved before the data cache could be addressed.

I'd recommend opening a new question about getting VEP to work with the buffalo genome. That way, this thread would be about installing VEP, and all the information about the new genome would belong in the new thread.

Please accept Emily's answer to mark this thread as solved. Thank you!

ADD REPLY • link 6.1 years ago by Ram 44k

0

You don't need to generate a cache; you can run VEP directly with a GFF or GTF file and a genome FASTA. If you're having trouble with that, I agree with Ram that you should open a new post, because I'm getting very confused reading through here about what has been done and what links to what.

ADD REPLY • link 6.1 years ago by Emily 24k

0

Hi Emily, many thanks for your reply. Yes, I have been struggling with this for days.

First, do you suggest that I use this script?

grep -v "#" data.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > data.gff.gz
tabix -p gff data.gff.gz
./vep -i input.vcf -gff data.gff.gz -fasta genome.fa.gz

And if I do not get a result, should I open a new question about getting VEP to work with the buffalo genome?

ADD REPLY • link 6.1 years ago by mostafarafiepour ▴ 180

1

Asking whether to use a script is not very useful, since we have no idea whether your input data is in the same format as the data.gff file in the example above.

You are going to need to do this step by step. First run just grep -v "#" data.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' and see what you get. Does the output look reasonable? Then add one step at a time.

It is indeed time to stop posting in this thread and ask a new question if you cannot make progress or run into new errors.
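
To illustrate the step-by-step approach, here is a minimal sketch that runs only the filter-and-sort step on a toy GFF and inspects the result before any compression or indexing. The file contents, the temporary directory and the variable names are all hypothetical; substitute your own GFF for the toy file.

```shell
#!/bin/sh
# Toy GFF3 (hypothetical records) standing in for data.gff; columns are tab-separated.
tab=$(printf '\t')
workdir=$(mktemp -d)
{
    printf '##gff-version 3\n'
    printf 'chr2\ttoy\tgene\t500\t900\t.\t+\t.\tID=gene2\n'
    printf 'chr1\ttoy\tgene\t2000\t2500\t.\t-\t.\tID=gene1b\n'
    printf 'chr1\ttoy\tgene\t100\t400\t.\t+\t.\tID=gene1a\n'
} > "$workdir/data.gff"

# Step 1 only: drop every line containing "#" (as the original command does),
# then sort by sequence name, numeric start and numeric end.
grep -v "#" "$workdir/data.gff" | sort -t"$tab" -k1,1 -k4,4n -k5,5n > "$workdir/sorted.gff"

# Inspect the result before adding bgzip or tabix:
head -n 5 "$workdir/sorted.gff"                           # first few records
awk -F'\t' '{ print NF }' "$workdir/sorted.gff" | sort -u  # distinct column counts
```

A well-formed GFF should report a single column count of 9, with records grouped by sequence name and ordered by start position. Only once that looks right is it worth adding the bgzip step, then tabix, then VEP.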

ADD REPLY • link updated 6.1 years ago by Ram 44k • written 6.1 years ago by GenoMax 147k

0

OK.

First, I used the script below and created the compressed file without error.

module load SAMTools-1.4.1
grep -v "#" GCA_003121395.1_ASM312139v1_genomic.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > data.gff.gz

Then I ran tabix -p gff data.gff.gz

And then I ran:

 vep -i Final_Filter_GQ_KHUZ_MAZ_EAZ_GIL_WAZ.vcf -gff data.gff.gz -fasta GCA_003121395.1_ASM312139v1_genomic.fna

But I encountered this error:

-------------------- EXCEPTION --------------------
MSG: ERROR: Cannot use format gff without Bio::DB::HTS::Tabix module installed

STACK Bio::EnsEMBL::VEP::AnnotationSource::File::new /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm:162
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_custom /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:228
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:93
STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
STACK Bio::EnsEMBL::VEP::Runner::init /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:123
STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /opt/vep/ensembl-vep/vep:224
Date (localtime)    = Thu Oct 25 16:42:55 2018
Ensembl API version = 94
---------------------------------------------------

ADD REPLY • link 6.1 years ago by mostafarafiepour ▴ 180

0

Looks like you need to install this module.
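
Once it is installed, one quick way to confirm that the perl on your PATH can load the module is sketched below. The helper name have_perl_module is hypothetical; the module name is taken from the error message above.

```shell
#!/bin/sh
# Return 0 if the perl found on PATH can load the named module, non-zero otherwise
# (also non-zero if perl itself is not installed).
# have_perl_module is a hypothetical helper name.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

if have_perl_module Bio::DB::HTS::Tabix; then
    echo "Bio::DB::HTS::Tabix is available"
else
    echo "Bio::DB::HTS::Tabix is NOT available to this perl"
fi
```

Run this in the same environment you use for VEP, since a module installed for one perl is not necessarily visible to another.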

ADD REPLY • link 6.1 years ago by GenoMax 147k

0

Hi genomax,

I have been able to fix the installation problem. I was able to run the command that Emily suggested (vep -i Final.vcf -gff data.gff.gz -fasta genomic.fna), with a few warnings but no errors:

(vep) [m.rafiepour222@abrii1 ~]$ vep -i Final.vcf -gff data.gff.gz -fasta genomic.fna
Possible precedence issue with control flow operator at /opt/anaconda2/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845.

WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna27858, rna27857
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna40648
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna46030, rna46031
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna47129, rna47130
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna50084
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna54313, rna54314
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna60662
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna63492, rna63491
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna64693
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna67395, rna67394
(vep) [m.rafiepour222@abrii1 ~]$
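
The warnings name parent IDs that were either not found or skipped because of their type. One way to tell the two cases apart is to look up each ID in the uncompressed GFF and print its feature type (column 3). This is only a sketch on toy data: the function name lookup_feature_types and the toy records are hypothetical, and it assumes the IDs appear as ID=... entries in the attributes column.

```shell
#!/bin/sh
# Toy GFF (hypothetical records) standing in for the real annotation file.
workdir=$(mktemp -d)
gff_file="$workdir/toy.gff"
{
    printf '##gff-version 3\n'
    printf 'chr1\ttoy\tguide_RNA\t100\t200\t.\t+\t.\tID=rnaA;Name=a\n'
    printf 'chr1\ttoy\texon\t100\t200\t.\t+\t.\tID=exonA;Parent=rnaA\n'
} > "$gff_file"

# For each ID, print "<id><TAB><feature type>" or "<id><TAB>NOT_FOUND".
# lookup_feature_types is a hypothetical helper name.
lookup_feature_types() {
    gff=$1
    shift
    for id in "$@"; do
        awk -F'\t' -v id="$id" '
            !/^#/ && $9 ~ ("(^|;)ID=" id "(;|$)") { print id "\t" $3; found = 1 }
            END { if (!found) print id "\tNOT_FOUND" }
        ' "$gff"
    done
}

lookup_feature_types "$gff_file" rnaA rnaZ
```

With the real file, pass the IDs from the warnings (for example rna27858) in place of the toy IDs.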

Here is part of my output:

#Uploaded_variation     Location        Allele  Gene    Feature Feature_type    Consequence     cDNA_position   CDS_position    Protein_position        Amino_acids     Codons  Existing_variation      Extra
CM009840.1_932_C/A      CM009840.1:932  A       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1096_A/T     CM009840.1:1096 T       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1107_A/G     CM009840.1:1107 G       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1177_C/G     CM009840.1:1177 G       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1276_C/T     CM009840.1:1276 T       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1295_G/A     CM009840.1:1295 A       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1471_C/A     CM009840.1:1471 A       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1518_A/G     CM009840.1:1518 G       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER

Did everything go well?
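
One way to approach that question from the output itself is to tally the Consequence column (column 7 in the header above) across the whole file, rather than reading only the first rows. A minimal sketch on toy data follows; the rows, the file name and the function name summarize_consequences are hypothetical.

```shell
#!/bin/sh
# Toy stand-in for VEP's tab-delimited output, trimmed to the first seven columns.
workdir=$(mktemp -d)
out="$workdir/vep_output.txt"
{
    printf '#Uploaded_variation\tLocation\tAllele\tGene\tFeature\tFeature_type\tConsequence\n'
    printf 'CM009840.1_932_C/A\tCM009840.1:932\tA\t-\t-\t-\tintergenic_variant\n'
    printf 'CM009840.1_1096_A/T\tCM009840.1:1096\tT\t-\t-\t-\tintergenic_variant\n'
    printf 'toy_1\tCM009840.1:5000\tT\tgene1\trna1\tTranscript\tmissense_variant\n'
} > "$out"

# Count rows per Consequence value, skipping header and comment lines,
# most frequent first. summarize_consequences is a hypothetical helper name.
summarize_consequences() {
    grep -v '^#' "$1" | cut -f7 | sort | uniq -c | sort -k1,1nr
}

summarize_consequences "$out"
```

If every row in the real output is intergenic_variant, it may be worth checking that the sequence names in the VCF match those in the GFF. A few intergenic calls at the very start of a chromosome, as in the rows shown, are not in themselves a sign of a problem.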

ADD REPLY • link 6.1 years ago by mostafarafiepour ▴ 180

0

Hi genomax,

I am waiting for your response. I have another question: to check that VEP works correctly, how can I get SIFT predictions? I expected a column named SIFT in my output, but as you can see there is none.