VEP: What is the best way to start an analysis?
6.1 years ago

Hi all,

I have not worked with the VEP software before, but I need some of its output. Unfortunately, I could not work out how to run the analysis from reading the guide. What is the best way to get started?

Best regards,

Mostafa

SNP vep Ensembl • 4.5k views
ADD COMMENT
1
Entering edit mode

Why not install locally and try out examples?

ADD REPLY
0
Entering edit mode

Many thanks for your reply.

I have installed it, but I do not know exactly what the first step is. I guess I should first annotate my VCF file using the script below. Is my guess right?

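# Drop comment lines, sort by sequence name, start and end, then bgzip-compress
grep -v "#" data.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > data.gff.gz
# Index the compressed GFF so that it can be queried by region
tabix -p gff data.gff.gz
# Annotate the VCF against the GFF and the genome FASTA (no cache involved)
./vep -i input.vcf -gff data.gff.gz -fasta genome.fa.gz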
ADD REPLY
0
Entering edit mode

As zx8754 said: have you tried the "Quick start" section on the right of https://www.ensembl.org/info/docs/tools/vep/script/index.html?

ADD REPLY
2
Entering edit mode
6.1 years ago
Emily 24k

The basic commands are in the documentation.

./vep --cache -i input.txt -o output.txt

Is it working when you run that with the example files that ship with the VEP?
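
If the `--cache` run fails, it can help to look at what is actually in the cache directory first. The following is a minimal sketch, assuming the default cache location `$HOME/.vep` (pass another directory if you chose one at install time). `list_vep_caches` is a hypothetical helper name, not part of VEP.

```shell
# List the subdirectories (normally one per species) of a VEP cache directory.
# Assumption: caches live in $HOME/.vep unless another directory is given.
list_vep_caches() {
    cache_dir="${1:-$HOME/.vep}"
    if [ ! -d "$cache_dir" ]; then
        echo "no cache directory at $cache_dir"
        return 1
    fi
    found=0
    for species_dir in "$cache_dir"/*/; do
        [ -d "$species_dir" ] || continue
        basename "$species_dir"
        found=1
    done
    [ "$found" -eq 1 ] || echo "cache directory $cache_dir is empty"
}

# Usage (not run here):
#   list_vep_caches                  # checks $HOME/.vep
#   list_vep_caches /path/to/cache   # checks a custom location
```

An empty or missing directory would explain a cache-related error with the example files.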

ADD COMMENT
0
Entering edit mode

Unfortunately, no.

The error is related to the cache files.

Another question: my organism is buffalo, and there is no data for it in the cache folder. Can I use the cache of another organism instead?

ADD REPLY
0
Entering edit mode

When you run the command with the example files, what is your error?

There is no buffalo genome in Ensembl, so you will need to work with your own data. But we should fix the installation before we worry about that.

ADD REPLY
0
Entering edit mode

Hi Emily,

Unfortunately, I have been struggling with VEP for days. You asked whether VEP works correctly for me. I think I installed it correctly; please see below:

[Screenshot of the console output]

Was the installation done correctly?

ADD REPLY
1
Entering edit mode

It's impossible to read what's on the console. Can you please copy-paste the text and not a screenshot of the console?

ADD REPLY
0
Entering edit mode

Yes, sure.

which: no tabix in (/opt/vep/ensembl-vep:/opth/hadoop/hadoop-2.7.3/bin:/opth/hadoop/hadoop-2.7.3/sbin:/opt/Mathematica/11.0/SystemFiles/Libraries/Linux-x86-64/:/opt/Mathematica/11.0/Executables:/opt/intel/composer_xe_2015.0.090/bin/intel64:/opt/torque/bin:/opt/torque/sbin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/torque/sbin/:/opt/torque/bin/:/opt/maui/bin/:/opt/maui/sbin/:/opt/gold/bin:/opt/torque/sbin/:/opt/torque/bin/:/opt/mireap/viennarna/share/perl5/:/opt/maui/bin/:/opt/maui/sbin/:/opt/boost/boost-installed:/opt/MATLAB/MATLAB_Production_Server/R2013a/toolbox/distcomp/bin/:/opt/cuda/bin:/home/m.rafiepour222/bin)
#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#

Versions:
  ensembl              : 94.5c08d90
  ensembl-funcgen      : 94.08b0c13
  ensembl-io           : 94.8d53275
  ensembl-variation    : 94.066b102
  ensembl-vep          : 94.4

Help: dev@ensembl.org , helpdesk@ensembl.org
Twitter: @ensembl

http://www.ensembl.org/info/docs/tools/vep/script/index.html

Usage:
./vep [--cache|--offline|--database] [arguments]

Basic options
=============

--help                 Display this message and quit

-i | --input_file      Input file
-o | --output_file     Output file
--force_overwrite      Force overwriting of output file
--species [species]    Species to use [default: "human"]

--everything           Shortcut switch to turn on commonly used options. See web
                       documentation for details [default: off]
--fork [num_forks]     Use forking to improve script runtime

For full option documentation see:
http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html
ADD REPLY
0
Entering edit mode

The error is on the first line:

which: no tabix

Install tabix (it is distributed with HTSlib, alongside bgzip) and try again?
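
A quick way to see which helper programs are missing before rerunning VEP is to ask the shell directly. This is only a sketch; `missing_tools` is a hypothetical helper name.

```shell
# Print the name of every given command that cannot be found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage (not run here); tabix and bgzip are both distributed with HTSlib:
#   missing_tools tabix bgzip
```

If the call prints nothing, both programs are visible to the shell that launches VEP.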

ADD REPLY
0
Entering edit mode

OK. Could you send me the installation link for it?

ADD REPLY
0
Entering edit mode

I'm glad to see you've solved it. These are issues where you can show (and have shown) that you've invested your effort. Remember, asking for a download link is like using the forum as Google, which is not encouraged.

ADD REPLY
0
Entering edit mode

Many thanks for your guidance.

As I said above, my organism is buffalo and VEP has no cache data for it. The VEP documentation does not seem to explain how to create a cache, so I would like to know: how do I generate the cache files?

ADD REPLY
0
Entering edit mode

Emily is the better person to tackle that. Like she said, installation needed to be solved before the data cache could be addressed.

I'd recommend opening a new question about getting VEP to work with the buffalo genome. That way, this thread would be about installing VEP, and all the information about the new genome would belong in the new thread.

Please accept Emily's answer to mark this thread as solved. Thank you!

ADD REPLY
0
Entering edit mode

You don't need to generate a cache; you can run VEP directly with a GFF or GTF file and a genome FASTA. If you're having trouble with that, I agree with Ram that you should open a new post, because I'm getting very confused reading through here about what has been done and what links to what.
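
Before a cache-free run, it may be worth confirming that every input is in place, including the tabix index, which tabix writes next to the compressed GFF as `<file>.tbi`. A minimal sketch follows; `check_vep_inputs` and the file names are hypothetical placeholders.

```shell
# Print every input that is missing or empty; print nothing if all are present.
# Assumption: the GFF was compressed with bgzip and indexed with tabix,
# so an index named <gff>.tbi should sit next to it.
check_vep_inputs() {
    vcf="$1"; gff_gz="$2"; fasta="$3"
    for f in "$vcf" "$gff_gz" "$gff_gz.tbi" "$fasta"; do
        [ -s "$f" ] || echo "missing or empty: $f"
    done
}

# Usage (not run here):
#   check_vep_inputs input.vcf data.gff.gz genome.fa
# Once nothing is reported, the run itself would look like:
#   ./vep -i input.vcf --gff data.gff.gz --fasta genome.fa -o output.txt
```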

ADD REPLY
0
Entering edit mode

Hi Emily, many thanks for your reply. Yes, I have been struggling with this for days.

First, do you suggest that I use this script?

grep -v "#" data.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > data.gff.gz
tabix -p gff data.gff.gz
./vep -i input.vcf -gff data.gff.gz -fasta genome.fa.gz

And if I do not get a result, should I open a new question about getting VEP to work with the buffalo genome?

ADD REPLY
1
Entering edit mode

Asking whether to use a script is not very useful, since we have no idea whether your input data is in the same format as the data.gff file above.

You are going to need to do this step by step. First run just grep -v "#" data.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' and see what you get. Does the output look reasonable? Then add one step at a time.
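
To illustrate what "reasonable" output of that first step looks like, here is a small sketch that runs the same grep and sort on a made-up three-line GFF. The file contents and the `toy` source name are invented for illustration only.

```shell
# Build a tiny, made-up GFF3 file (three genes, deliberately out of order).
TAB=$(printf '\t')
toy_gff=$(mktemp)
{
    printf '##gff-version 3\n'
    printf 'chr2\ttoy\tgene\t500\t900\t.\t+\t.\tID=gene3\n'
    printf 'chr1\ttoy\tgene\t2000\t3000\t.\t+\t.\tID=gene2\n'
    printf 'chr1\ttoy\tgene\t100\t1500\t.\t-\t.\tID=gene1\n'
} > "$toy_gff"

# Step 1 only: drop comment lines, then sort by sequence name, start, end.
sorted=$(grep -v "#" "$toy_gff" | sort -t "$TAB" -k1,1 -k4,4n -k5,5n)

# Inspect before adding bgzip/tabix: the rows should come out as
# gene1, gene2, gene3 (chr1 before chr2, ascending start within chr1).
printf '%s\n' "$sorted" | cut -f1,4,5,9
rm -f "$toy_gff"
```

Note that `grep -v "#"` removes every line containing a `#` anywhere, not only header lines, so it is worth comparing line counts on real data.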



It is indeed time to stop posting in this thread and ask a new question if you are not able to make any progress/run into new errors.


ADD REPLY
0
Entering edit mode

OK.

First, I used the script below and created the bgzip-compressed file without error.

module load SAMTools-1.4.1
grep -v "#" GCA_003121395.1_ASM312139v1_genomic.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > data.gff.gz

Then I ran tabix -p gff data.gff.gz

And then I ran:

 vep -i Final_Filter_GQ_KHUZ_MAZ_EAZ_GIL_WAZ.vcf -gff data.gff.gz -fasta GCA_003121395.1_ASM312139v1_genomic.fna

But I encountered this error:

-------------------- EXCEPTION --------------------
MSG: ERROR: Cannot use format gff without Bio::DB::HTS::Tabix module installed

STACK Bio::EnsEMBL::VEP::AnnotationSource::File::new /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm:162
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_custom /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:228
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:93
STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
STACK Bio::EnsEMBL::VEP::Runner::init /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:123
STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /opt/vep/ensembl-vep/vep:224
Date (localtime)    = Thu Oct 25 16:42:55 2018
Ensembl API version = 94
---------------------------------------------------
ADD REPLY
0
Entering edit mode

Looks like you need to install this module.
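
To confirm whether the module from the error message is visible to the Perl that runs VEP, a check along these lines may help. It is a sketch only; `perl_module_available` is a hypothetical helper name.

```shell
# Return 0 if the named Perl module can be loaded, non-zero otherwise
# (2 if no perl executable is found on PATH at all).
perl_module_available() {
    command -v perl >/dev/null 2>&1 || return 2
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Usage (not run here), for the module named in the error message:
#   perl_module_available Bio::DB::HTS::Tabix && echo "found" || echo "not installed"
```

Make sure the check runs with the same `perl` that VEP uses; on systems with several Perl installations the result can differ between them.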

ADD REPLY
0
Entering edit mode

Hi genomax,

I have been able to fix the installation problem. I was then able to run the command that Emily had suggested (vep -i Final.vcf -gff data.gff.gz -fasta genomic.fna). It produced a few warnings but no errors:

(vep) [m.rafiepour222@abrii1 ~]$ vep -i Final.vcf -gff data.gff.gz -fasta genomic.fna
Possible precedence issue with control flow operator at /opt/anaconda2/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845.

WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna27858, rna27857
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna40648
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna46030, rna46031
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna47129, rna47130
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna50084
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna54313, rna54314
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna60662
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna63492, rna63491
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna64693
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna67395, rna67394
(vep) [m.rafiepour222@abrii1 ~]$
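
The warnings name parent IDs that were not found or had a type VEP did not accept. One way to investigate is to look up what feature type each of those IDs has in the GFF. The sketch below assumes the IDs appear as `ID=<name>` in column 9; `gff_type_of_id` is a hypothetical helper name.

```shell
# Print the feature type (column 3) of every GFF line whose ID attribute
# is exactly the given identifier.
gff_type_of_id() {
    awk -F '\t' -v id="$2" '
        !/^#/ {
            n = split($9, attrs, ";")
            for (i = 1; i <= n; i++)
                if (attrs[i] == ("ID=" id)) print $3
        }' "$1"
}

# Usage (not run here), with an ID taken from the warnings above:
#   gff_type_of_id GCA_003121395.1_ASM312139v1_genomic.gff rna27858
```

If the call prints nothing, the ID is absent from the file; if it prints an unusual feature type, that type may be the reason the entry was skipped.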

And this is part of my output:

#Uploaded_variation     Location        Allele  Gene    Feature Feature_type    Consequence     cDNA_position   CDS_position    Protein_position        Amino_acids     Codons  Existing_variation      Extra
CM009840.1_932_C/A      CM009840.1:932  A       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1096_A/T     CM009840.1:1096 T       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1107_A/G     CM009840.1:1107 G       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1177_C/G     CM009840.1:1177 G       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1276_C/T     CM009840.1:1276 T       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1295_G/A     CM009840.1:1295 A       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1471_C/A     CM009840.1:1471 A       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
CM009840.1_1518_A/G     CM009840.1:1518 G       -       -       -       intergenic_variant      -       -       -       -       -       -       IMPACT=MODIFIER
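
The excerpt covers only the first 1.5 kb or so of CM009840.1, so intergenic calls there are not surprising by themselves. A tally of the Consequence column over the whole file gives a better picture. This is a sketch that assumes VEP's default tab-delimited output, with Consequence in column 7; `count_consequences` is a hypothetical helper name.

```shell
# Count how often each Consequence (column 7) occurs in tab-delimited VEP
# output, skipping header lines that start with "#"; most frequent first.
count_consequences() {
    awk -F '\t' '!/^#/ { n[$7]++ } END { for (c in n) print n[c] "\t" c }' "$1" \
        | sort -k1,1nr
}

# Usage (not run here; the output file name is a placeholder):
#   count_consequences variant_effect_output.txt
```

If every variant in the file is intergenic, possible explanations include sequence names that differ between the VCF and the GFF, or transcripts that were skipped when the GFF was parsed.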

Did everything go well?

ADD REPLY
0
Entering edit mode

Hi genomax,

I am waiting for your response. I have another question: to check that VEP works correctly, how can I get SIFT predictions? I thought there would be a column named SIFT in my output, but as you can see above, there is none.
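
For context: in VEP's default tab-delimited output, SIFT does not get its own column; when available it appears as a `SIFT=` entry inside the final Extra column, and it has to be requested with the `--sift` option. As far as I understand, the predictions are precomputed for a limited set of species and served from the Ensembl cache or database, so a run based only on a GFF and a FASTA would not be expected to report them. A sketch for checking a given output file follows; `count_sift_lines` is a hypothetical helper name.

```shell
# Count output lines whose last column (Extra) carries a SIFT entry.
count_sift_lines() {
    awk -F '\t' '!/^#/ && $NF ~ /(^|;)SIFT=/ { n++ } END { print n + 0 }' "$1"
}

# Usage (not run here; the output file name is a placeholder):
#   count_sift_lines variant_effect_output.txt
```

A result of 0 means no line in the file carries a SIFT prediction.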

ADD REPLY
0
Entering edit mode

Deleted due to reduced space

ADD REPLY
