Hi all,
I have not worked with the VEP software before, but I need some of its output. Unfortunately, I could not work out how to run the analysis from reading the guide. What is the best way to get started?
Best regards,
Mostafa
The basic commands are in the documentation.
./vep --cache -i input.txt -o output.txt
Is it working when you run that with the example files that ship with the VEP?
Yes, sure.
which: no tabix in (/opt/vep/ensembl-vep:/opth/hadoop/hadoop-2.7.3/bin:/opth/hadoop/hadoop-2.7.3/sbin:/opt/Mathematica/11.0/SystemFiles/Libraries/Linux-x86-64/:/opt/Mathematica/11.0/Executables:/opt/intel/composer_xe_2015.0.090/bin/intel64:/opt/torque/bin:/opt/torque/sbin:/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/opt/torque/sbin/:/opt/torque/bin/:/opt/maui/bin/:/opt/maui/sbin/:/opt/gold/bin:/opt/torque/sbin/:/opt/torque/bin/:/opt/mireap/viennarna/share/perl5/:/opt/maui/bin/:/opt/maui/sbin/:/opt/boost/boost-installed:/opt/MATLAB/MATLAB_Production_Server/R2013a/toolbox/distcomp/bin/:/opt/cuda/bin:/home/m.rafiepour222/bin)
#----------------------------------#
# ENSEMBL VARIANT EFFECT PREDICTOR #
#----------------------------------#
Versions:
ensembl : 94.5c08d90
ensembl-funcgen : 94.08b0c13
ensembl-io : 94.8d53275
ensembl-variation : 94.066b102
ensembl-vep : 94.4
Help: dev@ensembl.org , helpdesk@ensembl.org
Twitter: @ensembl
http://www.ensembl.org/info/docs/tools/vep/script/index.html
Usage:
./vep [--cache|--offline|--database] [arguments]
Basic options
=============
--help Display this message and quit
-i | --input_file Input file
-o | --output_file Output file
--force_overwrite Force overwriting of output file
--species [species] Species to use [default: "human"]
--everything Shortcut switch to turn on commonly used options. See web
documentation for details [default: off]
--fork [num_forks] Use forking to improve script runtime
For full option documentation see:
http://www.ensembl.org/info/docs/tools/vep/script/vep_options.html
Many thanks for your guidance.
As I said above, my organism is buffalo, and VEP has no cache for it. The VEP documentation does not seem to explain how to create a cache file. How can I generate one?
Emily is the better person to tackle that. Like she said, installation needed to be solved before the data cache could be addressed.
I'd recommend opening a new question about getting VEP to work with the Buffalo genome. That way, this thread would be about installing VEP, and all the information about the new genome would belong in that thread.
Please accept Emily's answer to mark this thread as solved. Thank you!
You don't need to generate a cache; you can run VEP directly with a GFF or GTF file and a genome FASTA. If you're having trouble with that, I agree with Ram that you should open a new post, because reading through this thread I'm getting very confused about what has been done and what links to what.
Hi Emily, many thanks for your reply. Yes, I have been struggling with this for days.
First, do you suggest that I use this script:
grep -v "#" data.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > data.gff.gz
tabix -p gff data.gff.gz
./vep -i input.vcf -gff data.gff.gz -fasta genome.fa.gz
And if I do not get a result, should I open a new question about getting VEP to work with the Buffalo genome?
Asking whether to use a script is not very useful, since we have no idea whether your input data is in the same format as the data.gff file above.
You are going to need to do this step by step. Run just grep -v "#" data.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' first and see what you get. Does the output look reasonable? Then add one step at a time.
It is indeed time to stop posting in this thread and ask a new question if you are not able to make any progress/run into new errors.
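To make the "one step at a time" advice concrete, here is a minimal sketch of checking the grep/sort step on its own before adding bgzip and tabix. The toy GFF records and file names are made up for illustration; substitute your own file. LC_ALL=C is my addition so the sort order is the same on every machine. Note that grep -v "#" drops every line containing a "#" anywhere, not only header lines.

```shell
#!/bin/sh
# Toy GFF with hypothetical records, deliberately out of order.
tmpdir=$(mktemp -d)
gff="$tmpdir/toy.gff"
printf '##gff-version 3\n' > "$gff"
printf 'chr2\ttoy\tgene\t500\t900\t.\t+\t.\tID=gene2\n' >> "$gff"
printf 'chr1\ttoy\tgene\t300\t700\t.\t+\t.\tID=gene1b\n' >> "$gff"
printf 'chr1\ttoy\tgene\t100\t400\t.\t-\t.\tID=gene1a\n' >> "$gff"

# Step 1 only: drop comment lines, then sort by sequence name, start, end.
TAB=$(printf '\t')
sorted=$(grep -v "#" "$gff" | LC_ALL=C sort -t "$TAB" -k1,1 -k4,4n -k5,5n)
printf '%s\n' "$sorted"

# Sanity checks to run before compressing anything:
# how many records survived, and does every record have 9 tab-separated columns?
n_lines=$(printf '%s\n' "$sorted" | wc -l | tr -d ' ')
bad_cols=$(printf '%s\n' "$sorted" | awk -F '\t' 'NF != 9' | wc -l | tr -d ' ')
first_id=$(printf '%s\n' "$sorted" | head -n 1 | cut -f 9)
echo "records: $n_lines, records without 9 columns: $bad_cols"

rm -rf "$tmpdir"
```

If the record count and column check look right on the real file, the next step would be to pipe the same command into bgzip, and only then run tabix.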
OK,
First, I used the commands below and created the compressed file without error.
module load SAMTools-1.4.1
grep -v "#" GCA_003121395.1_ASM312139v1_genomic.gff | sort -k1,1 -k4,4n -k5,5n -t$'\t' | bgzip -c > data.gff.gz
Then I ran tabix -p gff data.gff.gz
And then I ran:
vep -i Final_Filter_GQ_KHUZ_MAZ_EAZ_GIL_WAZ.vcf -gff data.gff.gz -fasta GCA_003121395.1_ASM312139v1_genomic.fna
But I encountered this error:
-------------------- EXCEPTION --------------------
MSG: ERROR: Cannot use format gff without Bio::DB::HTS::Tabix module installed
STACK Bio::EnsEMBL::VEP::AnnotationSource::File::new /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSource/File.pm:162
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all_custom /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:228
STACK Bio::EnsEMBL::VEP::AnnotationSourceAdaptor::get_all /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/AnnotationSourceAdaptor.pm:93
STACK Bio::EnsEMBL::VEP::BaseRunner::get_all_AnnotationSources /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/BaseRunner.pm:175
STACK Bio::EnsEMBL::VEP::Runner::init /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:123
STACK Bio::EnsEMBL::VEP::Runner::run /opt/vep/ensembl-vep/modules/Bio/EnsEMBL/VEP/Runner.pm:194
STACK toplevel /opt/vep/ensembl-vep/vep:224
Date (localtime) = Thu Oct 25 16:42:55 2018
Ensembl API version = 94
---------------------------------------------------
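The exception above says the Perl module Bio::DB::HTS::Tabix is not available to VEP. A quick way to see what is and is not installed is a small pre-flight check like the sketch below. The function names have_cmd and have_perl_module are my own; the check uses whichever perl is first on your PATH, which is an assumption: it may not be the same perl that vep runs with (for example inside a conda environment).

```shell
#!/bin/sh
# Succeeds if the named command is on PATH.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Succeeds if the perl on PATH can load the named module.
have_perl_module() {
    perl -M"$1" -e 1 >/dev/null 2>&1
}

# Command-line tools used in the GFF preparation steps.
for tool in perl bgzip tabix; do
    if have_cmd "$tool"; then
        echo "OK       command $tool"
    else
        echo "MISSING  command $tool"
    fi
done

# Perl module named in the VEP exception message.
for module in Bio::DB::HTS::Tabix; do
    if have_perl_module "$module"; then
        echo "OK       perl module $module"
    else
        echo "MISSING  perl module $module"
    fi
done
```

Anything reported as MISSING needs to be installed (or the right environment/module loaded) before the vep command with a GFF file can work.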
Hi genomax,
I have been able to fix the installation problem. I was able to run the command that Emily had suggested (vep -i Final.vcf -gff data.gff.gz -fasta genomic.fna), with a few warnings but no errors:
(vep) [m.rafiepour222@abrii1 ~]$ vep -i Final.vcf -gff data.gff.gz -fasta genomic.fna
Possible precedence issue with control flow operator at /opt/anaconda2/lib/site_perl/5.26.2/Bio/DB/IndexedBase.pm line 845.
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna27858, rna27857
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna40648
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna46030, rna46031
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna47129, rna47130
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna50084
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna54313, rna54314
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna60662
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna63492, rna63491
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna64693
WARNING: Parent entries with the following IDs were not found or skipped due to invalid types: rna67395, rna67394
(vep) [m.rafiepour222@abrii1 ~]$
And this is part of my output:
#Uploaded_variation Location Allele Gene Feature Feature_type Consequence cDNA_position CDS_position Protein_position Amino_acids Codons Existing_variation Extra
CM009840.1_932_C/A CM009840.1:932 A - - - intergenic_variant - - - - - - IMPACT=MODIFIER
CM009840.1_1096_A/T CM009840.1:1096 T - - - intergenic_variant - - - - - - IMPACT=MODIFIER
CM009840.1_1107_A/G CM009840.1:1107 G - - - intergenic_variant - - - - - - IMPACT=MODIFIER
CM009840.1_1177_C/G CM009840.1:1177 G - - - intergenic_variant - - - - - - IMPACT=MODIFIER
CM009840.1_1276_C/T CM009840.1:1276 T - - - intergenic_variant - - - - - - IMPACT=MODIFIER
CM009840.1_1295_G/A CM009840.1:1295 A - - - intergenic_variant - - - - - - IMPACT=MODIFIER
CM009840.1_1471_C/A CM009840.1:1471 A - - - intergenic_variant - - - - - - IMPACT=MODIFIER
CM009840.1_1518_A/G CM009840.1:1518 G - - - intergenic_variant - - - - - - IMPACT=MODIFIER
Did everything go well?
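One way to answer that from the output itself is to tally the Consequence column over the whole file, rather than judging from the first few rows. The sketch below assumes the default tab-delimited VEP output, where Consequence is column 7 and Gene is column 4; the file name vep_output.txt is hypothetical, and the three data rows are copied from the excerpt above.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
out="$tmpdir/vep_output.txt"

# Header plus three rows taken from the excerpt in this thread (tab-delimited).
printf '#Uploaded_variation\tLocation\tAllele\tGene\tFeature\tFeature_type\tConsequence\tcDNA_position\tCDS_position\tProtein_position\tAmino_acids\tCodons\tExisting_variation\tExtra\n' > "$out"
printf 'CM009840.1_932_C/A\tCM009840.1:932\tA\t-\t-\t-\tintergenic_variant\t-\t-\t-\t-\t-\t-\tIMPACT=MODIFIER\n' >> "$out"
printf 'CM009840.1_1096_A/T\tCM009840.1:1096\tT\t-\t-\t-\tintergenic_variant\t-\t-\t-\t-\t-\t-\tIMPACT=MODIFIER\n' >> "$out"
printf 'CM009840.1_1107_A/G\tCM009840.1:1107\tG\t-\t-\t-\tintergenic_variant\t-\t-\t-\t-\t-\t-\tIMPACT=MODIFIER\n' >> "$out"

# Count how often each consequence term occurs, skipping header lines.
tally=$(awk -F '\t' '!/^#/ { n[$7]++ } END { for (c in n) print n[c] "\t" c }' "$out")
printf '%s\n' "$tally"

# Count rows where a gene was assigned (Gene column is not "-").
genic=$(awk -F '\t' '!/^#/ && $4 != "-"' "$out" | wc -l | tr -d ' ')
echo "rows with a gene assigned: $genic"

rm -rf "$tmpdir"
```

If the tally over the full file shows only intergenic_variant and no row has a gene assigned, it would be worth checking that the sequence names in the VCF match those in the GFF. A mix of consequence types suggests the annotation is being picked up.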
Why not install locally and try out examples?
Many thanks for your reply.
I have installed it, but I do not know exactly what the first step is. I guess I should first annotate my VCF file using the script below. Is my guess right?
As zx8754 said: have you tried the "Quick start" section on the right of https://www.ensembl.org/info/docs/tools/vep/script/index.html?