Hi, I'm trying to use VEP [https://www.ensembl.org/info/docs/tools/vep/script/vep_tutorial.html] for annotating variant with my custom genome data. Vep recognizes the genes in the right positions, however it does not predict the effect.
for example (with small data):
./vep -i ./my_freebayes.vcf --format vcf --custom file.bed.gz,tiny,bed,,, --fasta q.fa --specie tiny
output:
## ENSEMBL VARIANT EFFECT PREDICTOR v99.2
## Output produced at 2020-04-20 09:56:38
## Using API version 99, DB version ?
## ensembl-funcgen version 99.0832337
## ensembl-variation version 99.a7f8736
## ensembl version 99.d3e7d31
## ensembl-io version 99.441b05b
## Column descriptions:
## Uploaded_variation : Identifier of uploaded variant
## Location : Location of variant in standard coordinate format (chr:start or chr:start-end
)
## Allele : The variant allele used to calculate the consequence
## Gene : Stable ID of affected gene
## Feature : Stable ID of feature
## Feature_type : Type of feature - Transcript, RegulatoryFeature or MotifFeature
## Consequence : Consequence type
## cDNA_position : Relative position of base pair in cDNA sequence
## CDS_position : Relative position of base pair in coding sequence
## Protein_position : Relative position of amino acid in protein
## Amino_acids : Reference and variant amino acids
## Codons : Reference and variant codon sequence
## Existing_variation : Identifier(s) of co-located known variants
## Extra column keys:
## IMPACT : Subjective impact classification of consequence type
## DISTANCE : Shortest distance from variant to transcript
## STRAND : Strand of the feature (1/-1)
## FLAGS : Transcript quality flags
## SOURCE : Source of transcript
## tiny : file.bed.gz (overlap)
#Uploaded_variation Location Allele Gene Feature Feature_type Consequence
cDNA_position CDS_position Protein_position Amino_acids Codons Exi
sting_variation Extra
q_641_T/G q:641 G - - - intergenic_variant - - -- - - IMPACT=MODIFIER;tiny=prt2
q_2262_T/G q:2262 G - - - intergenic_variant - - -- - - IMPACT=MODIFIER
q_7122_T/G q:7122 G - - - intergenic_variant - - -- - - IMPACT=MODIFIER;tiny=prt5
notice that it does mention the corresponding genes (prt2, prt5), but classify the snp as 'intergenic_variant' despite that.
Input files are:
my_freebayes.vcf
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT contig
q 641 . T G 0.511525 . AB=0;ABP=0;AC=0;AF=0;AN=1;A
O=1;CIGAR=1X;DP=1;DPB=1;DPRA=0;EPP=5.18177;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;
NUMALT=1;ODDS=2.07944;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=0;QR=0;RO=0;RPL=0;RPP=5
.18177;RPPR=0;RPR=1;RUN=1;SAF=1;SAP=5.18177;SAR=0;SRF=0;SRP=0;SRR=0;TYPE=snp GT:GQ:DP:AD
:RO:QR:AO:QA:GL 0:9.54243:1:0,1:0:0:1:0:0,-4.34294e-09
q 2262 . T G 0.511525 . AB=0;ABP=0;AC=0;AF=0;AN=1;A
O=1;CIGAR=1X;DP=1;DPB=1;DPRA=0;EPP=5.18177;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;
NUMALT=1;ODDS=2.07944;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=0;QR=0;RO=0;RPL=0;RPP=5
.18177;RPPR=0;RPR=1;RUN=1;SAF=1;SAP=5.18177;SAR=0;SRF=0;SRP=0;SRR=0;TYPE=snp GT:GQ:DP:AD
:RO:QR:AO:QA:GL 0:9.54243:1:0,1:0:0:1:0:0,-4.34294e-09
q 7122 . T G 0.511525 . AB=0;ABP=0;AC=0;AF=0;AN=1;A
O=1;CIGAR=1X;DP=1;DPB=1;DPRA=0;EPP=5.18177;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;
NUMALT=1;ODDS=2.07944;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=0;QR=0;RO=0;RPL=1;RPP=5
.18177;RPPR=0;RPR=0;RUN=1;SAF=1;SAP=5.18177;SAR=0;SRF=0;SRP=0;SRR=0;TYPE=snp GT:GQ:DP:AD
:RO:QR:AO:QA:GL 0:9.54243:1:0,1:0:0:1:0:0,-4.34294e-09
* file.bed*
q 12 222 prt1 0 +
q 600 700 prt2 0 +
q 1000 1500 prt3 0 -
q 2000 2200 prt4 0 -
q 7000 7700 prt5 0 +
I also tried to convert my bed file into gtf or gff file, and insert "gene_biotype "protein_coding" but it didnt seem to help. does someone know why is so and how to overcome it?
thank you!
it worked, thank you! I tried before working with gff , but Apparently I didn't build the file correctly. the correct gff file looks like that: