using vep with custom genome
1
0
Entering edit mode
4.6 years ago
etay.rot ▴ 30

Hi, I'm trying to use VEP [https://www.ensembl.org/info/docs/tools/vep/script/vep_tutorial.html] for annotating variant with my custom genome data. Vep recognizes the genes in the right positions, however it does not predict the effect.

for example (with small data):

./vep -i ./my_freebayes.vcf --format vcf --custom file.bed.gz,tiny,bed,,, --fasta q.fa --specie tiny

output:

## ENSEMBL VARIANT EFFECT PREDICTOR v99.2
## Output produced at 2020-04-20 09:56:38
## Using API version 99, DB version ?
## ensembl-funcgen version 99.0832337
## ensembl-variation version 99.a7f8736
## ensembl version 99.d3e7d31
## ensembl-io version 99.441b05b
## Column descriptions:
## Uploaded_variation : Identifier of uploaded variant
## Location : Location of variant in standard coordinate format (chr:start or chr:start-end
)
## Allele : The variant allele used to calculate the consequence
## Gene : Stable ID of affected gene
## Feature : Stable ID of feature
## Feature_type : Type of feature - Transcript, RegulatoryFeature or MotifFeature
## Consequence : Consequence type
## cDNA_position : Relative position of base pair in cDNA sequence
## CDS_position : Relative position of base pair in coding sequence
## Protein_position : Relative position of amino acid in protein
## Amino_acids : Reference and variant amino acids
## Codons : Reference and variant codon sequence
## Existing_variation : Identifier(s) of co-located known variants
## Extra column keys:
## IMPACT : Subjective impact classification of consequence type
## DISTANCE : Shortest distance from variant to transcript
## STRAND : Strand of the feature (1/-1)
## FLAGS : Transcript quality flags
## SOURCE : Source of transcript
## tiny : file.bed.gz (overlap)
#Uploaded_variation     Location        Allele  Gene    Feature Feature_type    Consequence
        cDNA_position   CDS_position    Protein_position        Amino_acids     Codons  Exi
sting_variation Extra
q_641_T/G       q:641   G       -       -       -       intergenic_variant      -       - --       -       -       IMPACT=MODIFIER;tiny=prt2
q_2262_T/G      q:2262  G       -       -       -       intergenic_variant      -       - --       -       -       IMPACT=MODIFIER
q_7122_T/G      q:7122  G       -       -       -       intergenic_variant      -       - --       -       -       IMPACT=MODIFIER;tiny=prt5

notice that it does mention the corresponding genes (prt2, prt5), but classify the snp as 'intergenic_variant' despite that.

Input files are:

my_freebayes.vcf

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  contig
q       641     .       T       G       0.511525        .       AB=0;ABP=0;AC=0;AF=0;AN=1;A
O=1;CIGAR=1X;DP=1;DPB=1;DPRA=0;EPP=5.18177;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;
NUMALT=1;ODDS=2.07944;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=0;QR=0;RO=0;RPL=0;RPP=5
.18177;RPPR=0;RPR=1;RUN=1;SAF=1;SAP=5.18177;SAR=0;SRF=0;SRP=0;SRR=0;TYPE=snp    GT:GQ:DP:AD
:RO:QR:AO:QA:GL 0:9.54243:1:0,1:0:0:1:0:0,-4.34294e-09
q       2262    .       T       G       0.511525        .       AB=0;ABP=0;AC=0;AF=0;AN=1;A
O=1;CIGAR=1X;DP=1;DPB=1;DPRA=0;EPP=5.18177;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;
NUMALT=1;ODDS=2.07944;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=0;QR=0;RO=0;RPL=0;RPP=5
.18177;RPPR=0;RPR=1;RUN=1;SAF=1;SAP=5.18177;SAR=0;SRF=0;SRP=0;SRR=0;TYPE=snp    GT:GQ:DP:AD
:RO:QR:AO:QA:GL 0:9.54243:1:0,1:0:0:1:0:0,-4.34294e-09
q       7122    .       T       G       0.511525        .       AB=0;ABP=0;AC=0;AF=0;AN=1;A
O=1;CIGAR=1X;DP=1;DPB=1;DPRA=0;EPP=5.18177;EPPR=0;GTI=0;LEN=1;MEANALT=1;MQM=60;MQMR=0;NS=1;
NUMALT=1;ODDS=2.07944;PAIRED=0;PAIREDR=0;PAO=0;PQA=0;PQR=0;PRO=0;QA=0;QR=0;RO=0;RPL=1;RPP=5
.18177;RPPR=0;RPR=0;RUN=1;SAF=1;SAP=5.18177;SAR=0;SRF=0;SRP=0;SRR=0;TYPE=snp    GT:GQ:DP:AD
:RO:QR:AO:QA:GL 0:9.54243:1:0,1:0:0:1:0:0,-4.34294e-09

* file.bed*

q       12      222     prt1    0       +
q       600     700     prt2    0       +
q       1000    1500    prt3    0       -
q       2000    2200    prt4    0       -
q       7000    7700    prt5    0       +

I also tried to convert my bed file into gtf or gff file, and insert "gene_biotype "protein_coding" but it didnt seem to help. does someone know why is so and how to overcome it?

thank you!

SNP snp vep • 1.9k views
ADD COMMENT
2
Entering edit mode
4.6 years ago
Ben Moore ★ 2.4k

Hi etay.rot,

VEP can use transcript annotations defined in GFF or GTF files. I suggest that you first check that your GFF file meets the format expectations: https://www.ensembl.org/info/docs/tools/vep/script/vep_cache.html#gff

Also, please check that your GFF and FASTA file correspond.

The files must be bgzipped and indexed with tabix and a FASTA file containing the genomic sequence is required in order to generate transcript models.

Your GFF or GTF file must also be sorted in chromosomal order.

Best wishes

Ben Ensembl Helpdesk

ADD COMMENT
1
Entering edit mode

it worked, thank you! I tried before working with gff , but Apparently I didn't build the file correctly. the correct gff file looks like that:

q       custom  gene    13      222     .       +       .       ID=gene1;Name=GENE1
q       custom  transcript      13      222     .       +       .       ID=transcript1;Name=GENE1-001;Parent=gene1;biotype=protein_coding
q       custom  CDS     13      222     .       +       .       ID=cds1;Name=CDS0001;Parent=transcript1
q       custom  exon    13      222     .       +       .       ID=exon1;Name=GENE1-001_1;Parent=transcript1
q       custom  gene    601     700     .       +       .       ID=gene2;Name=GENE2
q       custom  transcript      601     700     .       +       .       ID=transcript2;Name=GENE2-001;Parent=gene2;biotype=protein_coding
q       custom  CDS     601     700     .       +       .       ID=cds2;Name=CDS0002;Parent=transcript2
q       custom  exon    601     700     .       +       .       ID=exon2;Name=GENE2-001_1;Parent=transcript2
q       custom  gene    1001    1500    .       -       .       ID=gene3;Name=GENE3
q       custom  transcript      1001    1500    .       -       .       ID=transcript3;Name=GENE3-001;Parent=gene3;biotype=protein_coding
q       custom  CDS     1001    1500    .       -       .       ID=cds3;Name=CDS0003;Parent=transcript3
q       custom  exon    1001    1500    .       -       .       ID=exon3;Name=GENE3-001_1;Parent=transcript3
q       custom  gene    7001    7700    .       +       .       ID=gene5;Name=GENE5
q       custom  transcript      7001    7700    .       +       .       ID=transcript5;Name=GENE5-001;Parent=gene5;biotype=protein_coding
q       custom  CDS     7001    7700    .       +       .       ID=cds5;Name=CDS0005;Parent=transcript5
q       custom  exon    7001    7700    .       +       .       ID=exon5;Name=GENE5-001_1;Parent=transcript5
ADD REPLY

Login before adding your answer.

Traffic: 2682 users visited in the last hour
Help About
FAQ
Access RSS
API
Stats

Use of this site constitutes acceptance of our User Agreement and Privacy Policy.

Powered by the version 2.3.6