VEP output has no gene names
6.3 years ago
Gene_MMP8

I am trying to annotate a variant file (generated using Strelka) from mouse WGS data. This is the command I used:

./vep -i /path/to/somatic.snvs.vcf \
        --cache /data/shayantan/mus_musculus/ \
        --species mus_musculus

The output variant file has no gene names. Why is this happening? Is something wrong with my cache files?

EDIT (@Ram): Sample input VCF:

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR
chr1    3003110 .   G   T   .   LowEVS  SOMATIC;QSS=17;TQSS=1;NT=ref;QSS_NT=17;TQSS_NT=1;SGT=GG->GT;DP=35;MQ=60.00;MQ0=0;ReadPosRankSum=1.95;SNVSB=3.58;SomaticEVS=0.80 DP:FDP:SDP:SUBDP:AU:CU:GU:TU    18:0:0:0:0,0:0,0:17,17:1,1  14:0:0:0:0,0:0,0:11,13:3,4
chr1    3035137 .   G   A   .   LowEVS  SOMATIC;QSS=17;TQSS=2;NT=ref;QSS_NT=17;TQSS_NT=2;SGT=GG->AG;DP=70;MQ=40.40;MQ0=10;ReadPosRankSum=0.89;SNVSB=3.23;SomaticEVS=0.10    DP:FDP:SDP:SUBDP:AU:CU:GU:TU    27:0:0:0:3,7:0,0:24,28:0,0  27:0:0:0:4,6:0,0:23,29:0,0
chr1    3035168 .   C   T   .   LowEVS  SOMATIC;QSS=8;TQSS=2;NT=ref;QSS_NT=8;TQSS_NT=2;SGT=CC->CT;DP=51;MQ=47.72;MQ0=3;ReadPosRankSum=1.78;SNVSB=2.68;SomaticEVS=0.08   DP:FDP:SDP:SUBDP:AU:CU:GU:TU    18:0:0:0:0,0:16,19:0,0:2,4  23:0:0:0:0,0:20,25:0,0:3,3
chr1    3035504 .   C   A   .   LowEVS  SOMATIC;QSS=15;TQSS=2;NT=ref;QSS_NT=14;TQSS_NT=2;SGT=CC->AC;DP=59;MQ=51.03;MQ0=2;ReadPosRankSum=-1.19;SNVSB=2.71;SomaticEVS=0.09    DP:FDP:SDP:SUBDP:AU:CU:GU:TU    23:0:0:0:3,5:20,22:0,0:0,0  27:0:0:0:4,4:23,28:0,0:0,0
chr1    3043000 .   G   T   .   LowEVS  SOMATIC;QSS=21;TQSS=1;NT=ref;QSS_NT=21;TQSS_NT=1;SGT=GG->GT;DP=53;MQ=46.60;MQ0=7;ReadPosRankSum=1.70;SNVSB=1.37;SomaticEVS=0.20 DP:FDP:SDP:SUBDP:AU:CU:GU:TU    20:0:0:0:0,0:0,0:18,24:2,3  22:0:0:0:0,0:0,0:18,22:4,4

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

You're using vep. A vep tag would help your cause.

Thanks for editing my code. I will surely keep this in mind for future posts.

I think you need to add the option --symbol to the command.

Thanks. But even after including --symbol, I am getting no gene names.
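One way to see what --symbol actually added is to read the Extra column of VEP's default tab-delimited output, where the symbol appears as a SYMBOL=... key. The sketch below assumes that default format (Location in column 2, Consequence in column 7, Extra in column 14) and prints "-" whenever a line carries no SYMBOL key; the function name vep_symbols is hypothetical. A line whose variant overlaps no gene will show "-" even with --symbol switched on.

```shell
#!/usr/bin/env bash
# Print Location, Consequence and gene symbol for each line of VEP's default
# tab-delimited output. Assumes the standard 14-column layout, in which
# Location is column 2, Consequence is column 7 and Extra is column 14.
vep_symbols() {
  awk -F'\t' '
    /^#/ { next }                       # skip ## metadata and the column header
    {
      sym = "-"                         # default when no SYMBOL key is present
      n = split($14, kv, ";")           # Extra holds KEY=VALUE pairs joined by ";"
      for (i = 1; i <= n; i++) {
        if (kv[i] ~ /^SYMBOL=/) {       # does not match SYMBOL_SOURCE=...
          sym = substr(kv[i], 8)        # drop the 7-character "SYMBOL=" prefix
        }
      }
      print $2 "\t" $7 "\t" sym
    }' "$1"
}
```

Usage: vep_symbols variant_effect_output.txt (VEP's default output file name). A column of "-" next to intergenic_variant points at the variants, not at --symbol.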

Instead of the cache, can you run the command with the --database option for a few selected variants? @banerjeeshayantan

Please can you show us a sample of your input file.

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  NORMAL  TUMOR
chr1    3003110 .   G   T   .   LowEVS  SOMATIC;QSS=17;TQSS=1;NT=ref;QSS_NT=17;TQSS_NT=1;SGT=GG->GT;DP=35;MQ=60.00;MQ0=0;ReadPosRankSum=1.95;SNVSB=3.58;SomaticEVS=0.80 DP:FDP:SDP:SUBDP:AU:CU:GU:TU    18:0:0:0:0,0:0,0:17,17:1,1  14:0:0:0:0,0:0,0:11,13:3,4
chr1    3035137 .   G   A   .   LowEVS  SOMATIC;QSS=17;TQSS=2;NT=ref;QSS_NT=17;TQSS_NT=2;SGT=GG->AG;DP=70;MQ=40.40;MQ0=10;ReadPosRankSum=0.89;SNVSB=3.23;SomaticEVS=0.10    DP:FDP:SDP:SUBDP:AU:CU:GU:TU    27:0:0:0:3,7:0,0:24,28:0,0  27:0:0:0:4,6:0,0:23,29:0,0
chr1    3035168 .   C   T   .   LowEVS  SOMATIC;QSS=8;TQSS=2;NT=ref;QSS_NT=8;TQSS_NT=2;SGT=CC->CT;DP=51;MQ=47.72;MQ0=3;ReadPosRankSum=1.78;SNVSB=2.68;SomaticEVS=0.08   DP:FDP:SDP:SUBDP:AU:CU:GU:TU    18:0:0:0:0,0:16,19:0,0:2,4  23:0:0:0:0,0:20,25:0,0:3,3
chr1    3035504 .   C   A   .   LowEVS  SOMATIC;QSS=15;TQSS=2;NT=ref;QSS_NT=14;TQSS_NT=2;SGT=CC->AC;DP=59;MQ=51.03;MQ0=2;ReadPosRankSum=-1.19;SNVSB=2.71;SomaticEVS=0.09    DP:FDP:SDP:SUBDP:AU:CU:GU:TU    23:0:0:0:3,5:20,22:0,0:0,0  27:0:0:0:4,4:23,28:0,0:0,0
chr1    3043000 .   G   T   .   LowEVS  SOMATIC;QSS=21;TQSS=1;NT=ref;QSS_NT=21;TQSS_NT=1;SGT=GG->GT;DP=53;MQ=46.60;MQ0=7;ReadPosRankSum=1.70;SNVSB=1.37;SomaticEVS=0.20 DP:FDP:SDP:SUBDP:AU:CU:GU:TU    20:0:0:0:0,0:0,0:18,24:2,3  22:0:0:0:0,0:0,0:18,22:4,4

Hey! You said you'd keep the editing tip in mind for future posts. Use the code formatting to your advantage, man :-)

This is so embarrassing. I was in a hurry and couldn't format it. I will follow the site's guidelines from the next post on.

6.3 years ago
Emily

Those variants are all intergenic. There is no gene symbol because no genes are hit.
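A quick way to confirm this across the whole file is to tally the Consequence column (column 7 of VEP's default tab output, assumed here). If nearly everything comes back as intergenic_variant, there is simply no gene for a symbol to come from. The function name count_consequences is hypothetical.

```shell
#!/usr/bin/env bash
# Count how often each consequence term appears in VEP's default tab output.
# Column 7 (Consequence) may hold several comma-separated terms on one line,
# so each term is counted separately. Output: "<count>\t<term>", most common first.
count_consequences() {
  awk -F'\t' '
    /^#/ { next }                       # skip ## metadata and the column header
    {
      n = split($7, terms, ",")
      for (i = 1; i <= n; i++) count[terms[i]]++
    }
    END { for (t in count) print count[t] "\t" t }' "$1" | sort -k1,1nr -k2,2
}
```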

EDIT (@genomax): The actual answer is further down in this comment chain.

chr1    3930912 .   G   A   .   LowEVS  SOMATIC;QSS=2;TQSS=1;NT=ref;QSS_NT=2;TQSS_NT=1;SGT=GG->GG;DP=26;MQ=58.54;MQ0=0;ReadPosRankSum=-1.85;SNVSB=1.53;SomaticEVS=1.26;ANN=A|intergenic_region|MODIFIER|Xkr4-Rp1|Xkr4-Rp1|intergenic_region|Xkr4-Rp1|||n.3930912G>A||||||   DP:FDP:SDP:SUBDP:AU:CU:GU:TU    16:0:0:0:0,0:0,0:16,16:0,0  9:3:0:0:2,5:0,0:4,5:0,0

Here, despite being an intergenic variant, it has a gene name. I am confused.
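The ANN field on this line is most likely not from VEP, which by default writes a CSQ INFO field in VCF output; the pipe-delimited ANN layout and the intergenic_region term match SnpEff's format (an inference, since the thread doesn't say which tool added it). In that format the fourth subfield, Gene_Name, holds the two flanking genes joined by a hyphen for an intergenic hit, which would explain Xkr4-Rp1 appearing although no gene overlaps the variant. The hypothetical helper below prints the Annotation and Gene_Name subfields of the first ANN entry per record; it assumes a real tab-separated VCF (the line above lost its tabs when pasted).

```shell
#!/usr/bin/env bash
# Print CHROM:POS, Annotation (subfield 2) and Gene_Name (subfield 4) of the
# first ANN entry in each VCF record. Assumes the pipe-delimited layout
# Allele|Annotation|Annotation_Impact|Gene_Name|Gene_ID|... (SnpEff style).
ann_gene_names() {
  awk -F'\t' '
    /^#/ { next }                               # skip VCF header lines
    {
      n = split($8, info, ";")                  # INFO column, KEY=VALUE pairs
      for (i = 1; i <= n; i++) {
        if (info[i] ~ /^ANN=/) {
          split(substr(info[i], 5), entries, ",")  # drop "ANN=", split entries
          split(entries[1], f, "|")                # subfields of the first entry
          print $1 ":" $2 "\t" f[2] "\t" f[4]
        }
      }
    }' "$1"
}
```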

Are you using an up-to-date version of the VEP? What is your reference genome? That variant is coming up as intronic to a lincRNA for me.

My reference genome is mm9. I am using the older reference because the BAM files were aligned, and the variants called, against it. The VEP version is ensembl-vep 93.2.

Are you using the Ensembl release 67 cache files? Or your own custom cache? What was in the input line for that variant?

I used the cache file from this page, under the mouse genome, in the column called "variation vep". That opened a page from which I downloaded the mus_musculus_vep_93_GRCm38.tar.gz file. Did I do anything wrong?

That cache is GRCm38 (mm10). You're using NCBI37 (mm9). Of course it doesn't work.
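A VEP cache encodes its release and assembly in the directory name, so this mismatch can be caught before annotating anything. The sketch below assumes the layout <dir_cache>/<species>/<release>_<assembly>/ that the release-93 tarball unpacks to (mus_musculus/93_GRCm38); caches from much older releases may be laid out differently, so a miss means "look closer", not proof. check_vep_cache is a hypothetical name.

```shell
#!/usr/bin/env bash
# List the "<release>_<assembly>" directories inside a VEP cache for one
# species and report whether any of them matches the wanted assembly.
# Returns 0 on a match, 1 otherwise.
check_vep_cache() {
  local dir_cache=$1 species=$2 want=$3
  local found=1 d name
  for d in "$dir_cache/$species"/*/; do
    [ -d "$d" ] || continue              # glob matched nothing
    name=$(basename "$d")                # e.g. 93_GRCm38
    echo "cache: $name"
    if [ "${name#*_}" = "$want" ]; then  # strip "<release>_" and compare
      found=0
    fi
  done
  if [ "$found" -ne 0 ]; then
    echo "no $want cache for $species under $dir_cache" >&2
  fi
  return "$found"
}
```

For the poster's setup, check_vep_cache /data/shayantan mus_musculus NCBIM37 would list only 93_GRCm38 and return non-zero. Note also that --cache itself takes no argument: the cache location is given separately with --dir_cache, pointing at the directory that contains mus_musculus/, and --assembly picks between assemblies when more than one is installed.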

Thanks for pointing it out. If possible, can you please direct me to the appropriate link?

This is not the easiest problem. You will need to use the NCBIm37 cache from Ensembl 67, which will probably not work with the current VEP, and will work best with VEP 67.

VEP 67

NCBIm37 cache

Your alternative would be to run your VCF files through the Ensembl Assembly Converter to get them onto GRCm38, but be aware that you may lose some data this way.

Thanks for suggesting the steps. I will try it out. This really helped! Thanks again.

Please accept Emily's answer.

Hi again. The VEP 67 version link is down. Are you aware of any active links?

Not down for me - I can access the link.

Hello Emily! I am working with the cattle genome. How can I add gene names to the VEP output?

Use --symbol with the offline VEP. Online, tick Gene symbol (should be selected by default). With the REST API it should be enabled by default.

18 months ago
LayneSadler

Include the --symbol argument:

Adds the gene symbol (e.g. HGNC) (where available) to the output. Some gene symbols, e.g. HGNC, are only available in the merged cache and therefore require the --merged option when using the cache. Not used by default.

http://useast.ensembl.org/info/docs/tools/vep/script/vep_options.html#opt_symbol
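Whether --merged (or --refseq) can work depends on which cache flavour is actually installed. Assuming the usual naming, where the merged and RefSeq caches sit beside the Ensembl one as <species>_merged and <species>_refseq, this hypothetical helper lists the flavours present and the flag each one needs.

```shell
#!/usr/bin/env bash
# Report which cache flavours are installed for a species under a cache
# directory, and the VEP flags each flavour needs.
vep_cache_flavours() {
  local dir_cache=$1 species=$2 suffix
  for suffix in "" _merged _refseq; do
    [ -d "$dir_cache/$species$suffix" ] || continue
    case $suffix in
      "")      echo "$species$suffix: --cache" ;;
      _merged) echo "$species$suffix: --cache --merged" ;;
      _refseq) echo "$species$suffix: --cache --refseq" ;;
    esac
  done
}
```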
