18 months ago
Jeremy Leipzig
docker run \
-u 1000:1000 \
-v $HOME/vep_data:/data \
-v $HOME/vep_data:/.vep \
-v $PWD:/opt/foo ensemblorg/ensembl-vep \
vep \
-i /opt/foo/{input} \
-o /opt/foo/{output} \
--cache \
--force_overwrite \
--fork {threads} \
--format vcf \
--buffer_size 5000 \
--terms SO \
--symbol \
--ccds \
--variant_class \
--hgvs \
--hgvsg \
--force \
--dont_skip \
--no_stats \
--pick_allele \
--vcf \
--show_ref_allele
no HGVSg:
chr22 10510044 . G T . . CSQ=T|intergenic_variant|MODIFIER||||||||||||||||G||||SNV|||||| GT 0/1
2 base pairs later, HGVSg:
chr22 10510046 . A T . . CSQ=T|intergenic_variant|MODIFIER||||||||||||||||A||||SNV|||||chr22:g.10510046G>T| GT 0/1
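For a larger output file, the two cases can be separated by testing each record's CSQ for an HGVSg string. This is a minimal sketch, assuming a tab-delimited VCF with the annotation in the INFO column (field 8) and assuming HGVSg values always contain ":g." as in the record above. `classify_hgvsg` is a hypothetical helper name, not part of VEP.

```shell
# Hypothetical helper: label each VCF record by whether its CSQ carries an HGVSg string.
# Assumption: HGVSg values look like "chr22:g.10510046G>T", i.e. they contain ":g.".
classify_hgvsg() {
  awk -F'\t' '!/^#/ {
    status = ($8 ~ /:g\./) ? "HGVSg" : "no_HGVSg"
    # CHROM, POS, REF, ALT, then the label
    print $1 "\t" $2 "\t" $4 "\t" $5 "\t" status
  }' "$@"
}

# Demo on the two records from this post (tabs restored between VCF columns).
{
  printf 'chr22\t10510044\t.\tG\tT\t.\t.\tCSQ=T|intergenic_variant|MODIFIER||||||||||||||||G||||SNV||||||\tGT\t0/1\n'
  printf 'chr22\t10510046\t.\tA\tT\t.\t.\tCSQ=T|intergenic_variant|MODIFIER||||||||||||||||A||||SNV|||||chr22:g.10510046G>T|\tGT\t0/1\n'
} | classify_hgvsg
```

The function also accepts a file name, e.g. `classify_hgvsg annotated.vcf`, where `annotated.vcf` is a placeholder for the VEP output.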
3 quick questions:
1) are you working with a single sample or multisample VCF?
2) what genomic build? (I'll assume GRCh38.p14)
3) do you have strand information?
Pending that, best guess:
T is reference for GRCh38 - GRCh38.p14 for 22:10510044, while G is reference for 22:10510046. Therefore it may simply come down to whether the record gets designated as a variant or not... no matter the strand considered for 22:10510046, it has to be a variant (viz. A/T; T/A on flip strand), whereas 22:10510044 (G/T; C/A on flip strand) could ostensibly be a variant or could be reference.
If you provide 15 of each (vars with and without anno), could confirm/disconfirm this guess with high confidence.
i don't think it has to do with presence/absence in a given database, after cursory cross-checks. if none of this pans out, would probably ask Emily Ensembl
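Since the guess above hinges on which base is reference at each position, it is worth checking the VCF REF column against the genome with explicit 1-based coordinates. A minimal sketch on a toy sequence (not real chr22 data); `base_at` is a hypothetical helper that assumes a FASTA with a single sequence. For a real genome, an indexed lookup such as `samtools faidx` is the more practical choice.

```shell
# Hypothetical helper: print the base at a 1-based position of a single-sequence FASTA.
# Assumption: the file holds exactly one sequence; header lines start with ">".
base_at() {
  fasta=$1
  pos=$2
  # Drop the header, join the sequence lines, then take the pos-th character (cut is 1-based).
  grep -v '^>' "$fasta" | tr -d '\n' | cut -c "$pos"
}

# Toy example: the joined sequence is ACGTTTGA.
toy=$(mktemp)
printf '>toy\nACGT\nTTGA\n' > "$toy"

base_at "$toy" 6   # T
base_at "$toy" 7   # G: one position over gives a different base, so off-by-one matters
rm -f "$toy"
```

Comparing the printed base with the REF column of each record shows quickly whether a coordinate has been shifted by one.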
FYI: Emily no longer works for/at Ensembl. That is the reason she has taken out _ensembl from her profile. Ben_Ensembl may perhaps have some input.
GenoMax thank you for clarifying that - end of an era!
you're right it's an off-by-one-error on my part. thanks
I have a couple thoughts on this - first, though, would like to confirm that you are on GRCh38?
yes this should be a reproducible example using just those two lines and the latest vep and database on GRCh38
copy that.
also granted your background i expected nothing less :-)