VEP won't show symbols for all variants; SnpEff will, but won't for others
2 days ago

Hi all, I'm trying to get VEP working on human data.

My command looks like this:

vep --input_file "$INPUT_VCF" \
    --output_file "$OUTPUT_VEP" \
    --vcf \
    --no_intergenic \
    --cache \
    --merged \
    --everything \
    --assembly GRCh38 \
    --force_overwrite \
    --fasta "$REFERENCE_GENOME" \
    --nearest symbol \
    --symbol \
    --individual all \
    --flag_pick --pick --pick_allele \
    --pick_order rank,canonical,appris,ccds,biotype \
    --plugin AlphaMissense,file="$PLUGINS_DIR/AlphaMissense_hg38.tsv.gz" \
    --plugin Blosum62 \
    --plugin DosageSensitivity,file="$PLUGINS_DIR/Collins_rCNV_2022.dosage_sensitivity_scores.tsv.gz" \
    --plugin mutfunc,db="$PLUGINS_DIR/mutfunc_data.db" \
    --plugin IntAct,mutation_file="$PLUGINS_DIR/mutations.tsv",mapping_file="$PLUGINS_DIR/mutation_gc_map.txt.gz",feature_ac=1 \
    --plugin Enformer,file="$PLUGINS_DIR/enformer_grch38.vcf.gz" \
    --plugin Phenotypes \
    --plugin PolyPhen_SIFT,create_db=0 \
    --verbose

And then the output of it goes straight to SnpEff:

java -Xmx8G -jar "$SNPEFF_JAR" "$SNPEFF_DB" "$OUTPUT_VEP" > "$OUTPUT_SNPEFF"

where SNPEFF_DB="GRCh38.99"

I'm using an up-to-date cache (the merged one, where Ensembl is supposed to be v113, because for some reason the Ensembl-only and RefSeq-only caches gave me very odd or even empty annotations). I have tried different parameters and run with and without --no_intergenic, but I can't understand why VEP doesn't give me a symbol for every variant (be it genic, intergenic, intronic, downstream, etc.). On top of that, the SnpEff annotations mismatch one another: sometimes they give a symbol for a variant where VEP has none, or vice versa, or a contradictory one.

Another problem I don't understand is why some of the data in the columns looks like it belongs to different columns. I am using a Python script to convert the VCF to XLSX, but that doesn't seem to be the cause (the only thing I'm probably doing wrong is parsing the field names for the SnpEff annotations, since I do see a shift by one there).
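One way to rule out the converter as the source of the shift is to take the field names from the VCF header instead of hard-coding them. The sketch below is only an illustration, not the script used here: the function names are made up, and it assumes the header carries the usual CSQ line (VEP, with a "Format:" list) and ANN line (SnpEff, with a quoted, pipe-separated list).

```python
import re


def annotation_fields(header_lines, info_id):
    """Return the sub-field names declared in the header for CSQ or ANN.

    Hypothetical helper: relies on the pipe-separated name list that VEP
    and SnpEff write into the Description of their INFO header line.
    """
    for line in header_lines:
        if line.startswith(f"##INFO=<ID={info_id},"):
            desc = re.search(r'Description="(.*)"', line).group(1)
            if "Format:" in desc:
                # VEP style: "... Format: Allele|Consequence|..."
                spec = desc.split("Format:")[-1]
            else:
                # SnpEff style: "Functional annotations: 'Allele | Annotation | ...' "
                spec = desc.split(":", 1)[-1]
            return [name.strip(" '") for name in spec.split("|")]
    raise KeyError(f"no header line for INFO tag {info_id}")


def parse_annotations(info, info_id, fields):
    """Split one INFO string into a list of {field name: value} dicts."""
    for item in info.split(";"):
        key, _, value = item.partition("=")
        if key != info_id:
            continue
        records = []
        for entry in value.split(","):  # one entry per annotated transcript
            values = entry.split("|")
            if len(values) != len(fields):
                # A length mismatch is exactly what shows up as shifted columns.
                raise ValueError(
                    f"{info_id}: expected {len(fields)} fields, got {len(values)}"
                )
            records.append(dict(zip(fields, values)))
        return records
    return []
```

If the length check raises for ANN but not for CSQ, the shift comes from the field names used for SnpEff rather than from the annotation itself.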

Overall, my main question is what I am doing wrong in terms of symbols, and which source to trust. If somebody has a working VEP command, I would appreciate that a lot too.

The weird data from different columns

Symbols from 2 different SnpEff annotations, mismatching one another and the VEP's ones


Go with either, just mention which one you use and what version of it you're using. Personally, I go with VEP because it has historically been more accurate. SnpEff has caught up, but I trust VEP more.

1 day ago

This probably has more to do with the gene models being used and the precedence rules each annotator applies. It looks like one of them is using a larger set of transcripts, including some dubious ones.

As to why some genes don't have HUGO/HGNC symbols: a lot of lncRNAs lack those names.
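A quick way to check this on your own output is to tally the annotations that have no symbol by biotype. This is a minimal sketch, assuming each VEP annotation has already been parsed into a dict with SYMBOL and BIOTYPE keys (both are written by --everything); the function name is made up for illustration.

```python
from collections import Counter


def missing_symbol_by_biotype(csq_records):
    """Count annotations with an empty SYMBOL, grouped by BIOTYPE.

    Hypothetical helper: csq_records is an iterable of dicts, one per
    parsed CSQ entry.
    """
    counts = Counter()
    for rec in csq_records:
        if not rec.get("SYMBOL"):
            counts[rec.get("BIOTYPE") or "unknown"] += 1
    return counts
```

If most of the empty symbols fall under lncRNA or similar non-coding biotypes, the gaps come from the gene models rather than from the command.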

