Hi all, I'm trying to make VEP work on human data.
My command looks like this:
vep --input_file "$INPUT_VCF" \
--output_file "$OUTPUT_VEP" \
--vcf \
--no_intergenic \
--cache \
--merged \
--everything \
--assembly GRCh38 \
--force_overwrite \
--fasta "$REFERENCE_GENOME" \
--nearest symbol \
--symbol \
--individual all \
--flag_pick --pick --pick_allele \
--pick_order rank,canonical,appris,ccds,biotype \
--plugin AlphaMissense,file="$PLUGINS_DIR/AlphaMissense_hg38.tsv.gz" \
--plugin Blosum62 \
--plugin DosageSensitivity,file="$PLUGINS_DIR/Collins_rCNV_2022.dosage_sensitivity_scores.tsv.gz" \
--plugin mutfunc,db="$PLUGINS_DIR/mutfunc_data.db" \
--plugin IntAct,mutation_file="$PLUGINS_DIR/mutations.tsv",mapping_file="$PLUGINS_DIR/mutation_gc_map.txt.gz",feature_ac=1 \
--plugin Enformer,file="$PLUGINS_DIR/enformer_grch38.vcf.gz" \
--plugin Phenotypes \
--plugin PolyPhen_SIFT,create_db=0 \
--verbose
The output of that then goes straight to SnpEff:
java -Xmx8G -jar "$SNPEFF_JAR" "$SNPEFF_DB" "$OUTPUT_VEP" > "$OUTPUT_SNPEFF"
where SNPEFF_DB="GRCh38.99"
I'm using an up-to-date database (merged, where the Ensembl part is supposed to be v113; I went with merged because the Ensembl-only or RefSeq-only one gave me very odd annotation, or none at all). I have checked different parameters and run with and without --no_intergenic, but I can't understand why VEP doesn't give me a symbol for every variant (be it genic, intergenic, intronic, downstream, etc.). On top of that, the SnpEff and VEP annotations sometimes disagree: SnpEff gives a symbol for a variant where VEP has none, or vice versa, or the two contradict each other.
Another problem I don't understand is why some of the data in the columns looks like it belongs to different columns. I do use a Python script to convert the VCF to xlsx, but that doesn't seem to be the cause (the only thing I'm probably doing wrong is parsing the field names for the SnpEff annotations, since I do see a shift by one there).
Overall, my main question is what I am doing wrong in terms of symbols, and which source to trust. If somebody has a working VEP command or something similar, I would appreciate that a lot too.
Go with either, just mention which one you use and which version of it. Personally, I go with VEP because it has historically been more accurate. SnpEff has caught up, but I still trust VEP more.
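On the column shift mentioned in the question: one thing worth checking is whether the converter hard-codes the sub-field names. A minimal sketch, assuming the VCF header carries the usual `##INFO=<ID=CSQ,...Format: A|B|C">` line from VEP and the `##INFO=<ID=ANN,...'A | B | C' ">` line from SnpEff, would read the names from the header and refuse any entry whose field count does not match. The function names here are hypothetical and not part of either tool.

```python
import re

def csq_fields(header_lines):
    """Read VEP CSQ sub-field names from the ##INFO=<ID=CSQ,...> header line."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ,"):
            match = re.search(r'Format: ([^"]+)"', line)
            if match:
                return match.group(1).strip().split("|")
    return []

def ann_fields(header_lines):
    """Read SnpEff ANN sub-field names (quoted, ' | '-separated) from the header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=ANN,"):
            match = re.search(r"'([^']+)'", line)
            if match:
                return [name.strip() for name in match.group(1).split("|")]
    return []

def parse_info(info):
    """Split a VCF INFO column into a dict; flag-style keys map to True."""
    parsed = {}
    for item in info.split(";"):
        if not item:
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed

def annotation_records(info_value, names):
    """Turn one CSQ or ANN value into a list of dicts, one per annotation entry."""
    records = []
    for entry in info_value.split(","):
        values = entry.split("|")
        # A count mismatch is exactly what shows up later as shifted columns.
        if len(values) != len(names):
            raise ValueError(
                f"expected {len(names)} fields, got {len(values)}: {entry!r}"
            )
        records.append(dict(zip(names, values)))
    return records
```

With something like this, `annotation_records(parse_info(info)["ANN"], ann_fields(header))` either yields correctly labelled values or fails loudly on the first entry that would have been shifted, which makes it easier to tell a parsing problem from a genuine disagreement between the two annotators.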