Hi all,
I am using the VEP Docker image and have downloaded the Ensembl homo_sapiens cache. Both the cache and the Docker image are VEP 110 (Docker version 110.1, docker link).
To annotate with RefSeq, I downloaded the RefSeq GFF and BAM files from this link and use these parameters:
--gff [path]/GCF_000001405.40_GRCh38.p14_genomic.sorted.gff.gz
--bam [path]/GCF_000001405.40_GRCh38.p14_knownrefseq_alns.bam
--fasta [path]/Homo_sapiens.GRCh38.dna.toplevel.fa.gz
I also supply the synonyms file from the cache folder. From the header of the output VCF file, I can see that the GFF file is being read:
##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence annotations from Ensembl VEP. Format: Allele|Consequence|IMPACT|SYMBOL|Gene|Feature_type|Feature|BIOTYPE|EXON|INTRON|HGVSc|HGVSp|cDNA_position|CDS_position|Protein_position|Amino_acids|Codons|Existing_variation|DISTANCE|STRAND|FLAGS|SYMBOL_SOURCE|HGNC_ID|CANONICAL|CCDS|ENSP|GIVEN_REF|USED_REF|BAM_EDIT|SOURCE|SIFT|PolyPhen|DOMAINS|HGVS_OFFSET|HGVSg|MOTIF_NAME|MOTIF_POS|HIGH_INF_POS|MOTIF_SCORE_CHANGE|TRANSCRIPTION_FACTORS|GCF_000001405.40_GRCh38.p14_genomic.sorted.gff.gz">
But I notice there are no annotations that use RefSeq transcripts, such as NM_*. When I use the Ensembl Docker image version 109.3, it works fine and shows RefSeq annotations. I have also tried Docker version 111.0, but it behaves the same as 110.1: I get no RefSeq annotations. What happened here? Did something change between 109.3 and 110.1 so that the RefSeq GFF is no longer usable?
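For anyone who wants to check for the same symptom, here is a minimal sketch of how the missing RefSeq annotations could be confirmed. It assumes the output VCF is uncompressed; the function name count_refseq_records is only illustrative.

```shell
# Count VCF data lines (header lines excluded) that mention an NM_ accession.
# Assumes an uncompressed VCF; for .vcf.gz, decompress first (e.g. with zcat).
count_refseq_records() {
  grep -v '^#' "$1" | grep -c 'NM_[0-9]' || true
}
```

A result of 0 from count_refseq_records on the output file would match what I see with 110.1 and 111.0.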
Note: This is a bug in VEP and has been confirmed in this issue: link
You need to download the RefSeq cache and use the
--refseq
option when running VEP. See https://www.ensembl.org/info/docs/tools/vep/script/vep_options.html#cache_options
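As a rough sketch of what that could look like, assuming the RefSeq cache has already been downloaded into the cache directory and that vep is on the PATH inside the container. The wrapper name run_vep_refseq and its three arguments are hypothetical; adjust the paths and assembly to your setup.

```shell
# Hypothetical wrapper: run VEP against the RefSeq cache instead of a GFF.
#   $1 = cache directory containing the RefSeq cache (assumed already downloaded)
#   $2 = input VCF
#   $3 = output VCF
run_vep_refseq() {
  vep --cache --offline --refseq \
      --dir_cache "$1" \
      --species homo_sapiens --assembly GRCh38 \
      --input_file "$2" \
      --output_file "$3" \
      --vcf
}
```

It would be called with your own paths in place of the placeholders, e.g. run_vep_refseq [path]/cache input.vcf output.vcf. Note that this replaces the --gff and --bam parameters from the question rather than adding to them.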