3.4 years ago
saadatagah.m
Hi
I am trying to use VEP to annotate a vcf file and also add ClinVar data to it:
vep --dir_cache $VEP_CACHEDIR \
  -i path/to/input/input.vcf --format vcf \
  -o path/to/output/output.vcf --vcf \
  --cache path/to/cachedir/ \
  --offline --assembly GRCh37 \
  --custom path/to/clinvar/clinvar.vcf.gz,ClinVar,vcf,exact,0,CLNSIG,CLNREVSTAT \
  --everything --force_overwrite --fork 16
The CLNSIG and CLNREVSTAT columns are added to the VCF file, but all of their cells are blank, even though more than 90% of my variants have previously been reported in ClinVar.
Thank you in advance for your help.
Is that your exact command as you used it? Or did you spell the word "assembly" correctly in your real command? If the incorrect spelling was used, VEP may be annotating against GRCh38.
Thanks, @Emily_Ensembel, for the comment. I spelled it correctly in the real command. I will edit my post to correct it as well.
Can you give an example of a line where you expect to get ClinVar annotation, please? And can you specify exactly which file you downloaded from ClinVar?
It is a population-level VCF file that contains about 40,000 variants. I downloaded https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar.vcf.gz and also https://ftp.ncbi.nlm.nih.gov/pub/clinvar/vcf_GRCh37/clinvar.vcf.gz.tbi
And a line in your input and output, please?
I realized that the ClinVar data are actually added to the CSQ field. However, the vcfR2tidy function (I need to load the file into R for downstream analysis) can't retrieve them correctly: it creates the ClinVar and ClinVar_CLNSIG columns but leaves their cells empty, even though the ClinVar annotation is present in the CSQ field.
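For anyone hitting the same issue: with --vcf output, VEP packs all of its annotations for a variant into the single CSQ INFO entry, as pipe-separated values with one comma-separated block per consequence, and lists the sub-field names after "Format:" in the CSQ header line. The sketch below (in Python rather than R, purely to illustrate the idea) reads the sub-field names from the header and splits CSQ by hand. The header and record are made-up toy data, and the function names csq_fields and parse_csq are hypothetical helpers, not part of VEP or vcfR.

```python
import re


def csq_fields(header_lines):
    """Return the CSQ sub-field names declared in the VCF header."""
    for line in header_lines:
        if line.startswith("##INFO=<ID=CSQ"):
            # VEP lists the sub-fields after "Format: " in the Description.
            match = re.search(r'Format: ([^">]+)', line)
            if match:
                return match.group(1).strip().split("|")
    raise ValueError("no CSQ header line found")


def parse_csq(info, fields):
    """Split the CSQ entry of one INFO string into one dict per block."""
    for entry in info.split(";"):
        if entry.startswith("CSQ="):
            blocks = entry[len("CSQ="):].split(",")
            return [dict(zip(fields, block.split("|"))) for block in blocks]
    return []  # record carries no CSQ annotation


# Toy header and INFO column (assumed for illustration, not real output).
toy_header = [
    "##fileformat=VCFv4.2",
    '##INFO=<ID=CSQ,Number=.,Type=String,Description="Consequence '
    'annotations from Ensembl VEP. Format: '
    'Allele|Consequence|SYMBOL|ClinVar|ClinVar_CLNSIG|ClinVar_CLNREVSTAT">',
]
toy_info = "DP=50;CSQ=A|missense_variant|GENE1|12345|Pathogenic|reviewed,A|intron_variant|GENE1|||"

fields = csq_fields(toy_header)
for row in parse_csq(toy_info, fields):
    print(row["Consequence"], row["ClinVar_CLNSIG"] or "(blank)")
```

A blank ClinVar_CLNSIG in one block does not mean the annotation failed; the sub-fields are filled per block, so it is worth checking every block of a record before concluding that ClinVar data are missing.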
I also have two more questions and would appreciate your guidance.
I tried to download all FASTA files (http://m.ensembl.org/info/data/ftp/index.html) to be able to run --everything offline. However, all files are based on the GRCh38 assembly, like this: http://ftp.ensembl.org/pub/release-104/fasta/homo_sapiens/pep/ Could you please tell me how I can download GRCh37 FASTA files?
I need to use some plugins in offline mode, especially REVEL and MaxEntScan. However, I could not find an example script or details on how to add these two to my command.
You should get the option to install the FASTA file when you run INSTALL.pl. Most of the pathogenicity scores are part of the dbNSFP plugin.
If I am not mistaken, REVEL and MaxEntScan are not part of dbNSFP and should be run separately.
You're right, you'll find the separate REVEL and MaxEntScan plugins on the plugins page.