VEP/CADD error: Assembly is GRCh38 but CADD file does not contain GRCh38 in header
2.3 years ago
K.patel5 ▴ 150

Dear Biostars,

I am having a confusing issue with the CADD plugin. When I run VEP on my whole trio VCF, all the plugins work fine. However, when I run the same plugins on the individual (pivoted) per-sample files, CADD fails with the following error: ERROR: Assembly is GRCh38 but CADD file does not contain GRCh38 in header.

The following code works:

# Load modules and set variables

module load Perl/5.34.0-GCCcore-11.2.0
module load tabix/0.2.6-GCCcore-10.2.0
module load Bio-DB-HTS/3.01-GCC-11.2.0
module load DBD-mysql/4.050-GCC-11.2.0
module load OpenSSL/1.1.1d-GCCcore-8.3.0

dir=/path/to/vep/
dir_cache=/path/to/vep/
fasta="Homo_sapiens_assembly38.fasta"

export PERL5LIB=$PERL5LIB:/mnt/storage/nobackup/proj/rtmngs/Pipelines/Software/vep/ensembl-vep/Plugins
${dir}/vep --cache --dir $dir \
--dir_cache $dir_cache \
--offline \
--fasta $fasta \
--species homo_sapiens \
--input_file trio_cohort.vcf.gz   \
--output_file trio_VEP_annotated.vcf  \
--format vcf \
--force_overwrite  \
--vcf \
--no_check_variants_order \
--check_existing \
--freq_pop gnomAD \
--assembly GRCh38 \
--stats_file trio_vep_stat.html \
--warning_file trio_vep_warning.txt \
--hgvs \
--variant_class \
--keep_csq \
--af_gnomad \
--polyphen p \
--sift p \
--symbol \
--total_length \
--max_af \
--plugin LoFtool \
--plugin REVEL,/pluginpath/new_tabbed_revel_grch38.tsv.gz \
--plugin Mastermind, /pluginpath/mastermind/mastermind_cited_variants_reference-2022.07.22-grch38.vcf.gz,0,0,1 \
--plugin DisGeNET, file=/pluginpath/disgenet/all_variant_disease_pmid_associations_final.tsv.gz,disease=1 \
--plugin LoFtool \
--plugin CADD, /pluginpath/whole_genome_SNVs.tsv.gz,/pluginpath/gnomad.genomes.r3.0.indel.tsv.gz \
--fields "Uploaded_variation,Location,Allele,Gene,Feature,SYMBOL,Existing_variation,VARIANT_CLASS,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,HGVSc,HGVSp,BIOTYPE,IMPACT,CLIN_SIG,PolyPhen,SIFT,MAX_AF,gnomAD_AF,AF,CADD_PHRED,CADD_RAW,LoFtool,REVEL,Mastermind_URL,DisGeNET_PMID,DisGeNET_SCORE,DisGeNET_disease" \
--pick \
--pick_order rank,canonical,tsl \
--buffer_size 20000 \
--fork 4

The following loop fails with the error above when I run VEP on the proband, mother and father VCF files individually:

for sample in *_unique.vcf
do
    ${dir}/vep --cache --dir $dir \
    --dir_cache $dir_cache \
    --offline \
    --no_stats \
    --fasta $fasta \
    --species homo_sapiens \
    --input_file ${sample} \
    --output_file ${sample}_coding.vcf  \
    --format vcf \
    --vcf \
    --no_check_variants_order \
    --hgvs \
    --variant_class \
    --keep_csq \
    --af_gnomad \
    --polyphen p \
    --sift p \
    --symbol \
    --total_length \
    --max_af \
    --check_existing \
    --freq_pop gnomAD \
    --assembly GRCh38 \
    --plugin LoFtool \
    --plugin REVEL,/pluginpath/new_tabbed_revel_grch38.tsv.gz \
    --plugin Mastermind,/pluginpath/mastermind_cited_variants_reference-2022.07.22-grch38.vcf.gz,0,0,1 \
    --plugin LoFtool \
    --plugin DisGeNET, file=//pluginpath//all_variant_disease_pmid_associations_final.tsv.gz,disease=1 \
    --plugin CADD, /pluginpath//whole_genome_SNVs.tsv.gz,/pluginpath//gnomad.genomes.r3.0.indel.tsv.gz \
    --fields "Uploaded_variation,Location,Allele,Gene,Feature,SYMBOL,Existing_variation,VARIANT_CLASS,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,HGVSc,HGVSp,BIOTYPE,IMPACT,CLIN_SIG,PolyPhen,SIFT,MAX_AF,gnomAD_AF,AF,CADD_PHRED,CADD_RAW,LoFtool,REVEL,Mastermind_URL,DisGeNET_PMID,DisGeNET_SCORE,DisGeNET_disease" \
    --pick_order rank,canonical,tsl \
    --buffer_size 20000 \
    --fork 4 
done

Any ideas why CADD works for the first VEP run on the trio, but not on the individual files?
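
Since the error message points at the file header, one quick sanity check is to look at the leading `#` lines of each CADD file directly. Below is a minimal sketch, assuming the files are ordinary bgzip/gzip-compressed TSVs whose header lines start with `#`. `header_has_assembly` is a hypothetical helper name, not part of VEP or the CADD plugin, and the plugin's own check may be implemented differently.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of VEP): succeeds if any of the leading
# '#' header lines of a gzip/bgzip-compressed file mentions the given
# assembly string (case-insensitive).
header_has_assembly() {
    local file="$1" assembly="$2"
    # Only the first 20 lines are inspected; headers are assumed to be short.
    gzip -dc "$file" 2>/dev/null | head -n 20 | grep '^#' | grep -qi "$assembly"
}

# Paths as written in the VEP command above; adjust to the real locations.
for f in /pluginpath/whole_genome_SNVs.tsv.gz /pluginpath/gnomad.genomes.r3.0.indel.tsv.gz
do
    if header_has_assembly "$f" GRCh38; then
        echo "$f: header mentions GRCh38"
    else
        echo "$f: no GRCh38 found in header (or file not readable)"
    fi
done
```

If both files report GRCh38 here, the header itself is probably fine, and it may be worth checking how the plugin arguments reach VEP inside the loop.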

Cheers, Krutik

CADD trio WES VEP • 1.2k views

Hi, did you manage to solve this problem? I am getting this error: WARNING: Failed to instantiate plugin CADD: ERROR: Data file CADD_PHRED not found
