Dear Biostars,
I am having a confusing issue with the CADD plugin. When I run VEP on my whole trio, all the plugins work fine. However, when I try to run CADD on the individual (pivoted) per-sample files, it no longer works and I get the following error:
ERROR: Assembly is GRCh38 but CADD file does not contain GRCh38 in header.
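Going by the wording of the error, the plugin looks for the assembly name in the header of the CADD file. A minimal sketch for checking that by hand is below. It assumes the header consists of the leading lines starting with `#` and that it sits within the first 10 lines; `cadd_header_has_assembly` is a hypothetical helper name, not part of VEP or CADD.

```shell
# Hypothetical helper: succeeds if the leading '#' header lines of a
# gzip-compressed CADD file mention the given assembly name.
# Assumption: the header is within the first 10 lines of the file.
cadd_header_has_assembly() {
    file="$1"
    assembly="$2"
    gzip -dc "$file" 2>/dev/null | head -n 10 | grep '^#' | grep -q -- "$assembly"
}

# Example usage (the paths are the placeholders used in the commands in this post):
# for f in /pluginpath/whole_genome_SNVs.tsv.gz /pluginpath/gnomad.genomes.r3.0.indel.tsv.gz; do
#     if cadd_header_has_assembly "$f" GRCh38; then
#         echo "$f: header mentions GRCh38"
#     else
#         echo "$f: no GRCh38 in header"
#     fi
# done
```

If both files pass this check, the file contents are probably not the problem and the way the file paths reach the plugin is the next thing to look at.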
The following code works:
# Load modules and set variables
module load Perl/5.34.0-GCCcore-11.2.0
module load tabix/0.2.6-GCCcore-10.2.0
module load Bio-DB-HTS/3.01-GCC-11.2.0
module load DBD-mysql/4.050-GCC-11.2.0
module load OpenSSL/1.1.1d-GCCcore-8.3.0
dir=/path/to/vep/
dir_cache=/path/to/vep/
fasta="Homo_sapiens_assembly38.fasta"
export PERL5LIB=$PERL5LIB:/mnt/storage/nobackup/proj/rtmngs/Pipelines/Software/vep/ensembl-vep/Plugins
${dir}/vep --cache --dir $dir \
--dir_cache $dir_cache \
--offline \
--fasta $fasta \
--species homo_sapiens \
--input_file trio_cohort.vcf.gz \
--output_file trio_VEP_annotated.vcf \
--format vcf \
--force_overwrite \
--vcf \
--no_check_variants_order \
--check_existing \
--freq_pop gnomAD \
--assembly GRCh38 \
--stats_file trio_vep_stat.html \
--warning_file trio_vep_warning.txt \
--hgvs \
--variant_class \
--keep_csq \
--af_gnomad \
--polyphen p \
--sift p \
--symbol \
--total_length \
--max_af \
--plugin LoFtool \
--plugin REVEL,/pluginpath/new_tabbed_revel_grch38.tsv.gz \
--plugin Mastermind,/pluginpath/mastermind/mastermind_cited_variants_reference-2022.07.22-grch38.vcf.gz,0,0,1 \
--plugin DisGeNET,file=/pluginpath/disgenet/all_variant_disease_pmid_associations_final.tsv.gz,disease=1 \
--plugin LoFtool \
--plugin CADD,/pluginpath/whole_genome_SNVs.tsv.gz,/pluginpath/gnomad.genomes.r3.0.indel.tsv.gz \
--fields "Uploaded_variation,Location,Allele,Gene,Feature,SYMBOL,Existing_variation,VARIANT_CLASS,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,HGVSc,HGVSp,BIOTYPE,IMPACT,CLIN_SIG,PolyPhen,SIFT,MAX_AF,gnomAD_AF,AF,CADD_PHRED,CADD_RAW,LoFtool,REVEL,Mastermind_URL,DisGeNET_PMID,DisGeNET_SCORE,DisGeNET_disease" \
--pick \
--pick_order rank,canonical,tsl \
--buffer_size 20000 \
--fork 4
The following code does not work when I try to use VEP on the proband, mother and father .vcf files individually:
for sample in *_unique.vcf
do
${dir}/vep --cache --dir $dir \
--dir_cache $dir_cache \
--offline \
--no_stats \
--fasta $fasta \
--species homo_sapiens \
--input_file ${sample} \
--output_file ${sample}_coding.vcf \
--format vcf \
--vcf \
--no_check_variants_order \
--hgvs \
--variant_class \
--keep_csq \
--af_gnomad \
--polyphen p \
--sift p \
--symbol \
--total_length \
--max_af \
--check_existing \
--freq_pop gnomAD \
--assembly GRCh38 \
--plugin LoFtool \
--plugin REVEL,/pluginpath/new_tabbed_revel_grch38.tsv.gz \
--plugin Mastermind,/pluginpath/mastermind_cited_variants_reference-2022.07.22-grch38.vcf.gz,0,0,1 \
--plugin LoFtool \
--plugin DisGeNET,file=//pluginpath//all_variant_disease_pmid_associations_final.tsv.gz,disease=1 \
--plugin CADD,/pluginpath//whole_genome_SNVs.tsv.gz,/pluginpath//gnomad.genomes.r3.0.indel.tsv.gz \
--fields "Uploaded_variation,Location,Allele,Gene,Feature,SYMBOL,Existing_variation,VARIANT_CLASS,Consequence,cDNA_position,CDS_position,Protein_position,Amino_acids,HGVSc,HGVSp,BIOTYPE,IMPACT,CLIN_SIG,PolyPhen,SIFT,MAX_AF,gnomAD_AF,AF,CADD_PHRED,CADD_RAW,LoFtool,REVEL,Mastermind_URL,DisGeNET_PMID,DisGeNET_SCORE,DisGeNET_disease" \
--pick_order rank,canonical,tsl \
--buffer_size 20000 \
--fork 4
done
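One general thing worth ruling out in commands like these is whitespace inside a `--plugin` value, because the shell splits unquoted arguments on spaces before VEP ever sees them. The sketch below only illustrates that shell behaviour; `count_args` is a hypothetical helper, and whether this is related to the CADD error is an assumption I have not confirmed.

```shell
# Hypothetical helper for illustration: prints how many arguments it received.
count_args() {
    echo "$#"
}

# With a space after the comma, the shell passes "CADD," and the path
# as two separate arguments.
count_args --plugin CADD, /pluginpath/whole_genome_SNVs.tsv.gz   # prints 3

# Without the space, the plugin name and its file stay in one argument.
count_args --plugin CADD,/pluginpath/whole_genome_SNVs.tsv.gz    # prints 2
```

Keeping each plugin name and its comma-separated parameters in a single whitespace-free argument avoids this class of problem.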
Any ideas on why CADD works fine for the first VEP run on the trio, but not on the individual files?
Cheers, Krutik
Hi, did you manage to solve this problem? I am getting this error:
WARNING: Failed to instantiate plugin CADD: ERROR: Data file CADD_PHRED not found
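I cannot tell from the message alone what causes it, but as a first sanity check it may help to confirm that each file passed to the plugin is readable and has a tabix index next to it. A minimal sketch, assuming the index is named `<file>.tbi` or `<file>.csi`; `check_tabix_file` is a hypothetical helper, not a VEP command.

```shell
# Hypothetical helper: reports whether a bgzipped data file is readable
# and has a tabix index (.tbi or .csi) alongside it.
check_tabix_file() {
    file="$1"
    if [ ! -r "$file" ]; then
        echo "missing or unreadable: $file"
        return 1
    fi
    if [ ! -r "$file.tbi" ] && [ ! -r "$file.csi" ]; then
        echo "no tabix index found for: $file"
        return 2
    fi
    echo "ok: $file"
}

# Example usage (placeholder paths from the commands earlier in this thread):
# check_tabix_file /pluginpath/whole_genome_SNVs.tsv.gz
# check_tabix_file /pluginpath/gnomad.genomes.r3.0.indel.tsv.gz
```

If every file reports ok, the next thing to check is the exact string given to `--plugin CADD,...` on the command line.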