Question

ANNOVAR doesn't output CADD scores

19 months ago

AMARU • 0

Hi,

I followed the Annovar tutorial with the default dataset (avsnp147, ExAC and dbnsfp30a). The tutorial can be found here: https://annovar.openbioinformatics.org/en/latest/user-guide/startup/

The resulting VCF had the expected format and annotations, including CADD scores. I then repeated the run using the gnomad211_exome, avsnp150, and dbnsfp42c datasets instead of those above. This time the resulting VCF contains all the expected annotations except the CADD scores. These datasets were downloaded following the ANNOVAR guidelines.

The header of the vcf doesn't even include the following:

##INFO=<ID=CADD_raw,Number=.,Type=Float,Description="CADD_raw annotation provided by ANNOVAR">
##INFO=<ID=CADD_phred,Number=.,Type=Float,Description="CADD_phred annotation provided by ANNOVAR">

Can someone tell me why this is happening? Is it possible that none of the datasets used in the second run include CADD scores?

Below is the command I used:

perl ./annovar/table_annovar.pl \
  in.vcf \
  humandb/ \
  -buildver hg19 \
  -out myanno.Equal \
  -remove \
  -protocol refGene,cytoBand,gnomad211_exome,avsnp150,dbnsfp42c \
  -operation g,r,f,f,f \
  -nastring . \
  -vcfinput \
  -polish

Thanks in advance.

Annovar CADD • 1.4k views

updated 19 months ago by Ram • written 19 months ago by AMARU

score 2 · Accepted Answer · 2023-08-31

Simple answer: You switched from the academic version of dbNSFP to the commercial version, and the commercial version does not include CADD.

How to get to this answer:

You are missing CADD. CADD comes from dbNSFP. You used dbNSFP30a and it worked. Then you used dbNSFP42c and it did not. From previous experience, I know that the a and c suffixes are significant somehow. This is the only place where experience helps, but even if I did not know this, I'd look for differences between 30a and 42c and probably end up here, a few minutes later than I did: http://database.liulab.science/dbNSFP#version

Two branches of dbNSFP are provided: dbNSFP4.4a suitable for academic use, which includes all the resources, and dbNSFP4.4c suitable for commercial use, which does not include Polyphen2, VEST, REVEL, ClinPred, CADD, LINSIGHT, and GenoCanyon.
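If you want to confirm this against your own download, one option is to look at the column names in the table itself. This is only a sketch: it assumes the ANNOVAR dbNSFP tables sit in humandb/ under names like hg19_dbnsfp42c.txt and that their first line is a tab-separated header. The helper name list_cadd_columns is made up for illustration.

```shell
# List the CADD columns declared in the header line of an ANNOVAR table.
# Prints nothing and returns non-zero if no column name contains "CADD".
# Assumption: the first line of the file is a tab-separated header.
list_cadd_columns() {
  head -n 1 "$1" | tr '\t' '\n' | grep -i 'cadd'
}

# Hypothetical usage, assuming the default humandb/ layout:
#   list_cadd_columns humandb/hg19_dbnsfp30a.txt
#   list_cadd_columns humandb/hg19_dbnsfp42c.txt
```

If the second call prints nothing while the first prints CADD column names, the table is the cause and not the command.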

All this is just to say that you did everything right; there was only one leap you needed to take to reach the solution yourself. Keep up this approach (taking things that work, introducing small changes that might break them, then figuring out how those changes broke them) and you'll learn things super fast.
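For completeness, here is a sketch of the fix, assuming the academic table is available from the ANNOVAR download server under the name dbnsfp42a and that your use qualifies as academic. It is the command from the question with dbnsfp42c swapped for dbnsfp42a, wrapped in a function (annotate_with_dbnsfp42a, a name made up here) so it can be previewed with DRY_RUN=1 before running.

```shell
# Sketch: re-run the annotation with the academic dbNSFP branch.
# $1 = input VCF, $2 = output prefix.
# Set DRY_RUN=1 to print the commands instead of executing them.
annotate_with_dbnsfp42a() {
  # Download the academic table (assumed to be published as "dbnsfp42a").
  ${DRY_RUN:+echo} perl ./annovar/annotate_variation.pl -buildver hg19 \
    -downdb -webfrom annovar dbnsfp42a humandb/

  # Same options as in the question; only the last protocol changes.
  ${DRY_RUN:+echo} perl ./annovar/table_annovar.pl "$1" humandb/ \
    -buildver hg19 -out "$2" -remove \
    -protocol refGene,cytoBand,gnomad211_exome,avsnp150,dbnsfp42a \
    -operation g,r,f,f,f -nastring . -vcfinput -polish
}

# Hypothetical usage (the output prefix is arbitrary):
#   DRY_RUN=1 annotate_with_dbnsfp42a in.vcf myanno.Academic
#   annotate_with_dbnsfp42a in.vcf myanno.Academic
```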