I want to annotate my VCF files with CHASMplus scores using ANNOVAR, so I need to reformat the CHASMplus data into the standard ANNOVAR generic DB format (Chr, Start, End, Ref, Alt, plus other information), but I don't know how to download this database. Is that possible at all?
It is quite easy to annotate variants with CHASMplus scores using OpenCRAVAT (https://opencravat.org/), either through the web server or the command-line tool. I assume, though, that you have an existing pipeline built on ANNOVAR and would like to annotate with it for consistency.
You can always access the data underlying an annotator in OpenCRAVAT, including CHASMplus. Basically, you need to install OpenCRAVAT, download the CHASMplus annotator, and then dump the tables of the CHASMplus SQLite database file to CSV files. You could then reformat the data into whatever layout is needed. The commands would look like the following:
# install OpenCRAVAT with the CHASMplus annotator
pip install open-cravat
oc module install-base
oc module install -y chasmplus
# change directory to the installed chasmplus data
cd "$(oc config md)/annotators/chasmplus/data/"
# dump every sqlite table to its own csv file
# (assumes one table per chromosome, chr1-chr22, chrX, chrY, plus a transcript table)
for t in chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY transcript; do
    sqlite3 -header -csv chasmplus.sqlite "SELECT * FROM $t;" > "$t.csv"
done
Dear Collin,
As you kindly explained in your reply, I created the .csv files, but they don't contain the reference allele, only the alternative allele:
pos,alt,score,tid
88385824,G,0.004,34778
88385830,G,0.004,34778
88391421,G,0.01,34778
88391427,G,0.006,34778
88391451,G,0.009,34778
Would you mind helping me add the reference allele to the files?
One of the nearby directories contains the reference genome sequence (hg38, in 2bit format). You should be able to extract the reference allele at each position from that.
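As a rough sketch of that step (not something shipped with OpenCRAVAT): the dumped CSV rows can first be turned into BED intervals, which a 2bit reader can then use to pull the reference base at each position. The function name `csv_to_bed` is made up for illustration, and the code assumes that the `pos` column is 1-based.

```shell
# Convert one dumped table (header: pos,alt,score,tid) into BED intervals.
# Assumption: pos is 1-based, so BED start = pos - 1 and BED end = pos.
# csv_to_bed is a hypothetical helper, not part of OpenCRAVAT or ANNOVAR.
csv_to_bed() {
    chrom="$1"   # chromosome name matching the table, e.g. chr1
    csv="$2"     # dumped table, e.g. chr1.csv
    # Skip the header line; emit chrom, start, end, and the alt allele as the name.
    awk -F, -v OFS='\t' -v c="$chrom" 'NR > 1 { print c, $1 - 1, $1, $2 }' "$csv"
}

# Example call (file names are placeholders):
#   csv_to_bed chr1 chr1.csv > chr1.bed
```

The resulting BED file could then be handed to a tool such as UCSC's twoBitToFa, which has a `-bed` option for fetching the listed intervals from a 2bit file; check its usage message for the exact output naming before relying on it.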
Thank you, Collin, but I need the hg19 version and couldn't find it. I would greatly appreciate it if you could help me again.
Unfortunately, OpenCRAVAT natively uses only hg38, so the CHASMplus annotations are in hg38 coordinates. OpenCRAVAT normally lifts variants entered in hg19 coordinates over to hg38 "on the fly", so none of the annotator data is stored in hg19.
The easiest route might therefore be to run ANNOVAR, then run OpenCRAVAT separately (explicitly designating the genome version as hg19), and then write a script to merge the two output text files.
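A minimal sketch of such a merge script, under some loud assumptions: the ANNOVAR output is a tab-delimited table whose first five columns are Chr, Start, End, Ref, Alt, and the OpenCRAVAT result has been reduced beforehand to a hypothetical five-column tab-delimited file (chrom, pos, ref, alt, score) that uses the same hg19 coordinates and chromosome naming. Neither the function name `merge_scores` nor that reduced layout comes from either tool.

```shell
# Append a CHASMplus score column to an ANNOVAR table by matching variants.
# Assumptions (hypothetical layouts, adjust to your real files):
#   $1: ANNOVAR table, tab-delimited, header line, columns Chr Start End Ref Alt ...
#   $2: reduced OpenCRAVAT table, tab-delimited, header line, columns chrom pos ref alt score
merge_scores() {
    annovar_txt="$1"
    oc_tsv="$2"
    awk -F'\t' -v OFS='\t' '
        # First file (OpenCRAVAT): remember the score for each chrom/pos/ref/alt key.
        NR == FNR { if (FNR > 1) score[$1 FS $2 FS $3 FS $4] = $5; next }
        # Second file (ANNOVAR): extend the header, then each variant line.
        FNR == 1  { print $0, "CHASMplus_score"; next }
        {
            key = $1 FS $2 FS $4 FS $5          # Chr, Start, Ref, Alt
            print $0, ((key in score) ? score[key] : ".")
        }
    ' "$oc_tsv" "$annovar_txt"
}

# Example call (file names are placeholders):
#   merge_scores sample.hg19_multianno.txt opencravat_scores.tsv > merged.txt
```

Matching on Start works for single-nucleotide variants, where Start equals the variant position; variants without a match get "." in the new column.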