Hi,
I generated my VCF files with the GATK pipeline using ploidy 1, since the genome is Mycobacterium tuberculosis. Now I want to annotate my variants using snpEff and ANNOVAR. I searched the snpEff databases for an MTB annotation using:
java -jar snpEff.jar download -v Mycobacterium_tuberculosis
It gave me numerous results, showing that snpEff contains MTB databases. But I'm not sure which one corresponds to the reference I used to generate the VCF file. My MTB reference genome file looks like this:
>M.tuberculosis_H37Rv NC_000962.3
ttgaccgatgaccccggttcaggcttcaccacagtgtggaacgcggtcgtctccgaacttaacggcgaccctaaggttgacgacggacccagcagtgatgctaatctcagcgctccgctgacccctcagcaaagggcttggctcaatctcgtccagccattgaccatcgtcgaggggtttgctctgttatccgtgccgagcagctttg.............................
I tried the buildDbNcbi.sh script from snpEff to build my own database, but it produced the following error:
Downloading genome NC_000962
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 17.7M 0 17.7M 0 0 157k 0 --:--:-- 0:01:55 --:--:-- 483k
00:00:00 SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani
00:00:00 Command: 'build'
00:00:00 Building database for 'NC_000962'
00:00:00 Reading configuration file 'snpEff.config'. Genome: 'NC_000962'
00:00:00 Reading config file: /home/sark/snpEff/snpEff.config
00:00:01 done
No sequence found in feature file.
Trying fasta file '/home/sark/snpEff/./data/genomes/NC_000962.fa'
Trying fasta file '/home/sark/snpEff/./data/NC_000962/sequences.fa'
java.lang.RuntimeException: Cannot find sequence for 'NC_000962'
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryFeatures.sequence(SnpEffPredictorFactoryFeatures.java:467)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryFeatures.addFeatures(SnpEffPredictorFactoryFeatures.java:111)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryFeatures.create(SnpEffPredictorFactoryFeatures.java:330)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
at org.snpeff.SnpEff.run(SnpEff.java:1183)
at org.snpeff.SnpEff.main(SnpEff.java:162)
java.lang.RuntimeException: Error reading file '/home/sark/snpEff/./data/NC_000962/genes.gbk'
java.lang.RuntimeException: Cannot find sequence for 'NC_000962'
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryFeatures.create(SnpEffPredictorFactoryFeatures.java:344)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
at org.snpeff.SnpEff.run(SnpEff.java:1183)
at org.snpeff.SnpEff.main(SnpEff.java:162)
00:00:01 Logging
00:00:02 Checking for updates...
00:00:04 Done.
Then I placed my FASTA file in the folder mentioned in the error above, but now it gives the following error:
Downloading genome NC_000962.3
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 17.7M 0 17.7M 0 0 332k 0 --:--:-- 0:00:54 --:--:-- 447k
curl: (16) Error in the HTTP2 framing layer
Then I thought of using the built-in database for MTB, so I renamed the chromosome in my file. Its name is M.tuberculosis_H37Rv, and I tried replacing it with the built-in one, ERS007734SCcontig000001. Still no success.
It generates the following error for every variant in the VCF file:
9;ANN=A||MODIFIER|||||||||||||ERROR_OUT_OF_CHROMOSOME_RANGE
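For reference, the chromosome renaming I describe above can be sketched roughly like this in Python. This is only a minimal sketch: the function name rename_contig and the example VCF lines are made up for illustration, and the target name NC_000962.3 is an assumption. The new name has to match whatever chromosome name the chosen snpEff database actually uses, and the database has to be built from the same sequence, otherwise the positions will not line up.

```python
def rename_contig(lines, old, new):
    """Yield VCF lines with the contig `old` renamed to `new`.

    Hypothetical helper: rewrites the matching ##contig header line and
    the CHROM column of data lines, and leaves everything else untouched.
    """
    header_prefixes = ("##contig=<ID=" + old + ",", "##contig=<ID=" + old + ">")
    for line in lines:
        if line.startswith(header_prefixes):
            # Header line: replace only the ID value, keep length etc.
            line = line.replace("ID=" + old, "ID=" + new, 1)
        elif not line.startswith("#"):
            # Data line: CHROM is the first tab-separated column.
            chrom, sep, rest = line.partition("\t")
            if chrom == old:
                line = new + sep + rest
        yield line


# Made-up example lines, not taken from my real VCF.
example = [
    "##fileformat=VCFv4.2\n",
    "##contig=<ID=M.tuberculosis_H37Rv,length=4411532>\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "M.tuberculosis_H37Rv\t1977\t.\tA\tG\t50\tPASS\tDP=9\n",
]
renamed = list(rename_contig(example, "M.tuberculosis_H37Rv", "NC_000962.3"))
print("".join(renamed), end="")
```

For a real file, the same function could be applied while reading the input VCF line by line and writing each returned line to a new file.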
Can anyone help me with this, please? And can anyone tell me how to use ANNOVAR with the same VCF file?
Thank you. :)