Annotating a Mycobacterium tuberculosis VCF file using snpEff and ANNOVAR

6.9 years ago

S AR

Hi,

I generated my VCF files with the GATK pipeline using ploidy 1, since this is a Mycobacterium tuberculosis genome. Now I want to annotate my variants using snpEff and ANNOVAR. I searched the snpEff databases for an Mtb annotation using:

java -jar snpEff.jar download -v Mycobacterium_tuberculosis

It returned numerous results showing that snpEff has Mtb databases, but I'm not sure which one corresponds to the reference I used to generate the VCF file. My Mtb reference genome file looks like this:
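
snpEff's `download` command fetches one named database, while its `databases` command lists every database it can download. A minimal Python sketch for narrowing that listing to Mtb entries, assuming `snpEff.jar` is in the working directory and `java` is on PATH (the filter is a plain case-insensitive substring match, nothing snpEff-specific):

```python
"""List snpEff databases whose line mentions a keyword (default: 'tuberculosis').

Assumes snpEff.jar is in the current directory and `java` is on PATH.
"""
import subprocess
import sys


def matching_databases(listing: str, keyword: str) -> list[str]:
    """Return the lines of a `snpEff databases` listing containing keyword (case-insensitive)."""
    key = keyword.lower()
    return [line for line in listing.splitlines() if key in line.lower()]


if __name__ == "__main__":
    keyword = sys.argv[1] if len(sys.argv) > 1 else "tuberculosis"
    # `databases` prints the genomes snpEff can download, one per line.
    result = subprocess.run(
        ["java", "-jar", "snpEff.jar", "databases"],
        capture_output=True,
        text=True,
        check=True,
    )
    for line in matching_databases(result.stdout, keyword):
        print(line)
```

Usage would be something like `python list_mtb_dbs.py tuberculosis` (hypothetical script name). Whichever entry you pick, its chromosome names and lengths have to match the H37Rv reference the VCF was called against.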

>M.tuberculosis_H37Rv NC_000962.3
ttgaccgatgaccccggttcaggcttcaccacagtgtggaacgcggtcgtctccgaacttaacggcgaccctaaggttgacgacggacccagcagtgatgctaatctcagcgctccgctgacccctcagcaaagggcttggctcaatctcgtccagccattgaccatcgtcgaggggtttgctctgttatccgtgccgagcagctttg.............................
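
GATK names each contig after the FASTA header text up to the first whitespace, so the VCF CHROM here should be `M.tuberculosis_H37Rv`, not `NC_000962.3`. A small Python sketch to list each sequence ID and its length from the reference, which is handy when comparing against the chromosome names and lengths a snpEff database uses:

```python
"""Report each sequence ID and its length in a FASTA file.

The ID is the header text after '>' up to the first whitespace, which is the
contig name GATK writes into the VCF CHROM column.
"""
import sys


def fasta_lengths(lines) -> dict[str, int]:
    """Map sequence ID -> sequence length for an iterable of FASTA lines."""
    lengths: dict[str, int] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Keep only the first whitespace-delimited word of the header.
            current = line[1:].split()[0]
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line)
    return lengths


if __name__ == "__main__":
    with open(sys.argv[1]) as handle:
        for name, length in fasta_lengths(handle).items():
            print(f"{name}\t{length}")
```

For example `python fasta_lengths.py reference.fa` (file and script names are placeholders).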

I tried the buildDbNcbi.sh script from snpEff to build my own database, but it produced the following error:

Downloading genome NC_000962
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.7M    0 17.7M    0     0   157k      0 --:--:--  0:01:55 --:--:--  483k
00:00:00        SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani
00:00:00        Command: 'build'
00:00:00        Building database for 'NC_000962'
00:00:00        Reading configuration file 'snpEff.config'. Genome: 'NC_000962'
00:00:00        Reading config file: /home/sark/snpEff/snpEff.config
00:00:01        done
No sequence found in feature file.
        Trying fasta file '/home/sark/snpEff/./data/genomes/NC_000962.fa'
        Trying fasta file '/home/sark/snpEff/./data/NC_000962/sequences.fa'
java.lang.RuntimeException: Cannot find sequence for 'NC_000962'
        at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryFeatures.sequence(SnpEffPredictorFactoryFeatures.java:467)
        at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryFeatures.addFeatures(SnpEffPredictorFactoryFeatures.java:111)
        at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryFeatures.create(SnpEffPredictorFactoryFeatures.java:330)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
        at org.snpeff.SnpEff.run(SnpEff.java:1183)
        at org.snpeff.SnpEff.main(SnpEff.java:162)
java.lang.RuntimeException: Error reading file '/home/sark/snpEff/./data/NC_000962/genes.gbk'
java.lang.RuntimeException: Cannot find sequence for 'NC_000962'
        at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryFeatures.create(SnpEffPredictorFactoryFeatures.java:344)
        at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
        at org.snpeff.SnpEff.run(SnpEff.java:1183)
        at org.snpeff.SnpEff.main(SnpEff.java:162)
00:00:01        Logging
00:00:02        Checking for updates...
00:00:04        Done.
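
The first error says no sequence was found in the downloaded feature file (`genes.gbk`), after which snpEff looked for FASTA files instead. One quick check, sketched in Python under the assumption that the file is a standard GenBank flat file, is whether it has a non-empty ORIGIN block (the section that holds the nucleotide sequence):

```python
"""Check whether a GenBank flat file carries its nucleotide sequence."""
import sys


def genbank_sequence_length(text: str) -> int:
    """Count sequence letters in the ORIGIN section(s) of a GenBank file.

    Returns 0 when no ORIGIN block is present (e.g. a features-only download).
    """
    total = 0
    in_origin = False
    for line in text.splitlines():
        if line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            in_origin = False  # end of the current record
        elif in_origin:
            # Sequence lines look like "        1 ttgaccgatg accccggttc ...":
            # count letters only, skipping the position number and spaces.
            total += sum(ch.isalpha() for ch in line)
    return total


if __name__ == "__main__":
    path = sys.argv[1]  # e.g. data/NC_000962/genes.gbk
    with open(path) as handle:
        n = genbank_sequence_length(handle.read())
    if n == 0:
        sys.exit(f"{path}: no ORIGIN sequence found")
    print(f"{path}: {n} bases in ORIGIN")
```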

Then I placed my FASTA file in the folder mentioned in the error above, but now it gives the following error:

Downloading genome NC_000962.3
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 17.7M    0 17.7M    0     0   332k      0 --:--:--  0:00:54 --:--:--  447k
curl: (16) Error in the HTTP2 framing layer
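
`curl: (16) Error in the HTTP2 framing layer` is a curl transport error during the download, not a snpEff error. One possible workaround is to fetch the record with Python's `urllib`, which only speaks HTTP/1.1, from NCBI E-utilities `efetch` with `rettype=gbwithparts` (GenBank with features and sequence). The target path `data/NC_000962.3/genes.gbk` is an assumption based on the paths in the log; the folder name must match the genome entry in `snpEff.config`.

```python
"""Download a GenBank record (features + sequence) from NCBI over HTTP/1.1.

Python's urllib only speaks HTTP/1.1, so curl's HTTP/2 framing error cannot occur.
"""
import sys
import urllib.parse
import urllib.request
from pathlib import Path

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession: str) -> str:
    """Build an efetch URL; rettype=gbwithparts returns features plus sequence."""
    query = urllib.parse.urlencode(
        {"db": "nuccore", "id": accession, "rettype": "gbwithparts", "retmode": "text"}
    )
    return f"{EFETCH}?{query}"


def download_genbank(accession: str, dest: Path) -> Path:
    """Stream the record for `accession` into `dest`, creating parent folders."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(efetch_url(accession), timeout=300) as resp, open(dest, "wb") as out:
        while chunk := resp.read(1 << 16):
            out.write(chunk)
    return dest


if __name__ == "__main__":
    acc = sys.argv[1] if len(sys.argv) > 1 else "NC_000962.3"
    # Assumed layout: `snpEff build -genbank` reads data/<genome>/genes.gbk.
    target = Path("data") / acc / "genes.gbk"
    print(download_genbank(acc, target))
```

With the file in place, the database would then be built with something like `java -jar snpEff.jar build -genbank -v NC_000962.3`. The chromosome name in that database comes from the GenBank record, so it will probably not be `M.tuberculosis_H37Rv`; check it before annotating.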

Then I thought of using the built-in Mtb database instead, so I renamed the chromosome in my file: it was M.tuberculosis_H37Rv, and I replaced it with the name used in the built-in database, ERS007734SCcontig000001. Still no success.
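
Renaming only helps if the new name refers to the same sequence with the same coordinates. Judging by its name, ERS007734SCcontig000001 is a contig from another assembly rather than the H37Rv chromosome, so H37Rv positions need not fall inside it. If you do end up with a database built on H37Rv under a different chromosome name, here is a minimal Python sketch for renaming the CHROM column and the `##contig` header lines (the mapping and the script name are hypothetical):

```python
"""Rename chromosome names in a VCF (CHROM column and ##contig header lines).

Only rename to a name that denotes the same sequence (same H37Rv coordinates)
in the annotation database. Renaming to an unrelated contig does not change the
coordinates, so positions can end up past that contig's end.
"""
import re
import sys

CONTIG_ID = re.compile(r"(##contig=<ID=)([^,>]+)")


def rename_vcf_line(line: str, mapping: dict[str, str]) -> str:
    """Return `line` with its chromosome name replaced according to `mapping`."""
    if line.startswith("##contig="):
        return CONTIG_ID.sub(lambda m: m.group(1) + mapping.get(m.group(2), m.group(2)), line)
    if line.startswith("#"):
        return line  # other meta lines and the #CHROM column line stay unchanged
    chrom, sep, rest = line.partition("\t")
    return mapping.get(chrom, chrom) + sep + rest


if __name__ == "__main__":
    # Usage: python rename_chrom.py in.vcf OLD_NAME NEW_NAME > out.vcf
    path, old, new = sys.argv[1:4]
    with open(path) as handle:
        for line in handle:
            sys.stdout.write(rename_vcf_line(line, {old: new}))
```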

snpEff then reports the following error for every variant in the VCF file:

9;ANN=A||MODIFIER|||||||||||||ERROR_OUT_OF_CHROMOSOME_RANGE
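
snpEff uses ERROR_OUT_OF_CHROMOSOME_RANGE for a variant whose position lies beyond the length of that chromosome in the database, which usually means the VCF was not called against the database's reference. A Python sketch for tallying the ERRORS/WARNINGS/INFO sub-field (the 16th pipe-separated field of each ANN entry) and for listing records whose POS exceeds a chromosome length you supply, for example from the second column of the reference's `.fai` index:

```python
"""Inspect snpEff output: ANN error codes and positions beyond chromosome length."""
from collections import Counter

ANN_MESSAGES = 15  # 0-based index of ERRORS / WARNINGS / INFO in an ANN entry


def ann_messages(info: str) -> list[str]:
    """Return the non-empty ERRORS/WARNINGS/INFO field of every ANN entry in an INFO string."""
    messages: list[str] = []
    for field in info.split(";"):
        if not field.startswith("ANN="):
            continue
        for entry in field[len("ANN="):].split(","):  # one entry per allele/feature
            parts = entry.split("|")
            if len(parts) > ANN_MESSAGES and parts[ANN_MESSAGES]:
                messages.append(parts[ANN_MESSAGES])
    return messages


def out_of_range(vcf_lines, lengths: dict[str, int]) -> list[tuple[str, int]]:
    """List (CHROM, POS) for records whose POS exceeds the known chromosome length."""
    hits = []
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if chrom in lengths and int(pos) > lengths[chrom]:
            hits.append((chrom, int(pos)))
    return hits


def tally_messages(vcf_lines) -> Counter:
    """Count ANN error/warning fields over all records of an annotated VCF."""
    counts: Counter = Counter()
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        info = line.rstrip("\n").split("\t")[7]  # INFO is the 8th VCF column
        counts.update(ann_messages(info))
    return counts
```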

Can anyone help me with this? And can anyone tell me how to use ANNOVAR on the same VCF file?
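
On ANNOVAR: the usual steps are `convert2annovar.pl -format vcf4` to turn the VCF into ANNOVAR's input format, then `table_annovar.pl` or `annotate_variation.pl` against a gene database. As far as I know ANNOVAR does not ship an Mtb database, so one would have to be built from the H37Rv annotation. The Python sketch below only illustrates the 5-column input format (Chr, Start, End, Ref, Alt) for SNVs and simple indels; it is not a replacement for convert2annovar.pl, which also handles complex and multi-allelic cases.

```python
"""Convert simple VCF records to ANNOVAR's 5-column input (Chr Start End Ref Alt).

Illustrative only: convert2annovar.pl is the real tool.
"""


def vcf_to_avinput(chrom: str, pos: int, ref: str, alt: str) -> tuple[str, int, int, str, str]:
    """Map one VCF allele to ANNOVAR coordinates (1-based, inclusive)."""
    if len(ref) == len(alt) == 1:
        return chrom, pos, pos, ref, alt  # SNV
    if len(ref) > len(alt) and ref.startswith(alt):
        deleted = ref[len(alt):]  # VCF deletions carry shared leading base(s)
        start = pos + len(alt)
        return chrom, start, start + len(deleted) - 1, deleted, "-"
    if len(alt) > len(ref) and alt.startswith(ref):
        inserted = alt[len(ref):]
        anchor = pos + len(ref) - 1  # the insertion follows this base
        return chrom, anchor, anchor, "-", inserted
    raise ValueError(f"complex variant not handled by this sketch: {ref}>{alt}")


def convert(vcf_lines):
    """Yield tab-separated avinput lines for each ALT allele of each record."""
    for line in vcf_lines:
        if line.startswith("#"):
            continue
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            yield "\t".join(map(str, vcf_to_avinput(chrom, int(pos), ref, alt)))
```

For real runs, something like `convert2annovar.pl -format vcf4 sample.vcf > sample.avinput` (placeholder file names) produces the same kind of table.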

Thank you. :)

SNP annotation snpEff Annovar MTB
