snpEff building new database ERROR
5.8 years ago

I'm currently building a new database in snpEff because the existing one is not up to date. However, I keep getting the error below:

java -jar snpEff.jar build -gff3 -v Zea_Mays_B73
00:00:00    SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani
00:00:00    Command: 'build'
00:00:00    Building database for 'Zea_Mays_B73'
00:00:00    Reading configuration file 'snpEff.config'. Genome: 'Zea_Mays_B73'
00:00:00    Reading config file: /opt/home/ngsclass/Amoiroglou/pt1RNAlibs/VcfFiles/snpEff/snpEff.config
java.lang.RuntimeException: Property: 'Zea_Mays_B73.genome' not found
    at org.snpeff.interval.Genome.<init>(Genome.java:106)
    at org.snpeff.snpEffect.Config.readGenomeConfig(Config.java:681)
    at org.snpeff.snpEffect.Config.readConfig(Config.java:649)
    at org.snpeff.snpEffect.Config.init(Config.java:480)
    at org.snpeff.snpEffect.Config.<init>(Config.java:117)
    at org.snpeff.SnpEff.loadConfig(SnpEff.java:451)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:364)
    at org.snpeff.SnpEff.run(SnpEff.java:1183)
    at org.snpeff.SnpEff.main(SnpEff.java:162)
00:00:00    Logging
00:00:01    Done.

I have followed the documentation on the snpEff manual page (http://snpeff.sourceforge.net/SnpEff_manual.html#databases) with no luck.

Any help is much appreciated!


Anastasia A. : Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is reserved for submitting new answers for the original question in the thread.


This looks similar to:

error in building annotation database by SnpEff

Have you added your genome entry to the snpEff config file?


Thank you for your reply. Yes, I did add the genome to the snpEff config file and still received the same error. This is the entry:

Zea Mays B73 genome,Version 4
Zea_Mays_B73v4.genome: Zea_Mays_B73

I also made a directory called Zea_Mays_B73v4 and downloaded both the genomic FASTA and the GFF3 annotation file (and renamed them according to the instructions in the snpEff manual).


Is this the command you used?

java -jar snpEff.jar build -gtf22 -v Zea_Mays_B73

Make sure the name is Zea_Mays_B73v4, not Zea_Mays_B73: it has to match the key in the config file (Zea_Mays_B73v4.genome).
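One way to catch this kind of mismatch before building is to check that the genome name appears identically as a config key and as a directory under data/. The helper below is a hypothetical sketch (check_genome_name is a made-up name), assuming the default layout with snpEff.config and data/ side by side in the snpEff directory. The key comparison is case-sensitive, as config keys are.

```shell
# Hypothetical helper: check that a genome name is used consistently.
# Assumes the default layout: <snpeff_dir>/snpEff.config and <snpeff_dir>/data/.
check_genome_name() {
  local snpeff_dir="$1" genome="$2"
  local config="$snpeff_dir/snpEff.config"
  # Look for a line whose key (text before ':', blanks removed) is exactly
  # "<genome>.genome"; the comparison is case-sensitive.
  if ! awk -F':' -v key="${genome}.genome" '
      { k = $1; gsub(/[ \t]/, "", k) }
      k == key { found = 1 }
      END { exit !found }' "$config"; then
    echo "missing config entry: ${genome}.genome" >&2
    return 1
  fi
  # The build reads its input files from data/<genome>/.
  if [ ! -d "$snpeff_dir/data/$genome" ]; then
    echo "missing directory: $snpeff_dir/data/$genome" >&2
    return 1
  fi
  echo "ok: $genome"
}
```

For example, `check_genome_name /path/to/snpEff Zea_Mays_B73v4` prints `ok: Zea_Mays_B73v4` when both the config key and data/Zea_Mays_B73v4/ exist, and otherwise reports which one is missing.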


Yes, you were right, that worked!

Thank you!! (hate when it is just a typo)


Now that I have been able to build the database, I get the following error when trying to download it:

java -jar snpEff.jar download -v Zea_Mays_B73v4
00:00:00    SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani
00:00:00    Command: 'download'
00:00:00    Reading configuration file 'snpEff.config'. Genome: 'Zea_Mays_B73v4'
00:00:00    Reading config file: /opt/home/ngsclass/Amoiroglou/pt1RNAlibs/VcfFiles/snpEff/snpEff.config
00:00:00    done
00:00:00    Downloading database for 'Zea_Mays_B73v4'
00:00:01    Connecting to http://downloads.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_Zea_Mays_B73v4.zip
00:00:01    ERROR while connecting to http://downloads.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_Zea_Mays_B73v4.zip
java.lang.RuntimeException: java.lang.RuntimeException: File not found on the server. Make sure the database name is correct.
    at org.snpeff.util.Download.download(Download.java:178)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.downloadAndInstall(SnpEffCmdDownload.java:32)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.runDownloadGenome(SnpEffCmdDownload.java:86)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.run(SnpEffCmdDownload.java:72)
    at org.snpeff.SnpEff.run(SnpEff.java:1183)
    at org.snpeff.SnpEff.main(SnpEff.java:162)
Caused by: java.lang.RuntimeException: File not found on the server. Make sure the database name is correct.
    at org.snpeff.util.Download.download(Download.java:127)
    ... 5 more
00:00:01    Logging
00:00:02    Done.

Any thoughts on why it is still unable to find my database?


Hello and welcome to biostars Anastasia A. ,

Please use the formatting bar (especially the code option) to present your post better. I've done it for you this time.

Thank you!


Thank you! I did that, and the database seems to have been built. However, when I annotate my samples with it, snpEff cannot find the database and tries to download it from the snpEff server instead. I get the error message below:

java -Xmx4g -jar snpEff.jar -v -stats 2mutantNewDB.html Zea_Mays_B73v4 2mutant.vcf > 2mutantNewDB.ann.vcf &

SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani

00:00:00    Command: 'ann'
00:00:00    Reading configuration file 'snpEff.config'. Genome: 'Zea_Mays_B73v4'
00:00:00    Reading config file: /opt/home/ngsclass/Amoiroglou/pt1RNAlibs/VcfFiles/snpEff.config
00:00:00    done
00:00:00    Reading database for genome version 'Zea_Mays_B73v4' from file '/opt/home/ngsclass/Amoiroglou/pt1RNAlibs/VcfFiles/./data/Zea_Mays_B73v4/snpEffectPredictor.bin' (this might take a while)
00:00:00    Database not installed
    Attempting to download and install database 'Zea_Mays_B73v4'
00:00:00    Reading configuration file 'snpEff.config'. Genome: 'Zea_Mays_B73v4'
00:00:00    Reading config file: /opt/home/ngsclass/Amoiroglou/pt1RNAlibs/VcfFiles/snpEff.config
00:00:01    done
00:00:01    Downloading database for 'Zea_Mays_B73v4'
00:00:01    Connecting to http://downloads.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_Zea_Mays_B73v4.zip
00:00:01    ERROR while connecting to http://downloads.sourceforge.net/project/snpeff/databases/v4_3/snpEff_v4_3_Zea_Mays_B73v4.zip
java.lang.RuntimeException: java.lang.RuntimeException: File not found on the server. Make sure the database name is correct.
    at org.snpeff.util.Download.download(Download.java:178)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.downloadAndInstall(SnpEffCmdDownload.java:32)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.runDownloadGenome(SnpEffCmdDownload.java:86)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdDownload.run(SnpEffCmdDownload.java:72)
    at org.snpeff.SnpEff.run(SnpEff.java:1221)
    at org.snpeff.SnpEff.loadDb(SnpEff.java:515)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:1001)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdEff.run(SnpEffCmdEff.java:984)
    at org.snpeff.SnpEff.run(SnpEff.java:1183)
    at org.snpeff.SnpEff.main(SnpEff.java:162)
Caused by: java.lang.RuntimeException: File not found on the server. Make sure the database name is correct.
    at org.snpeff.util.Download.download(Download.java:127)
    ... 9 more
java.lang.RuntimeException: Genome download failed!

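Looks like snpEff still tries to download the database instead of looking for a local one. Note that this run read /opt/home/ngsclass/Amoiroglou/pt1RNAlibs/VcfFiles/snpEff.config, not the .../VcfFiles/snpEff/snpEff.config used for the build, and looked for the database under .../VcfFiles/./data/ rather than in the snpEff/data directory. Can you confirm you added the entry to the config file like this: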

# Zea Mays B73 genome, Version 4
Zea_Mays_B73v4.genome: Zea_Mays_B73v4

Make sure the first line is commented out. Then try this:

java -Xmx4g -jar /path/to/snpEff/snpEff.jar -c /path/to/snpEff/snpEff.config -v Zea_Mays_B73v4 input.vcf > output.ann.vcf

I edited the snpEff.config file as you described, and it still didn't work. I also added a data.dir path, as specified in the config file:

# Zea_Mays_B73v4 genome, Version 4
Zea_Mays_B73v4.genome : Zea_Mays_B73v4
data.dir = ~/ngsclass/Amoiroglou/pt1RNAlibs/VcfFiles/snpEff/data/Zea_Mays_B73v4/

That still didn't stop snpEff from looking for the database on the server instead of in the local directory. I tried both with and without data.dir and had no luck.
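One thing worth checking is how data.dir is interpreted. snpEff reads <data.dir>/<genome>/snpEffectPredictor.bin (compare the path in the annotation log above, where data.dir is ./data/), so data.dir should point to the data/ directory itself, not to the genome subdirectory; with the entry above it would presumably look under .../data/Zea_Mays_B73v4/Zea_Mays_B73v4/. The sketch below (expected_db_path is a hypothetical helper) prints the path snpEff would probably use, assuming "~" expands to the home directory, relative paths resolve against the config file's directory, and the default is ./data/. These assumptions are consistent with the logs in this thread but should be checked against the comments in your own snpEff.config.

```shell
# Hypothetical helper: print where snpEff would probably look for a built
# database. Assumptions: "~" expands to $HOME, relative data.dir values are
# resolved against the config file's directory, and the default is ./data/.
expected_db_path() {
  local config="$1" genome="$2"
  local cfg_dir data_dir
  cfg_dir=$(cd "$(dirname "$config")" && pwd)
  # Value of the last uncommented "data.dir = ..." line (no spaces in path).
  data_dir=$(sed -n 's/^[[:space:]]*data\.dir[[:space:]]*=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$config" | tail -n 1)
  data_dir="${data_dir:-./data/}"
  case "$data_dir" in
    "~"*) data_dir="$HOME${data_dir#"~"}" ;;   # home-relative path
    /*) ;;                                    # absolute path: keep as is
    *) data_dir="$cfg_dir/$data_dir" ;;       # relative to the config file
  esac
  # The genome name is always appended to data.dir.
  echo "${data_dir%/}/$genome/snpEffectPredictor.bin"
}
```

For example, `expected_db_path /path/to/snpEff/snpEff.config Zea_Mays_B73v4` shows whether the printed path matches the directory where the build wrote its output.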


I'll download the maize data to do a real test and report back later.

5.8 years ago
Vitis ★ 2.6k

OK. I ran through the entire process and it seems to work fine for me. Here are the steps:

Download the reference genome and annotation from the Ensembl FTP site. I downloaded the annotation file in GFF3 format.

https://plants.ensembl.org/Zea_mays/Info/Index

I renamed the reference FASTA file and compressed it as "sequences.fa.gz", and renamed and compressed the GFF3 annotation as "genes.gff.gz". It seems you have to use "gff" rather than "gff3" in the file name for snpEff to recognize it.

gzip sequences.fa
gzip genes.gff

Create a directory for your database under /path/to/snpEff/data/

cd /path/to/snpEff/data/
mkdir Zea_Mays_B73v4
cd Zea_Mays_B73v4

Move the reference and annotation here:

mv somewhere/sequences.fa.gz ./
mv somewhere/genes.gff.gz ./
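The renaming, compression and moving steps above can be combined into one helper. This is a sketch with hypothetical names (stage_snpeff_inputs, copy_gz); the input paths are placeholders for whatever files you downloaded from Ensembl, inputs may be plain or already gzip-compressed, and it copies rather than moves the originals.

```shell
# Hypothetical helpers: stage a reference FASTA and a GFF3 annotation for
# "snpEff build -gff3". Paths are placeholders; inputs may be plain or .gz.
copy_gz() {
  # Copy $1 to $2, gzip-compressing it unless it is already a .gz file.
  case "$1" in
    *.gz) cp "$1" "$2" ;;
    *) gzip -c "$1" > "$2" ;;
  esac
}

stage_snpeff_inputs() {
  local snpeff_dir="$1" genome="$2" fasta="$3" gff="$4"
  local dest="$snpeff_dir/data/$genome"
  mkdir -p "$dest" || return 1
  # snpEff expects these fixed file names inside data/<genome>/.
  copy_gz "$fasta" "$dest/sequences.fa.gz" || return 1
  copy_gz "$gff" "$dest/genes.gff.gz" || return 1
}
```

Usage: `stage_snpeff_inputs /path/to/snpEff Zea_Mays_B73v4 somewhere/genome.fa somewhere/annotation.gff3.gz`.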

Edit the snpEff.config file and add the following lines for your database (I added them under the Ensembl release 86 section):

# Zea Mays B73 genome, Version 4
Zea_Mays_B73v4.genome: Zea_Mays_B73v4
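If you script this setup, appending the entry only when the key is missing avoids duplicate entries on repeated runs. A minimal sketch: add_genome_entry is a hypothetical helper, and the genome name is assumed to contain only letters, digits and underscores, as here, because it is used inside a regular expression.

```shell
# Hypothetical helper: append "<genome>.genome : <genome>" plus a comment line
# to snpEff.config, unless an entry for that key already exists.
add_genome_entry() {
  local config="$1" genome="$2" description="$3"
  if grep -q "^${genome}\.genome[[:space:]]*:" "$config"; then
    return 0   # entry already present; leave the file unchanged
  fi
  printf '\n# %s\n%s.genome : %s\n' "$description" "$genome" "$genome" >> "$config"
}
```

Usage: `add_genome_entry /path/to/snpEff/snpEff.config Zea_Mays_B73v4 "Zea Mays B73 genome, Version 4"`.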

Then run the database building step. Make sure you use the "-gff3" option to match your "genes.gff.gz" file.

java -jar snpEff.jar build -gff3 -v Zea_Mays_B73v4

You should see some warnings about UTRs but there should not be any "ERROR" reported.
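To check that without scrolling through the verbose output, you can keep the build log and count lines containing ERROR. A sketch with hypothetical helper names (build_and_check, count_build_errors); it assumes snpEff.jar is in the current directory and passes any extra build options through to snpEff.

```shell
# Hypothetical helpers: run the build with a saved log, then count ERROR lines.
# Assumes snpEff.jar is in the current directory; extra arguments
# (e.g. other build options) are passed through to snpEff unchanged.
count_build_errors() {
  local log="$1" n
  # Warnings are expected; any line containing "ERROR" is not.
  # grep -c prints 0 but exits 1 when nothing matches, hence "|| true".
  n=$(grep -c 'ERROR' "$log" || true)
  echo "$n ERROR line(s) in $log"
  [ "$n" -eq 0 ]
}

build_and_check() {
  local genome="$1"; shift
  java -jar snpEff.jar build -gff3 -v "$@" "$genome" 2>&1 | tee "build_${genome}.log"
  count_build_errors "build_${genome}.log"
}
```

Usage: `build_and_check Zea_Mays_B73v4`, which returns non-zero if the log contains any ERROR line.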

Then you can run the effect prediction.

java -Xmx4g -jar /path/to/snpEff/snpEff.jar -c /path/to/snpEff/snpEff.config -v Zea_Mays_B73v4 input.vcf > output.ann.vcf

This should pick up your custom database Zea_Mays_B73v4 correctly.

As a side note, if your VCF is relatively small, you may consider using Ensembl's online VEP interface. It works well for a limited number of variants, say, a few hundred.

https://uswest.ensembl.org/info/docs/tools/vep/index.html


Thanks for this! However, when I tried it I got errors saying that I am missing the protein and CDS files. Why didn't this happen to you?


Hi, I am facing the same error. Were you able to solve it without the protein and CDS files? It ends with "Database check failed".


I just found out that we can use the -noCheckCds option to skip this check. Just mentioning it here in case someone needs it.
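For reference, a build that skips both optional checks might look like the sketch below. -noCheckCds comes from the comment above; -noCheckProtein is, as far as I know, the corresponding switch for the protein.fa check, but confirm both against the build usage message of your snpEff version. build_without_checks and the JAVA / SNPEFF_JAR variables are hypothetical, added only to make the command easy to adapt.

```shell
# Hypothetical helper: build a snpEff database while skipping the optional
# cds.fa / protein.fa consistency checks.
# JAVA and SNPEFF_JAR are illustrative overrides; defaults: java, snpEff.jar.
build_without_checks() {
  local genome="$1"
  "${JAVA:-java}" -jar "${SNPEFF_JAR:-snpEff.jar}" build -gff3 -v \
    -noCheckCds -noCheckProtein "$genome"
}
```

Usage, from the snpEff directory: `build_without_checks Zea_Mays_B73v4`.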

5.8 years ago
Vitis ★ 2.6k

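I think you're supposed to build the database from a local FASTA reference and annotation file (GTF or GFF3) instead of downloading it, if the annotation version you need is not available through snpEff. The download command only fetches prebuilt databases from the snpEff server, which is why it reports "File not found on the server" for a custom genome name.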

cd /path/to/snpEff/data/
mkdir Zea_Mays_B73v4
cd Zea_Mays_B73v4

Move the reference sequences and annotation into this directory. Make sure they're named "sequences.fa.gz" and "genes.gtf.gz".

mv somewhere/sequences.fa.gz ./
mv somewhere/genes.gtf.gz ./

Then run the building step.

cd /path/to/snpEff
java -jar snpEff.jar build -gtf22 -v Zea_Mays_B73v4

The -gtf22 option specifies the annotation format (GTF 2.2, matching genes.gtf.gz); for a GFF3 annotation named genes.gff.gz, use -gff3 instead.
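Since a locally built database never needs to be downloaded, the thing to check after the build is whether data/<genome>/snpEffectPredictor.bin exists, the file snpEff tries to read when annotating (see the annotation log earlier in the thread). A sketch with a hypothetical helper name (db_is_built):

```shell
# Hypothetical helper: confirm the locally built database file exists and is
# non-empty before running the annotation step.
db_is_built() {
  local data_dir="$1" genome="$2"
  local bin="$data_dir/$genome/snpEffectPredictor.bin"
  if [ -s "$bin" ]; then
    echo "found $bin"
  else
    echo "not built: $bin" >&2
    return 1
  fi
}
```

Usage: `db_is_built /path/to/snpEff/data Zea_Mays_B73v4`.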

5.2 years ago

Download the latest version of snpEff from the following link and unzip it.

http://sourceforge.net/projects/snpeff/files/snpEff_latest_core.zip

Download the reference genome in .fa format and the annotation file in .gff3 format from the following link:

https://plants.ensembl.org/Zea_mays/Info/Index

Unzip both files and rename them: the reference genome to sequences.fa and the annotation file to genes.gff.

Remember to change the annotation file extension from .gff3 to .gff; otherwise snpEff won't recognize it.

Inside your "snpEff" folder, make a new folder named "data", and inside "data" make a folder named "Zea_Mays_B73v4". Move "sequences.fa" and "genes.gff" into "Zea_Mays_B73v4". (snpEff can also read the reference from a "data/genomes" folder, but there the file must be named after the genome, e.g. "Zea_Mays_B73v4.fa", not "sequences.fa".)

*Important note: the Java heap size matters here. On Windows, open the "Control Panel", go to "Programs", and find "Java" to check whether you are running 32-bit or 64-bit Java. With 32-bit Java, the heap is limited to only about 1 GB. Remove all old versions of Java from your computer, install 64-bit Java, and increase the maximum heap size (for example with the -Xmx4G option used below). Otherwise, the following commands will not work for you. A YouTube video on "How to increase Java heap size" can help.
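To check which JVM you have from a terminal instead of the Control Panel, you can inspect the output of `java -version`. The sketch assumes that 64-bit builds print "64-Bit" there, which holds for common OpenJDK/HotSpot builds but may differ for other vendors; jvm_is_64bit and the JAVA override are hypothetical.

```shell
# Hypothetical helper: succeed if the JVM reports itself as 64-bit.
# Assumes 64-bit builds print "64-Bit" in `java -version` (true for common
# OpenJDK/HotSpot builds; other vendors may differ). JAVA overrides the binary.
jvm_is_64bit() {
  # java -version writes to stderr, so merge it into stdout for grep.
  "${JAVA:-java}" -version 2>&1 | grep -q '64-Bit'
}
```

Usage: `jvm_is_64bit && java -Xmx4G -jar snpEff.jar build -gff3 -v Zea_Mays_B73v4`.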

Now, edit the snpEff.config file. Add the following lines under the Ensembl release section.

# Zea Mays B73 genome, Version 4
Zea_Mays_B73v4.genome: Zea_Mays_B73v4

Now, go to the directory which contains snpEff.jar file, and run the following command.

java -Xmx4G -jar snpEff.jar build -gff3 -v Zea_Mays_B73v4

Now, run the following command to annotate your VCF file.

java -Xmx4G -jar snpEff.jar Zea_Mays_B73v4 chr1_NoHapMap_EMS.vcf > output1.vcf
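To confirm the run actually annotated the variants, you can count data lines whose INFO column contains an ANN= entry, the field snpEff writes its annotations to. A minimal sketch with a hypothetical helper name (count_annotated), assuming an uncompressed, tab-separated VCF:

```shell
# Hypothetical helper: count VCF data lines whose INFO column (8th field)
# contains an ANN= entry. Assumes an uncompressed, tab-separated VCF.
count_annotated() {
  awk -F'\t' '
    /^#/ { next }                           # skip header lines
    $8 ~ /^ANN=/ || $8 ~ /;ANN=/ { n++ }    # ANN first or after another key
    END { print n + 0 }' "$1"
}
```

Usage: `count_annotated output1.vcf`. A result of 0 on a non-empty VCF suggests the annotation step did not run as intended.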


Hey, since building the reference genome I want to update didn't work, I tried to repeat what you did with the Zea genome, and it did not work either. I got this error:

ERROR: CDS check file '/home/snpEff/./data/Zea_Mays_B73v4/cds.fa' not found.

As well as:

ERROR: Protein check file '/home/snpEff/./data/Zea_Mays_B73v4/protein.fa' not found.
ERROR: Database check failed.

Do we need the protein and cds files? I also tried with them and it did not work...


Actually, I can run further analysis despite these errors; the "build" job did finish.

5.8 years ago

A prebuilt Zea mays database is already available, though I'm not sure which version it is.

java -jar /Toolbox/snpEff/snpEff.jar databases | grep "Zea"
java -jar /Toolbox/snpEff/snpEff.jar download -v Zea_mays

Thank you, it is the older version though and that is why I need to build my own db!
