snpEff: won't recognize the GTF or GFF3 files (runtime exception)
3.3 years ago
VenGeno ▴ 100

Hi,

I am trying to build a custom database for snpEff. Following the instructions in both the forum and the snpEff documentation, I did the following.

I added the following entry to the snpEff.config file:

# BG94_1
BG94_1.genome : BG94_1

Then I placed a GFF3 file (I tried a GTF too) in the path/to/snpeff-5.0.1/data/BG94_1 folder together with BG94_1.fa (both gzipped). Then I ran the following command (please note that I am using the bioconda installation of snpEff):

snpEff build -gff3 -v BG94_1

I am getting the following error:

00:00:00 SnpEff version SnpEff 5.0e (build 2021-03-09 06:01), by Pablo Cingolani
00:00:00    Command: 'build'
00:00:00    Building database for 'BG94_1'
00:00:00    Reading configuration file 'snpEff.config'. Genome: 'BG94_1'
00:00:00    Reading config file: /Users/venura/miniconda3/pkgs/snpeff-5.0-hdfd78af_1/share/snpeff-5.0-1/data/BG94_1/snpEff.config
00:00:00    Reading config file: /Users/venura/miniconda3/envs/py38/share/snpeff-5.0-1/snpEff.config
00:00:00    done
Reading GFF3 data file  : '/Users/venura/miniconda3/envs/py38/share/snpeff-5.0-1/./data/BG94_1/genes.gff'
java.lang.RuntimeException: File not found '/Users/venura/miniconda3/envs/py38/share/snpeff-5.0-1/./data/BG94_1/genes.gff'
    at org.snpeff.util.Gpr.reader(Gpr.java:536)
    at org.snpeff.util.Gpr.reader(Gpr.java:507)
    at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.readGff(SnpEffPredictorFactoryGff.java:488)
    at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.create(SnpEffPredictorFactoryGff.java:341)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:370)
    at org.snpeff.SnpEff.run(SnpEff.java:1188)
    at org.snpeff.SnpEff.main(SnpEff.java:168)
java.lang.RuntimeException: Error reading file '/Users/venura/miniconda3/envs/py38/share/snpeff-5.0-1/./data/BG94_1/genes.gff'
java.lang.RuntimeException: File not found '/Users/venura/miniconda3/envs/py38/share/snpeff-5.0-1/./data/BG94_1/genes.gff'
    at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.create(SnpEffPredictorFactoryGff.java:357)
    at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:370)
    at org.snpeff.SnpEff.run(SnpEff.java:1188)
    at org.snpeff.SnpEff.main(SnpEff.java:168)
00:00:00    Logging
00:00:01    Checking for updates...
00:00:03    Done.

Here are some lines from my GFF3 file:

#gff-version 3
    Bg_94-1_CX35|chr01_10700000_16500000    Liftoff gene    1   1345    .   +   .   ID=gene_1;Name=Os01g0293800 gene;coverage=0.997;sequence_ID=0.982;extra_copy_number=0;copy_num_ID=gene_1_0
    Bg_94-1_CX35|chr01_10700000_16500000    Liftoff gene    1623    3128    .   -   .   ID=gene_6;Name=Os01g0293900 gene;coverage=0.999;sequence_ID=0.968;extra_copy_number=0;copy_num_ID=gene_6_0
    Bg_94-1_CX35|chr01_10700000_16500000    Liftoff gene    20379   21605   .   -   .   ID=gene_7;Name=Os01g0294500 gene;coverage=0.999;sequence_ID=0.995;extra_copy_number=0;copy_num_ID=gene_7_0
    Bg_94-1_CX35|chr01_10700000_16500000    Liftoff gene    48673   50214   .   -   .   ID=gene_5;Name=Os01g0294700 gene;coverage=1.0;sequence_ID=0.995;extra_copy_number=0;copy_num_ID=gene_5_0
    Bg_94-1_CX35|chr01_10700000_16500000    Liftoff gene    102125  104501  .   -   .   ID=gene_4;Name=Os01g0295600 gene;coverage=1.0;sequence_ID=0.992;extra_copy_number=0;copy_num_ID=gene_4_0
    Bg_94-1_CX35|chr01_10700000_16500000    Liftoff gene    105502  108051  .   -   .   ID=gene_3;Name=Os01g0295700 gene;coverage=0.996;sequence_ID=0.991;extra_copy_number=0;copy_num_ID=gene_3_0

I am wondering why this is happening.

GTF GFF3 snpeff • 3.7k views
ADD COMMENT

mmh, maybe you should consider giving Juke34's first idea a chance ;-)

ADD REPLY

Bullshit, sorry. This might be your problem:

java.lang.RuntimeException: File not found

ADD REPLY

Unfortunately, both files are there :|

ADD REPLY

I tried with the original gff3 file as well. So I assumed that it has nothing to do with formatting 🤔. So you think it has something to do with formatting? 😢

ADD REPLY

What are the permissions on your file? ls -l /Users/venura/miniconda3/envs/py38/share/snpeff-5.0-1/./data/BG94_1/genes.gff

ADD REPLY
-rw-r--r--@ 1 venura  staff  1657163 Aug 12 11:19 BG94_1.fa.gz
-rw-rw-r--@ 1 venura  staff     1831 Aug 30 15:43 features_level1.json
-rw-r--r--  1 venura  staff    10092 Aug 12 11:21 genes.gff.gz
-rw-r--r--@ 1 venura  staff    11896 Aug 30 15:59 genes.gtf.gz

PS: Just to give a little more insight, I used Liftoff to annotate BG94_1 from the Nipponbare genome assembly (here I am only looking at a region of chromosome 1). Unfortunately, Liftoff didn't pick up anything other than genes from the source GFF3 file. That's how I ended up with only genes there.
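
A gene-only GFF3 may be worth confirming before a build, since as far as I know snpEff predicts effects from transcript, exon, and CDS records rather than from gene records alone. The following is a minimal sketch for counting feature types (column 3); the function name `gff_feature_counts` is made up for illustration and is not part of snpEff or Liftoff.

```shell
# Count how often each feature type (column 3) occurs in a GFF3 file.
# Accepts plain or gzipped input; lines starting with '#' are skipped.
# gff_feature_counts is a hypothetical helper name, not a snpEff command.
gff_feature_counts() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac |
        awk -F'\t' '!/^#/ && NF >= 9 { n[$3]++ }
                    END { for (t in n) print t "\t" n[t] }' |
        sort
}

# Example call (path is a placeholder):
# gff_feature_counts path/to/data/BG94_1/genes.gff
```

If the output lists only `gene`, the annotation probably lacks the transcript-level records that a useful database needs.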

ADD REPLY

?? /Users/venura/miniconda3/envs/py38/share/snpeff-5.0-1/./data/BG94_1/genes.gff is a folder?

ADD REPLY

No, no. Apologies. Here are the properties of the GFF file (I missed one line when I copied):

-rw-r--r--  1 venura  staff    82035 Aug 30 17:45 genes.gff
ADD REPLY

I have the exact same issue with the '.' introduced between directories. Will it be solved by changing admin privileges?

ADD REPLY

If you are sure that you have the right annotation files in the right directory, then yes, the issue is the "." between directories. Just edit the following part of snpEff.config:

#---
# Databases are stored here
# E.g.: Information for 'hg19' is stored in data.dir/hg19/
#
# You can use tilde ('~') as first character to refer to your home directory. 
# Also, a non-absolute path will be relative to config's file dir
# 
#---
data.dir = ./data/

Change ./data/ to your data directory.
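
To see which folder a given config actually points at, something like the sketch below may help. It follows the rules stated in the config comments above (a leading `~` means the home directory, and a non-absolute path is relative to the config file's directory). The helper name `snpeff_data_dir` is hypothetical, and the parsing is an assumption about a simple `data.dir = value` line, not snpEff's own logic.

```shell
# Print the directory that data.dir in a snpEff.config resolves to.
# snpeff_data_dir is a hypothetical helper, not part of snpEff.
# The body runs in a subshell so its variables do not leak.
snpeff_data_dir() (
    config="$1"
    config_dir=$(cd "$(dirname "$config")" && pwd)
    # Take the last uncommented data.dir line and strip key and spaces.
    value=$(sed -n 's/^[[:space:]]*data\.dir[[:space:]]*[=:][[:space:]]*//p' "$config" |
        tail -n 1 | sed 's/[[:space:]]*$//')
    case "$value" in
        /*)   printf '%s\n' "$value" ;;               # absolute path
        "~"*) printf '%s\n' "$HOME${value#?}" ;;      # home-relative path
        *)    printf '%s\n' "$config_dir/$value" ;;   # relative to config dir
    esac
)

# Example call (paths are placeholders):
# ls -l "$(snpeff_data_dir /path/to/snpEff.config)/BG94_1"
```

Note that the log in the question reads the config under `envs/py38/share/snpeff-5.0-1/`, so it may be worth checking that the annotation files sit under that same installation rather than a different copy.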

ADD REPLY
4
3.2 years ago
VenGeno ▴ 100

Finally, I made the custom database. Adding steps here just in case someone else needs it.

First, I added my database entries into the snpEff.config file.

# BG94_1
BG94_1.genome : BG94_1

Since my genes.gff3 file continued to give trouble, I used @Juke34's AGAT gff2gtf script to convert the file to .gtf (the version matters) with the following command.

agat_convert_sp_gff2gtf.pl -gff genes.gff3  --gtf_version 2.2 -o genes.gtf

Then I placed the annotation file (genes.gtf) together with the sequence file (sequences.fa) inside the predefined folder for the database (in my case, /Users/venura/miniconda3/pkgs/snpeff-5.0-hdfd78af_1/share/snpeff-5.0-1/data/BG94_1). Then I ran the following command inside the snpeff-5.0-1 folder to build the database:

 java -jar snpEff.jar build -gtf22 -v BG94_1
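
A quick pre-flight check before running the build can be sketched as follows. It only tests for the two file names used in the steps above (genes.gtf and sequences.fa); accepting a gzipped copy of either is an assumption, and `check_build_inputs` is a made-up helper name, not a snpEff command.

```shell
# Report whether a genome folder holds the files used in the steps above.
# check_build_inputs is a hypothetical helper; accepting .gz copies is
# an assumption. Returns 0 if both files are present, 1 otherwise.
check_build_inputs() (
    genome_dir="$1"
    status=0
    for name in genes.gtf sequences.fa; do
        if [ -f "$genome_dir/$name" ] || [ -f "$genome_dir/$name.gz" ]; then
            echo "found:   $name"
        else
            echo "missing: $name"
            status=1
        fi
    done
    exit "$status"
)

# Example call (path is a placeholder):
# check_build_inputs path/to/snpeff-5.0-1/data/BG94_1
```

In the original question the FASTA was named BG94_1.fa.gz rather than sequences.fa, so the naming difference may also have played a part.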
ADD COMMENT

Hello VenGeno, is sequences.fa your reference genome? If not, which file is it?

ADD REPLY
