snpEff database build error
2.0 years ago

Hi everyone, I was trying to build a SnpEff database for Human herpesvirus 5 strain Merlin (https://www.ncbi.nlm.nih.gov/nuccore/AY446894.2) using the script provided with SnpEff (buildDbNcbi.sh), and I got the error shown under "Output / Error message" below. I suspect the GenBank file itself is the cause, perhaps a formatting problem in the .gbk file. Has anyone encountered a similar problem? How did you overcome it? What do you suggest?

Note: I later tried building the database manually and got the same error. I then updated SnpEff to version 5.1 and tried again, with the same result.

I really appreciate any help you can provide.

To Reproduce

SnpEff version: 5.0

Genome version: AY446894.2

SnpEff full command line: bash ~/path-to-script/buildDbNcbi.sh AY446894.2

Output / Error message:
java.lang.RuntimeException: Error reading file '/path-to-data/data/AY446894.2/genes.gbk'
java.lang.RuntimeException: Transcript 'HHV5wtgr002' is already in Gene 'HHV5wtgr002'

Expected behavior: Building database

Annotation Database GenBank SnpEff • 1.0k views

It seems the annotation contains two genes (probably identical?) at different positions (6759..8458 and 8250..8393) but with the same name (RL9A) and the same locus_tag (HHV5wtgr002). My guess is that SnpEff expects unique names for genes and transcripts.
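To check a GenBank file for this kind of clash before building, something like the following minimal sketch could help. It is not part of SnpEff: the function name `find_duplicate_locus_tags` and the tiny `SAMPLE_GBK` record (with made-up `DEMO_*` tags and coordinates) are hypothetical, and the parser is a plain-text scan that assumes the standard feature-table layout (feature key indented by five spaces, qualifiers on their own lines).

```python
import re
from collections import defaultdict

# Made-up miniature record for illustration only (not the real AY446894.2 annotation).
SAMPLE_GBK = """\
LOCUS       DEMO                 100 bp    DNA     linear   VRL 01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..100
                     /organism="demo"
     gene            10..40
                     /gene="RL9A"
                     /locus_tag="DEMO_002"
     CDS             10..40
                     /gene="RL9A"
                     /locus_tag="DEMO_002"
     gene            30..38
                     /gene="RL9A"
                     /locus_tag="DEMO_002"
     gene            50..90
                     /gene="RL10"
                     /locus_tag="DEMO_003"
ORIGIN
//
"""


def find_duplicate_locus_tags(gbk_text, feature_key="gene"):
    """Return {locus_tag: [location, ...]} for tags shared by several features of one key."""
    locations_by_tag = defaultdict(list)
    in_features = False
    current_location = None  # set only while reading a feature with the wanted key
    for line in gbk_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if line.startswith("ORIGIN") or line.startswith("//"):
            in_features = False
            current_location = None
            continue
        if not in_features:
            continue
        # Feature header: key after five spaces, then the location.
        # Only the first line of a multi-line location is captured.
        header = re.match(r"^ {5}(\S+)\s+(\S+)", line)
        if header:
            key, location = header.groups()
            current_location = location if key == feature_key else None
            continue
        qualifier = re.match(r'^\s+/locus_tag="([^"]+)"', line)
        if qualifier and current_location is not None:
            locations_by_tag[qualifier.group(1)].append(current_location)
    return {tag: locs for tag, locs in locations_by_tag.items() if len(locs) > 1}


duplicates = find_duplicate_locus_tags(SAMPLE_GBK)
for tag, locations in duplicates.items():
    print(f"{tag} is used by {len(locations)} gene features: {', '.join(locations)}")
```

For a real file, read its contents (for example with `open("genes.gbk").read()`) and pass the text in place of `SAMPLE_GBK`. Any tag it reports is a candidate for the "already in Gene" error.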


Thank you for your input. I believe you guessed correctly. I deleted the redundant entries from the GenBank file. I am not sure that was the right approach, but it worked. I was also not interested in those regions anyway.
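For anyone who would rather script that cleanup than edit the file by hand, here is a rough sketch of one way it might be done. It keeps the first feature for each (feature key, locus_tag) pair and drops later repeats. Which copy to keep is an assumption, the names `drop_repeated_features` and `SAMPLE_GBK` are hypothetical, and the sample record is made up. Note that this would also drop legitimate repeats, such as several CDS features that intentionally share one locus_tag, so review the output before building.

```python
import re

# Made-up miniature record for illustration only (not the real AY446894.2 annotation).
SAMPLE_GBK = """\
LOCUS       DEMO                 100 bp    DNA     linear   VRL 01-JAN-2000
FEATURES             Location/Qualifiers
     source          1..100
                     /organism="demo"
     gene            10..40
                     /gene="RL9A"
                     /locus_tag="DEMO_002"
     CDS             10..40
                     /gene="RL9A"
                     /locus_tag="DEMO_002"
     gene            30..38
                     /gene="RL9A"
                     /locus_tag="DEMO_002"
     gene            50..90
                     /gene="RL10"
                     /locus_tag="DEMO_003"
ORIGIN
//
"""


def drop_repeated_features(gbk_text):
    """Keep the first feature per (key, locus_tag) pair and drop later repeats."""
    kept_lines = []
    seen = set()         # (feature key, locus_tag) pairs already written
    block = []           # lines of the feature currently being collected
    in_features = False

    def flush():
        """Write the collected feature unless its (key, locus_tag) was seen before."""
        if not block:
            return
        key = block[0].split()[0]
        tag = None
        for feature_line in block:
            match = re.match(r'^\s+/locus_tag="([^"]+)"', feature_line)
            if match:
                tag = match.group(1)
                break
        if tag is None or (key, tag) not in seen:
            kept_lines.extend(block)
        if tag is not None:
            seen.add((key, tag))
        block.clear()

    for line in gbk_text.splitlines():
        if line.startswith("FEATURES"):
            in_features = True
            kept_lines.append(line)
            continue
        if in_features and not line.startswith(" "):
            # An unindented line (e.g. ORIGIN) ends the feature table.
            flush()
            in_features = False
        if not in_features:
            kept_lines.append(line)
            continue
        if re.match(r"^ {5}\S", line):
            flush()  # a new feature header closes the previous feature
        block.append(line)
    flush()
    return "\n".join(kept_lines) + "\n"


cleaned = drop_repeated_features(SAMPLE_GBK)
print(cleaned)
```

Writing `cleaned` to a new file and keeping the original untouched makes it easy to compare the two before rebuilding the database.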
