snpEff build not working with gff3
5.1 years ago

Hello all!!!

I have sequenced the strain we have in the lab, and now I want to use that assembly for SNP calling against other strains. The problem is that when I build the database with snpEff, I always get the same error, and I don't understand why:

java -jar snpEff.jar build -gff3 -v TB50

00:00:00 SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani
00:00:00 Command: 'build'
00:00:00 Building database for 'TB50'
00:00:00 Reading configuration file 'snpEff.config'. Genome: 'TB50'
00:00:00 Reading config file: /Users/tmartinc/work/RL/PhD/Database/TB50seqLausanne/SNPs/snpEff/snpEff_latest_core/snpEff/snpEff.config
00:00:00 done
Reading GFF3 data file : '/Users/tmartinc/work/RL/PhD/Database/TB50seqLausanne/SNPs/snpEff/snpEff_latest_core/snpEff/./data/TB50/genes.gff'
Total: 0 markers added.

Create exons from CDS (if needed): 
Exons created for 0 transcripts.

Deleting redundant exons (if needed): 
    Total transcripts with deleted exons: 0

Collapsing zero length introns (if needed): 
    Total collapsed transcripts: 0
Reading sequences   :
FASTA file: '/Users/tmartinc/work/RL/PhD/Database/TB50seqLausanne/SNPs/snpEff/snpEff_latest_core/snpEff/./data/genomes/TB50.fa' not found.
Reading FASTA file: '/Users/tmartinc/work/RL/PhD/Database/TB50seqLausanne/SNPs/snpEff/snpEff_latest_core/snpEff/./data/TB50/sequences.fa'
    Reading sequence 'tb50_4_tig00000001_BK006938', length: 1527445
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_12_tig00000004_BK006945', length: 72496
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_15_tig00000013_BK006948', length: 1102713
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_16_tig00000014_BK006949', length: 941389
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_7_tig00000015_BK006941', length: 1091053
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_13_tig00000016_BK006946', length: 912520
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_2_tig00000018_BK006936', length: 832510
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_14_tig00000020_BK006947', length: 770263
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_10_tig00000022_BK006943', length: 744706
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_11_tig00000025_BK006944', length: 680926
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_8_tig00000027_BK006934', length: 576431
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_5_tig00000029_BK006939', length: 604804
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_9_tig00000033_BK006942', length: 425111
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_3_tig00000035_BK006937', length: 329289
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_6_tig00000037_BK006940', length: 266054
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_1_tig00000039_BK006935', length: 246943
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_Mito_tig00000041_KP263414', length: 84520
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_12_3_tig00000132_BK006945', length: 683993
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_12_2_tig00000133_BK006945', length: 88102
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
    Reading sequence 'tb50_12_1_tig00000134_BK006945', length: 482874
    Adding genomic sequences to exons:  Done (0 sequences added, 0 ignored).
Total: 0 sequences added, 0 sequences ignored.

Adjusting transcripts: 
Adjusting genes: 
Adjusting chromosomes lengths: 
Ranking exons: 
Create UTRs from CDS (if needed): 
Correcting exons based on frame information.

Remove empty chromosomes: 

Marking as 'coding' from CDS information: 
Done: 0 transcripts marked
java.lang.RuntimeException: FATAL ERROR: Most Exons do not have sequences!
Chromosome names missing in 'reference sequence' file:  , , , , , , , , , , , , , , , , , , , 
Chromosome names missing in 'genes' file             :  'tb50_10_tig00000022_BK006943''tb50_11_tig00000025_BK006944''tb50_12_1_tig00000134_BK006945''tb50_12_2_tig00000133_BK006945''tb50_12_3_tig00000132_BK006945''tb50_12_tig00000004_BK006945''tb50_13_tig00000016_BK006946''tb50_14_tig00000020_BK006947''tb50_15_tig00000013_BK006948''tb50_16_tig00000014_BK006949''tb50_1_tig00000039_BK006935''tb50_2_tig00000018_BK006936''tb50_3_tig00000035_BK006937''tb50_4_tig00000001_BK006938''tb50_5_tig00000029_BK006939''tb50_6_tig00000037_BK006940''tb50_7_tig00000015_BK006941''tb50_8_tig00000027_BK006934''tb50_9_tig00000033_BK006942''tb50_Mito_tig00000041_KP263414'

. File '/Users/tmartinc/work/RL/PhD/Database/TB50seqLausanne/SNPs/snpEff/snpEff_latest_core/snpEff/./data/TB50/genes.gff' line 13047 'tb50_Mito_tig00000041_KP263414 exonerate:protein2genome:local match_part 17983 18349 . + . Parent=match00344'

at org.snpeff.snpEffect.factory.SnpEffPredictorFactory.error(SnpEffPredictorFactory.java:421)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactory.finishUp(SnpEffPredictorFactory.java:556)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.create(SnpEffPredictorFactoryGff.java:348)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
at org.snpeff.SnpEff.run(SnpEff.java:1183)
at org.snpeff.SnpEff.main(SnpEff.java:162)

java.lang.RuntimeException: Error reading file '/Users/tmartinc/work/RL/PhD/Database/TB50seqLausanne/SNPs/snpEff/snpEff_latest_core/snpEff/./data/TB50/genes.gff' java.lang.RuntimeException: FATAL ERROR: Most Exons do not have sequences! Chromosome names missing in 'reference sequence' file: , , , , , , , , , , , , , , , , , , , Chromosome names missing in 'genes' file : 'tb50_10_tig00000022_BK006943''tb50_11_tig00000025_BK006944''tb50_12_1_tig00000134_BK006945''tb50_12_2_tig00000133_BK006945''tb50_12_3_tig00000132_BK006945''tb50_12_tig00000004_BK006945''tb50_13_tig00000016_BK006946''tb50_14_tig00000020_BK006947''tb50_15_tig00000013_BK006948''tb50_16_tig00000014_BK006949''tb50_1_tig00000039_BK006935''tb50_2_tig00000018_BK006936''tb50_3_tig00000035_BK006937''tb50_4_tig00000001_BK006938''tb50_5_tig00000029_BK006939''tb50_6_tig00000037_BK006940''tb50_7_tig00000015_BK006941''tb50_8_tig00000027_BK006934''tb50_9_tig00000033_BK006942''tb50_Mito_tig00000041_KP263414'

. File '/Users/tmartinc/work/RL/PhD/Database/TB50seqLausanne/SNPs/snpEff/snpEff_latest_core/snpEff/./data/TB50/genes.gff' line 13047 'tb50_Mito_tig00000041_KP263414 exonerate:protein2genome:local match_part 17983 18349 . + . Parent=match00344'

at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.create(SnpEffPredictorFactoryGff.java:353)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
at org.snpeff.SnpEff.run(SnpEff.java:1183)
at org.snpeff.SnpEff.main(SnpEff.java:162)

My sequences.fa has headers like this:

>tb50_4_tig00000001_BK006938
>tb50_12_tig00000004_BK006945

And my genes.gff looks like this:

tb50_4_tig00000001_BK006938 exonerate:protein2genome:local  match   771461  771859  718 +   . ID=match03090;Name=sp|A0A023PZE8|YD57W_YEAST;Target=sp|A0A023PZE8|YD57W_YEAST 1 133;Gap=M399
tb50_4_tig00000001_BK006938 exonerate:protein2genome:local  match_part  771461  771859  .   +   .   Parent=match03090

So as far as I can tell, the headers are correct and identical in both files... Where am I going wrong???
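One way to check this claim mechanically, rather than by eye, is a small script along the following lines. It is only a sketch: the function names `fasta_names` and `gff_summary` are made up for illustration, and it assumes the GFF is tab-delimited with 9 columns. It collects the sequence names from the FASTA headers and from column 1 of the GFF, and also counts the feature types in column 3, since the log above reports "Total: 0 markers added" and the excerpt shows only `match` and `match_part` features. Whether snpEff can build transcripts from those feature types is not established in this thread.

```python
from collections import Counter


def fasta_names(lines):
    """Return the first whitespace-delimited token of every FASTA header line."""
    names = set()
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def gff_summary(lines):
    """Return (seqids found in column 1, Counter of feature types in column 3).

    Lines that do not split into at least 9 tab-separated columns are counted
    under the hypothetical marker '<malformed>' instead of being parsed.
    """
    seqids = set()
    types = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # anything after this directive is sequence, not annotation
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and other directives
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9:
            types["<malformed>"] += 1
            continue
        seqids.add(cols[0])
        types[cols[2]] += 1
    return seqids, types


def report(fasta_path, gff_path):
    """Print the name differences and the feature-type counts for two files."""
    with open(fasta_path) as fa:
        names = fasta_names(fa)
    with open(gff_path) as gff:
        seqids, types = gff_summary(gff)
    print("only in FASTA:", sorted(names - seqids))
    print("only in GFF  :", sorted(seqids - names))
    print("feature types:", dict(types))
```

It could be called as `report("data/TB50/sequences.fa", "data/TB50/genes.gff")`, using the paths shown in the log. If both "only in" lists come back empty, the names really do match, and the feature-type counts are the next thing to look at.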

I also looked up this error on the snpEff website (http://snpeff.sourceforge.net/SnpEff_manual.html#trouble) and followed the instructions to add the sequences to the GFF3 file, but I still get the same error...
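For reference, adding the sequences to a GFF3 file means appending the FASTA records after a `##FASTA` directive line, which is part of the GFF3 format. A minimal sketch of that step, assuming both files fit in memory and using the hypothetical helper name `gff3_with_fasta`, could look like this:

```python
def gff3_with_fasta(gff_text, fasta_text):
    """Return GFF3 text with the FASTA records appended after '##FASTA'.

    If the annotation already contains a '##FASTA' line, it is returned
    unchanged so that the directive is not written twice.
    """
    if any(line.strip() == "##FASTA" for line in gff_text.splitlines()):
        return gff_text
    if gff_text and not gff_text.endswith("\n"):
        gff_text += "\n"  # make sure the directive starts on its own line
    return gff_text + "##FASTA\n" + fasta_text


def write_combined(gff_path, fasta_path, out_path):
    """Read both input files and write the combined GFF3 to out_path."""
    with open(gff_path) as gff, open(fasta_path) as fa:
        combined = gff3_with_fasta(gff.read(), fa.read())
    with open(out_path, "w") as out:
        out.write(combined)
```

This only changes where the sequences are stored. It does not change which features are in the annotation, so on its own it would not be expected to affect the "Total: 0 markers added" line in the log.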

Please help!

Thanks a lot!!! Trini

SNP software error Assembly • 2.4k views

For GFF3 annotations downloaded from Ensembl (which should meet the GFF3 standard and pass GFF3 validation), I used the "-gtf22" option instead of "-gff3" and it worked fine in snpEff.


Hello, I created the genes.gtf file and ran it as you said, but I still have the same problem... Thanks anyway.
