Hello all!!!
I have sequenced the strain we have in the lab, and now I want to use that assembly as the reference for SNP calling in other strains. The problem is that while building the database with snpEff, I always get the same error, and I don't understand why:
java -jar snpEff.jar build -gff3 -v TB50
00:00:00 SnpEff version SnpEff 4.3t (build 2017-11-24 10:18), by Pablo Cingolani
00:00:00 Command: 'build'
00:00:00 Building database for 'TB50'
00:00:00 Reading configuration file 'snpEff.config'. Genome: 'TB50'
00:00:00 Reading config file: /Users/tmartinc/work/RL/PhD/Database/TB50seqLausanne/SNPs/snpEff/snpEff_latest_core/snpEff/snpEff.config
00:00:00 done
Reading GFF3 data file : '/Users/tmartinc/work/RL/PhD/Database/TB50seqLausanne/SNPs/snpEff/snpEff_latest_core/snpEff/./data/TB50/genes.gff'
Total: 0 markers added.
Create exons from CDS (if needed):
Exons created for 0 transcripts.
Deleting redundant exons (if needed):
Total transcripts with deleted exons: 0
Collapsing zero length introns (if needed):
Total collapsed transcripts: 0
Reading sequences :
FASTA file: '/Users/tmartinc/work/RL/PhD/Database/TB50seqLausanne/SNPs/snpEff/snpEff_latest_core/snpEff/./data/genomes/TB50.fa' not found.
Reading FASTA file: '/Users/tmartinc/work/RL/PhD/Database/TB50seqLausanne/SNPs/snpEff/snpEff_latest_core/snpEff/./data/TB50/sequences.fa'
Reading sequence 'tb50_4_tig00000001_BK006938', length: 1527445
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_12_tig00000004_BK006945', length: 72496
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_15_tig00000013_BK006948', length: 1102713
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_16_tig00000014_BK006949', length: 941389
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_7_tig00000015_BK006941', length: 1091053
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_13_tig00000016_BK006946', length: 912520
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_2_tig00000018_BK006936', length: 832510
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_14_tig00000020_BK006947', length: 770263
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_10_tig00000022_BK006943', length: 744706
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_11_tig00000025_BK006944', length: 680926
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_8_tig00000027_BK006934', length: 576431
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_5_tig00000029_BK006939', length: 604804
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_9_tig00000033_BK006942', length: 425111
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_3_tig00000035_BK006937', length: 329289
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_6_tig00000037_BK006940', length: 266054
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_1_tig00000039_BK006935', length: 246943
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_Mito_tig00000041_KP263414', length: 84520
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_12_3_tig00000132_BK006945', length: 683993
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_12_2_tig00000133_BK006945', length: 88102
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Reading sequence 'tb50_12_1_tig00000134_BK006945', length: 482874
Adding genomic sequences to exons: Done (0 sequences added, 0 ignored).
Total: 0 sequences added, 0 sequences ignored.
Adjusting transcripts:
Adjusting genes:
Adjusting chromosomes lengths:
Ranking exons:
Create UTRs from CDS (if needed):
Correcting exons based on frame information.
Remove empty chromosomes:
Marking as 'coding' from CDS information:
Done: 0 transcripts marked
java.lang.RuntimeException: FATAL ERROR: Most Exons do not have sequences!
Chromosome names missing in 'reference sequence' file: , , , , , , , , , , , , , , , , , , ,
Chromosome names missing in 'genes' file : 'tb50_10_tig00000022_BK006943''tb50_11_tig00000025_BK006944''tb50_12_1_tig00000134_BK006945''tb50_12_2_tig00000133_BK006945''tb50_12_3_tig00000132_BK006945''tb50_12_tig00000004_BK006945''tb50_13_tig00000016_BK006946''tb50_14_tig00000020_BK006947''tb50_15_tig00000013_BK006948''tb50_16_tig00000014_BK006949''tb50_1_tig00000039_BK006935''tb50_2_tig00000018_BK006936''tb50_3_tig00000035_BK006937''tb50_4_tig00000001_BK006938''tb50_5_tig00000029_BK006939''tb50_6_tig00000037_BK006940''tb50_7_tig00000015_BK006941''tb50_8_tig00000027_BK006934''tb50_9_tig00000033_BK006942''tb50_Mito_tig00000041_KP263414'
. File '/Users/tmartinc/work/RL/PhD/Database/TB50seqLausanne/SNPs/snpEff/snpEff_latest_core/snpEff/./data/TB50/genes.gff' line 13047 'tb50_Mito_tig00000041_KP263414 exonerate:protein2genome:local match_part 17983 18349 . + . Parent=match00344'
at org.snpeff.snpEffect.factory.SnpEffPredictorFactory.error(SnpEffPredictorFactory.java:421)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactory.finishUp(SnpEffPredictorFactory.java:556)
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.create(SnpEffPredictorFactoryGff.java:348)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
at org.snpeff.SnpEff.run(SnpEff.java:1183)
at org.snpeff.SnpEff.main(SnpEff.java:162)
java.lang.RuntimeException: Error reading file '/Users/tmartinc/work/RL/PhD/Database/TB50seqLausanne/SNPs/snpEff/snpEff_latest_core/snpEff/./data/TB50/genes.gff'
java.lang.RuntimeException: FATAL ERROR: Most Exons do not have sequences!
Chromosome names missing in 'reference sequence' file: , , , , , , , , , , , , , , , , , , ,
Chromosome names missing in 'genes' file : 'tb50_10_tig00000022_BK006943''tb50_11_tig00000025_BK006944''tb50_12_1_tig00000134_BK006945''tb50_12_2_tig00000133_BK006945''tb50_12_3_tig00000132_BK006945''tb50_12_tig00000004_BK006945''tb50_13_tig00000016_BK006946''tb50_14_tig00000020_BK006947''tb50_15_tig00000013_BK006948''tb50_16_tig00000014_BK006949''tb50_1_tig00000039_BK006935''tb50_2_tig00000018_BK006936''tb50_3_tig00000035_BK006937''tb50_4_tig00000001_BK006938''tb50_5_tig00000029_BK006939''tb50_6_tig00000037_BK006940''tb50_7_tig00000015_BK006941''tb50_8_tig00000027_BK006934''tb50_9_tig00000033_BK006942''tb50_Mito_tig00000041_KP263414'
. File '/Users/tmartinc/work/RL/PhD/Database/TB50seqLausanne/SNPs/snpEff/snpEff_latest_core/snpEff/./data/TB50/genes.gff' line 13047 'tb50_Mito_tig00000041_KP263414 exonerate:protein2genome:local match_part 17983 18349 . + . Parent=match00344'
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.create(SnpEffPredictorFactoryGff.java:353)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:369)
at org.snpeff.SnpEff.run(SnpEff.java:1183)
at org.snpeff.SnpEff.main(SnpEff.java:162)
My sequences.fa has headers like this:
>tb50_4_tig00000001_BK006938
>tb50_12_tig00000004_BK006945
And my genes.gff looks like this:
tb50_4_tig00000001_BK006938 exonerate:protein2genome:local match 771461 771859 718 + . ID=match03090;Name=sp|A0A023PZE8|YD57W_YEAST;Target=sp|A0A023PZE8|YD57W_YEAST 1 133;Gap=M399
tb50_4_tig00000001_BK006938 exonerate:protein2genome:local match_part 771461 771859 . + . Parent=match03090
So as far as I can tell, the headers are correct and identical in both files... Where am I going wrong?
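One way to check this systematically is a small script that compares the sequence IDs in both files and also tallies the feature types in column 3 of the GFF. This is a minimal sketch: the file paths are placeholders, and the set of feature types that snpEff builds gene models from (`EXPECTED_TYPES` below: gene, mRNA, transcript, exon, CDS) is an assumption based on typical GFF3 gene annotations, not taken from the snpEff source. Note that both GFF lines shown above have the types `match` and `match_part` (exonerate protein2genome alignments); if snpEff does not turn those into transcripts, that would be consistent with the "Total: 0 markers added." line in the log.

```python
import sys
from collections import Counter

# Assumption: feature types snpEff typically builds gene models from.
# Check the snpEff documentation for the authoritative list.
EXPECTED_TYPES = {"gene", "mRNA", "transcript", "exon", "CDS"}


def fasta_ids(lines):
    """Return the sequence IDs (first word after '>') from FASTA lines."""
    return {
        line[1:].split()[0]
        for line in lines
        if line.startswith(">") and line[1:].strip()
    }


def gff_summary(lines):
    """Return (seqids, Counter of feature types) for GFF3 feature lines."""
    seqids, types = set(), Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequences follow; no more feature lines
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        fields = line.split("\t")
        if len(fields) < 9:
            fields = line.split()  # fall back for space-separated copies
        if len(fields) < 3:
            continue  # not a feature line
        seqids.add(fields[0])
        types[fields[2]] += 1
    return seqids, types


def report(fasta_path, gff_path):
    """Print seqid mismatches and the feature types found in the GFF."""
    with open(fasta_path) as fh:
        fa = fasta_ids(fh)
    with open(gff_path) as fh:
        gff_ids, types = gff_summary(fh)
    print("Seqids in GFF but not in FASTA:", sorted(gff_ids - fa))
    print("Seqids in FASTA but not in GFF:", sorted(fa - gff_ids))
    print("Feature types in GFF:", dict(types))
    print("Types outside EXPECTED_TYPES:", sorted(set(types) - EXPECTED_TYPES))


if __name__ == "__main__" and len(sys.argv) == 3:
    report(sys.argv[1], sys.argv[2])
```

Saved under a hypothetical name such as `check_ids.py`, it would be run as `python check_ids.py data/TB50/sequences.fa data/TB50/genes.gff`. Empty seqid differences combined with only `match`/`match_part` types would point at the annotation content rather than the names.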
I also looked up this error on the snpEff website (http://snpeff.sourceforge.net/SnpEff_manual.html#trouble) and followed the instructions to add the sequences to the GFF3 file, but I still get the same error...
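For reference, embedding sequences in a GFF3 file means appending a `##FASTA` directive after the last feature line, followed by the FASTA records; that is the layout the GFF3 specification defines. A minimal sketch with placeholder file names is below. It only changes where the sequences are read from, so it would not be expected to help on its own if no features are read from the GFF in the first place ("Total: 0 markers added.").

```python
def gff_with_fasta(gff_lines, fasta_lines):
    """Return GFF3 lines followed by a ##FASTA section holding the sequences."""
    out = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            raise ValueError("the GFF3 input already has a ##FASTA section")
        out.append(line.rstrip("\n") + "\n")  # normalise line endings
    out.append("##FASTA\n")  # everything after this directive is FASTA
    out.extend(line.rstrip("\n") + "\n" for line in fasta_lines)
    return out


def write_gff_with_fasta(gff_path, fasta_path, out_path):
    """Write gff_path plus fasta_path (as a ##FASTA section) to out_path."""
    # Build the full output first so a bad input never leaves a half-written file.
    with open(gff_path) as g, open(fasta_path) as f:
        lines = gff_with_fasta(g, f)
    with open(out_path, "w") as out:
        out.writelines(lines)
```

For example, `write_gff_with_fasta("genes.gff", "sequences.fa", "genes_with_seq.gff")`, where the output name is hypothetical; keeping the original `genes.gff` as a backup before replacing it is a good idea.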
Please help!
Thanks a lot!!! Trini
For GFF3 annotations downloaded from Ensembl (which should meet the GFF3 standard and pass GFF3 validation), I used the "-gtf22" option instead of "-gff3", and it worked fine in snpEff.
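With "-gtf22" the annotation has to be in GTF 2.2 format, where every feature line carries the mandatory `gene_id` and `transcript_id` attributes in the ninth column. A GTF made from exonerate `match`/`match_part` records that only have `ID=`/`Parent=` attributes would lack them. The minimal sketch below flags such lines; the regular expression is a simplification of the GTF attribute syntax, not a full parser.

```python
import re

# Simplified GTF attribute pattern: key "value"
ATTR_RE = re.compile(r'(\w+) "([^"]*)"')
REQUIRED = ("gene_id", "transcript_id")  # mandatory in GTF 2.2


def gtf_problems(lines):
    """Return (line_number, reason) pairs for lines that are not valid GTF 2.2 features."""
    problems = []
    for n, line in enumerate(lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            problems.append((n, "expected 9 tab-separated columns"))
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        for key in REQUIRED:
            if key not in attrs:
                problems.append((n, f"missing {key}"))
    return problems
```

Calling `gtf_problems(open("genes.gtf"))` on the converted file would list any lines that cannot be assigned to a gene and transcript.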
Hello, I created the genes.gtf file and ran it as you said, but I still get the same problem... Thanks anyway.