Hello,
I am using snpEff to build a genome database. When I execute the following command:
java -jar snpEff.jar build -gff3 -v Sorghum
the program appears to run.
The following output begins scrolling across the screen:
00:00:00 SnpEff version SnpEff 4.3p (build 2017-06-06 09:55), by Pablo Cingolani
00:00:00 Command: 'build'
00:00:00 Building database for 'Sorghum'
00:00:00 Reading configuration file 'snpEff.config'. Genome: 'Sorghum'
00:00:00 Reading config file: /home/.conda/pkgs/snpeff-4.3.1p-1/share/snpeff-4.3.1p-1/snpEff.config
00:00:01 done
Reading GFF3 data file : '/home/.conda/pkgs/snpeff-4.3.1p-1/share/snpeff-4.3.1p-1/data/Sorghum/genes.gff'
It then continues adding genomic sequences.
I also get warning messages such as the following:
WARNING: Cannot find last exonic position before 1748688 for transcript 'Sobic.002G018901.1.v3.2'
WARNING: Cannot find last exonic position before 42880109 for transcript 'Sobic.003G175350.1.v3.2'
WARNING: Cannot find first exonic position after 117023 for transcript 'Sobic.002G000350.1.v3.2'
Towards the end of reading and adding sequences, I get the following error:
java.lang.RuntimeException: Error reading file '/home/.conda/pkgs/snpeff-4.3.1p-1/share/snpeff-4.3.1p-1/data/Sorghum/genes.gff'
at org.snpeff.snpEffect.factory.SnpEffPredictorFactoryGff.create(SnpEffPredictorFactoryGff.java:353)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.createSnpEffPredictor(SnpEffCmdBuild.java:118)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:362)
at org.snpeff.SnpEff.run(SnpEff.java:1183)
at org.snpeff.SnpEff.main(SnpEff.java:162)
00:00:30 Logging
00:00:31 Checking for updates...
00:00:32 Done.
Finally, what makes me think something is wrong is that 'snpEffectPredictor.bin', which is needed for downstream applications, is never created.
Is anyone familiar with this problem? Is the build not completing, or is it failing outright? I have a hard time believing it is my GFF file, since the program starts up with no errors.
Thank you, Hannah
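The stack trace above only reports "Error reading file" and does not say which line of the GFF is at fault, so one way to rule the file in or out is a quick structural check. Below is a minimal Python sketch, assuming the standard GFF3 layout (nine tab-separated columns, integer start and end coordinates with 1 <= start <= end, and a strand of `+`, `-`, `.` or `?`). The function name `check_gff3` is hypothetical and not part of snpEff, and a file that passes this check may still be rejected by snpEff for other reasons.

```python
def check_gff3(lines, max_problems=20):
    """Return (line_number, message) pairs for structurally invalid feature lines.

    `lines` is any iterable of text lines, e.g. an open file handle.
    This is an illustrative helper, not part of snpEff.
    """
    problems = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence section: no feature lines after this
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        fields = line.split("\t")
        if len(fields) != 9:
            problems.append(
                (line_no, f"expected 9 tab-separated columns, found {len(fields)}")
            )
        else:
            start_text, end_text, strand = fields[3], fields[4], fields[6]
            try:
                start, end = int(start_text), int(end_text)
            except ValueError:
                problems.append(
                    (line_no, f"non-integer coordinates: {start_text!r}, {end_text!r}")
                )
            else:
                if start < 1 or start > end:
                    problems.append((line_no, f"bad interval: {start}..{end}"))
                elif strand not in {"+", "-", ".", "?"}:
                    problems.append((line_no, f"unexpected strand: {strand!r}"))
        if len(problems) >= max_problems:
            break  # stop early so a badly broken file does not flood the output
    return problems

# Example use (the path is wherever your genes.gff lives):
# with open("data/Sorghum/genes.gff") as handle:
#     for line_no, message in check_gff3(handle):
#         print(f"line {line_no}: {message}")
```

If this reports nothing, the problem is more likely in the content of the annotation (for example the transcripts named in the warnings) than in the basic file layout.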
Not a direct solution, but the sorghum genome is supported.
You can download the prebuilt bin into the data directory. Is there a specific reason that you want to build it yourself?
My VCF file is based on the most recent Sorghum genome assembly, while only older versions are supported in snpEff. I think at this point I am going to redo the analysis using an older version of the genome.
Thanks