Hi all,
I want to add the locations of transcription factor binding sites to a pre-built genome (Arabidopsis_thaliana). I am following the instructions on the snpEff site (http://pcingola.github.io/SnpEff/se_build_reg/), but I am getting errors about missing .bin files. It seems these files might only be available for some pre-built genomes (http://pcingola.github.io/SnpEff/se_additionalann/). Is there a way for me to create them for my purpose?
I put a BED file in this folder:
snpeff-5.1-2/data/Arabidopsis_thaliana/regulation.bed/regulation.plant.dapseq_peaks.bed
and ran this command: snpEff build -v -onlyReg Arabidopsis_thaliana
but got this error:
00:00:00 SnpEff version SnpEff 5.1d (build 2022-04-19 15:49), by Pablo Cingolani
00:00:00 Command: 'build'
00:00:00 Building database for 'Arabidopsis_thaliana'
00:00:00 Reading configuration file 'snpEff.config'. Genome: 'Arabidopsis_thaliana'
00:00:00 Reading config file: /home/msimenc/software/mambaforge/envs/gwas_test/share/snpeff-5.1-2/data/Arabidopsis_thaliana/snpEff.config
00:00:00 Reading config file: /home/msimenc/software/mambaforge/envs/gwas_test/share/snpeff-5.1-2/snpEff.config
00:00:01 done
00:00:01 [Optional] Reading regulation elements: GFF
WARNING_FILE_NOT_FOUND: Cannot read optional regulation file '/home/msimenc/software/mambaforge/envs/gwas_test/share/snpeff-5.1-2/./data/Arabidopsis_thaliana/regulation.gff', nothing done.
00:00:01 [Optional] Reading regulation elements: BED
00:00:01 Directory has 2 bed files and 1 cell types
00:00:01 Creating consensus for cellType 'plant', files: [/home/msimenc/software/mambaforge/envs/gwas_test/share/snpeff-5.1-2/./data/Arabidopsis_thaliana/regulation.bed//regulation.plant.dapseq_peaks.bed.bkp, /home/msimenc/software/mambaforge/envs/gwas_test/share/snpeff-5.1-2/./data/Arabidopsis_thaliana/regulation.bed//regulation.plant.dapseq_peaks.bed]
00:00:01 Reading file '/home/msimenc/software/mambaforge/envs/gwas_test/share/snpeff-5.1-2/./data/Arabidopsis_thaliana/regulation.bed//regulation.plant.dapseq_peaks.bed.bkp'
00:00:01 Adding regulatory type: 'plant'
00:00:03 Done
Total lines : 2816462
Total annotation count : 169645
Percent : 6.0%
Total annotated length : 34278160
Number of cell/annotations : 1
00:00:03 Reading file '/home/msimenc/software/mambaforge/envs/gwas_test/share/snpeff-5.1-2/./data/Arabidopsis_thaliana/regulation.bed//regulation.plant.dapseq_peaks.bed'
00:00:05 Done
Total lines : 2816462
Total annotation count : 339292
Percent : 6.0%
Total annotated length : 68556724
Number of cell/annotations : 1
00:00:05 Creating consensus for cell type: plant
00:00:05 Sorting: plant , size: 339292
00:00:06 Adding to final consensus
00:00:06 Final consensus for cell type: plant , size: 169640
java.lang.RuntimeException: java.io.FileNotFoundException: null/regulation_plant.bin (No such file or directory)
at org.snpeff.serializer.MarkerSerializer.save(MarkerSerializer.java:311)
at org.snpeff.interval.Markers.save(Markers.java:399)
at org.snpeff.RegulationFileConsensus.save(RegulationFileConsensus.java:164)
at org.snpeff.RegulationConsensusMultipleBed.consensusByRegType(RegulationConsensusMultipleBed.java:69)
at org.snpeff.RegulationConsensusMultipleBed.run(RegulationConsensusMultipleBed.java:139)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.readRegulationBed(SnpEffCmdBuild.java:330)
at org.snpeff.snpEffect.commandLine.SnpEffCmdBuild.run(SnpEffCmdBuild.java:441)
at org.snpeff.SnpEff.run(SnpEff.java:1141)
at org.snpeff.SnpEff.main(SnpEff.java:160)
Caused by: java.io.FileNotFoundException: null/regulation_plant.bin (No such file or directory)
at java.base/java.io.FileOutputStream.open0(Native Method)
at java.base/java.io.FileOutputStream.open(FileOutputStream.java:298)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:237)
at java.base/java.io.FileOutputStream.<init>(FileOutputStream.java:126)
at org.snpeff.serializer.MarkerSerializer.save(MarkerSerializer.java:300)
... 8 more
00:00:06 Logging
00:00:07 Checking for updates...
00:00:08 Done.
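One thing visible in the log: the build reports "2 bed files" and reads both regulation.plant.dapseq_peaks.bed and a backup copy, regulation.plant.dapseq_peaks.bed.bkp, so the "plant" consensus is built from the same peaks twice. I'm not sure this is related to the null/regulation_plant.bin path, but here is a minimal Python sketch for checking the input directory before rebuilding. It assumes only files ending in ".bed" are meant as input (it flags everything else, including compressed files snpEff might actually accept) and does a basic BED sanity check (integer start/end, start >= 0, end > start). The function name check_regulation_dir is just for illustration.

```python
import sys
from pathlib import Path


def check_regulation_dir(reg_dir):
    """Summarise the files in a snpEff 'regulation.bed' directory.

    Assumption: only files whose last suffix is '.bed' are intended input;
    anything else (e.g. a '.bkp' backup) is flagged so it can be moved out.
    """
    report = {}
    for path in sorted(Path(reg_dir).iterdir()):
        if not path.is_file():
            continue
        if path.suffix != ".bed":
            report[path.name] = "not a .bed file (move it out of this directory)"
            continue
        n, bad, total_len = 0, 0, 0
        with path.open() as fh:
            for line in fh:
                # Skip blank lines and BED header/comment lines
                if not line.strip() or line.startswith(("#", "track", "browser")):
                    continue
                fields = line.rstrip("\n").split("\t")
                n += 1
                try:
                    start, end = int(fields[1]), int(fields[2])
                except (IndexError, ValueError):
                    bad += 1  # fewer than 3 columns or non-integer coordinates
                    continue
                if start < 0 or end <= start:
                    bad += 1  # invalid interval
                else:
                    total_len += end - start  # summed, not merged, length
        report[path.name] = f"{n} records, {bad} malformed, {total_len} bp summed length"
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    for name, summary in check_regulation_dir(sys.argv[1]).items():
        print(f"{name}: {summary}")
```

Usage (the script name is made up): python check_reg_bed.py snpeff-5.1-2/data/Arabidopsis_thaliana/regulation.bed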
Any help with adding custom annotations to the Arabidopsis_thaliana database would be much appreciated!