Hi all,
I am trying to get CEGMA working on the sample dataset that came with the package. I have already installed all the prerequisites (ncbi-blast 2.2.25+, hmmer 3.0, geneid 1.4.4 and genewise 2.4). When I run it as suggested in the README file, I get an error saying that geneid-train did not work properly. Please see the entire STDOUT (verbose) below. Any suggestions will be greatly appreciated!
$ cegma -v -ext --genome sample.dna --protein sample.prot -o sample &>sample_data.stdout
$ cat sample_data.stdout
********************************************************************************
** MAPPING PROTEINS TO GENOME (TBLASTN) **
********************************************************************************
RUNNING: genome_map -n genome -p 6 -o 5000 -c 2000 -t 1 -v sample.prot sample.dna 2>sample.cegma.errors
Building a new DB, current time: 05/01/2014 09:10:46
New DB name: /tmp/genome25484.blastdb
New DB title: sample.dna
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 1 sequences in 0.07232 seconds.
Processing KOG: KOG0292
Processing KOG: KOG0328
Processing KOG: KOG0466
Processing KOG: KOG0631
Processing KOG: KOG0659
Processing KOG: KOG1458
Processing KOG: KOG1742
Processing KOG: KOG1762
Processing KOG: KOG1769
Processing KOG: KOG1780
Processing KOG: KOG1992
Processing KOG: KOG2067
Processing KOG: KOG2803
Processing KOG: KOG2916
Processing KOG: KOG3180
Processing KOG: KOG3232
Processing KOG: KOG3418
Processing KOG: KOG3497
Found 86 candidate regions in sample.dna
********************************************************************************
** MAKING INITIAL GENE PREDICTIONS FOR CORE GENES (GENEWISE + GENEID) **
********************************************************************************
RUNNING: local_map -n local -f -h /data004/software/GIF/packages/cegma/2.4.010312/data/hmm_profiles -i KOG -v genome.chunks.fa 2>sample.cegma.errors
Processing chunk: KOG0292.1
Processing chunk: KOG0292.6
Processing chunk: KOG0292.7
Processing chunk: KOG0292.2
Processing chunk: KOG0292.4
Processing chunk: KOG0328.7
Processing chunk: KOG0328.18
Processing chunk: KOG0328.16
Processing chunk: KOG0328.15
Processing chunk: KOG0328.8
Processing chunk: KOG0466.5
Processing chunk: KOG0466.4
Processing chunk: KOG0466.3
Processing chunk: KOG0466.2
Processing chunk: KOG0631.10
Processing chunk: KOG0631.6
Processing chunk: KOG0631.9
Processing chunk: KOG0631.5
Processing chunk: KOG0631.12
Processing chunk: KOG0659.19
Processing chunk: KOG0659.7
Processing chunk: KOG0659.20
Processing chunk: KOG0659.21
Processing chunk: KOG0659.10
Processing chunk: KOG1458.6
Processing chunk: KOG1458.8
Processing chunk: KOG1458.5
Processing chunk: KOG1458.3
Processing chunk: KOG1458.9
Processing chunk: KOG1742.8
Processing chunk: KOG1742.3
Processing chunk: KOG1742.9
Processing chunk: KOG1742.6
Processing chunk: KOG1742.5
Processing chunk: KOG1762.3
Processing chunk: KOG1762.2
Processing chunk: KOG1769.22
Processing chunk: KOG1769.15
Processing chunk: KOG1769.8
Processing chunk: KOG1769.5
Processing chunk: KOG1769.27
Processing chunk: KOG1780.5
Processing chunk: KOG1780.2
Processing chunk: KOG1780.7
Processing chunk: KOG1780.3
Processing chunk: KOG1780.6
Processing chunk: KOG1992.1
Processing chunk: KOG1992.8
Processing chunk: KOG1992.2
Processing chunk: KOG1992.3
Processing chunk: KOG1992.6
Processing chunk: KOG2067.6
Processing chunk: KOG2067.7
Processing chunk: KOG2067.3
Processing chunk: KOG2067.9
Processing chunk: KOG2067.8
Processing chunk: KOG2803.7
Processing chunk: KOG2803.5
Processing chunk: KOG2803.8
Processing chunk: KOG2803.4
Processing chunk: KOG2803.6
Processing chunk: KOG2916.9
Processing chunk: KOG2916.4
Processing chunk: KOG2916.11
Processing chunk: KOG2916.10
Processing chunk: KOG2916.2
Processing chunk: KOG3180.12
Processing chunk: KOG3180.10
Processing chunk: KOG3180.3
Processing chunk: KOG3180.7
Processing chunk: KOG3180.9
Processing chunk: KOG3232.6
Processing chunk: KOG3232.2
Processing chunk: KOG3232.4
Processing chunk: KOG3232.3
Processing chunk: KOG3232.5
Processing chunk: KOG3418.12
Processing chunk: KOG3418.10
Processing chunk: KOG3418.3
Processing chunk: KOG3418.9
Processing chunk: KOG3418.8
Processing chunk: KOG3497.9
Processing chunk: KOG3497.1
Processing chunk: KOG3497.10
Processing chunk: KOG3497.2
Processing chunk: KOG3497.11
NOTE: created 23 geneid predictions
********************************************************************************
** FILTERING INITIAL PROTEINS PRODUCED BY GENEID (HMMER) **
********************************************************************************
RUNNING: hmm_select -i KOG -o local -t 1 -v /data004/software/GIF/packages/cegma/2.4.010312/data/hmm_profiles local.geneid.fa /data004/software/GIF/packages/cegma/2.4.010312/data/profiles_cutoff.tbl 2>sample.cegma.errors
Processing geneid prediction: KOG0292.6
Processing geneid prediction: KOG0292.7
Processing geneid prediction: KOG0328.7
Processing geneid prediction: KOG0328.18
Processing geneid prediction: KOG0328.8
Processing geneid prediction: KOG0466.5
Processing geneid prediction: KOG0631.10
Processing geneid prediction: KOG0659.19
Processing geneid prediction: KOG0659.7
Processing geneid prediction: KOG0659.20
Processing geneid prediction: KOG1742.8
Processing geneid prediction: KOG1762.3
Processing geneid prediction: KOG1769.22
Processing geneid prediction: KOG1992.1
Processing geneid prediction: KOG2067.6
Processing geneid prediction: KOG2067.7
Processing geneid prediction: KOG2803.7
Processing geneid prediction: KOG2803.8
Processing geneid prediction: KOG2916.9
Processing geneid prediction: KOG3180.12
Processing geneid prediction: KOG3232.6
Processing geneid prediction: KOG3418.12
Processing geneid prediction: KOG3497.9
NOTE: Found 15 geneid predictions with scores above threshold value
********************************************************************************
** CALCULATING GENEID PARAMETERS FROM SELECTED GENEID PREDICTIONS **
********************************************************************************
RUNNING: geneid-train -v local.geneid.selected.gff local.geneid.selected.dna geneid_params 2>sample.cegma.errors
DATA COLLECTED: 15 Coding sequences containing 48 introns
Intron model
geneid-train did not work properly
EDIT:
When I run just the final step on its own:
$ geneid-train -v local.geneid.selected.gff local.geneid.selected.dna geneid_params
I get:
DATA COLLECTED: 15 Coding sequences containing 48 introns
Intron model
Use of uninitialized value in numeric eq (==) at /data004/software/GIF/packages/cegma/2.4.010312/lib/geneid.pm line 264.
some values in Markov model with zero counts, use pseudocounts at /data004/software/GIF/packages/cegma/2.4.010312/lib/geneid.pm line 270.
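For anyone unfamiliar with the second message: as far as I understand it, "zero counts" means that some transitions of the Markov model were never observed in the training introns, and "pseudocounts" are small constants added to every count so that no probability ends up as zero. The sketch below is only my illustration of that general idea in Python. It is not the code in geneid.pm, and the function name, the first-order model and the toy sequences are all made up for the example.

```python
from collections import Counter
from itertools import product

ALPHABET = "ACGT"

def transition_probs(seqs, order=1, pseudocount=0.0):
    """Estimate P(next base | previous `order` bases) from training sequences.

    Hypothetical helper for illustration only; not taken from CEGMA or geneid.
    """
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - order):
            counts[(s[i:i + order], s[i + order])] += 1

    probs = {}
    for ctx in map("".join, product(ALPHABET, repeat=order)):
        # The pseudocount is added once per possible next base.
        total = sum(counts[(ctx, b)] for b in ALPHABET) + pseudocount * len(ALPHABET)
        for b in ALPHABET:
            # Without pseudocounts, an unseen context has no defined
            # probabilities (None) and an unseen transition gets 0.
            probs[(ctx, b)] = (counts[(ctx, b)] + pseudocount) / total if total else None
    return probs

# Made-up toy "introns", not taken from the sample data.
introns = ["GTAAGT", "GTAAGA"]

raw = transition_probs(introns)
smoothed = transition_probs(introns, pseudocount=1.0)

zero_raw = sum(1 for p in raw.values() if p is None or p == 0)
zero_smoothed = sum(1 for p in smoothed.values() if p is None or p == 0)
print("zero or undefined entries without pseudocounts:", zero_raw)
print("zero or undefined entries with pseudocounts:", zero_smoothed)
```

With only 48 introns in the training set, it seems plausible that some transitions are simply never seen, which would explain the message, but I have not confirmed that this is what geneid-train is actually doing.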
Thanks, Dr. Bradnam! I've updated my question with just the geneid-train command output as well.