I am trying to run CEGMA on a newly assembled genome (scaffolds) and I am having trouble getting past the geneid step. I ran CEGMA with default parameters, as cegma --ext -g genome.scf.fasta. The pipeline ran for about 10 hours (32 procs, 256 GB RAM) and exited with this error: CEGMA, geneid error: geneid-train did not work properly. When I investigated, I found that the failure was in the geneid-train step, so I tried to run that step manually:
$ geneid-train -v local.geneid.selected.gff local.geneid.selected.dna geneid_params
DATA COLLECTED: 298 Coding sequences containing 1311 introns
Intron model
Coding model
Use of uninitialized value in numeric eq (==) at /data004/software/GIF/packages/cegma/2.4.010312/lib/geneid.pm line 264.
some values in Markov model with zero counts, use pseudocounts at /data004/software/GIF/packages/cegma/2.4.010312/lib/geneid.pm line 270.
Does anybody have any experience with geneid? How can I get past this step? My genome's estimated size is 745 Mb, and the assembly has about 574K scaffolds (643 Mb total length), N50=607.
Any input will be greatly appreciated!
Thanks
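For context on the last warning in the log above: "zero counts" in a Markov model generally means that some sequence contexts never appeared in the training set, which is plausible with only 298 coding sequences. The sketch below is only an illustration of that idea and of what pseudocounts do. It is not geneid-train's code; the function name kmer_probabilities, the choice of k and the pseudocount value are all assumptions, and it uses a plain k-mer frequency table rather than a full transition model.

```python
from collections import Counter
from itertools import product


def kmer_probabilities(seqs, k=3, pseudocount=0.0):
    """Estimate k-mer probabilities from training sequences.

    Hypothetical helper for illustration; not part of CEGMA or geneid.
    With pseudocount=0, any k-mer absent from the training data gets
    probability 0, which is the situation the warning describes.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            # Skip windows containing N or other ambiguity codes.
            if set(kmer) <= set("ACGT"):
                counts[kmer] += 1

    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    total = sum(counts.values()) + pseudocount * len(all_kmers)
    if total == 0:
        raise ValueError("no usable k-mers in the training sequences")

    # Add the pseudocount to every k-mer so that none ends up at zero.
    return {kmer: (counts[kmer] + pseudocount) / total for kmer in all_kmers}


# A tiny training set leaves most 2-mers unobserved.
raw = kmer_probabilities(["ACGT"], k=2)
smoothed = kmer_probabilities(["ACGT"], k=2, pseudocount=1.0)
zero_before = sum(1 for p in raw.values() if p == 0)
zero_after = sum(1 for p in smoothed.values() if p == 0)
```

In this toy case most entries are zero before smoothing and none are zero afterwards. Whether the same effect explains the geneid-train failure here would need to be checked against the actual training files.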
Your N50 is 607 bp?!? Or did you mean 607 Kbp? If it really is 607 bp then you should probably not be doing anything else with that genome assembly.
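If it helps to double-check the number, N50 can be recomputed directly from the scaffold FASTA. This is a minimal sketch, assuming a plain (uncompressed) FASTA file; the function names read_fasta_lengths and n50 are made up for this example. It uses the common definition: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly length.

```python
from typing import Iterable, List


def read_fasta_lengths(lines: Iterable[str]) -> List[int]:
    """Return the length of every record in a FASTA stream, in file order."""
    lengths = []
    current = 0
    seen_header = False
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if seen_header:
                lengths.append(current)
            seen_header = True
            current = 0
        elif seen_header:
            current += len(line)
    if seen_header:
        lengths.append(current)
    return lengths


def n50(lengths: List[int]) -> int:
    """Return N50 in the same unit as the input lengths (bp for FASTA)."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first scaffold that brings the sum to half the total.
        if 2 * running >= total:
            return length
    return 0


# Usage on a real file (file name taken from the question):
# with open("genome.scf.fasta") as handle:
#     print(n50(read_fasta_lengths(handle)))
```

The result is in bp, so a printed value of 607 would mean 607 bp, not 607 Kbp.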
LOL, no! It was the preliminary assembly. Later it was improved and the numbers are much better now.
Still a work in progress: