Error While Running Cegma (Geneid-Train Step)
10.6 years ago
arnstrm ★ 1.9k

I am trying to run CEGMA on a newly assembled genome (scaffolds) and I am having trouble getting past the geneid step. I ran CEGMA with default parameters, as cegma --ext -g genome.scf.fasta. The pipeline ran for about 10 hours (32 procs, 256 GB RAM) and exited with this message: "CEGMA, geneid error: geneid-train did not work properly". When I investigated, I found that the failure was in the geneid-train step, so I tried to run it manually:

$ geneid-train -v local.geneid.selected.gff local.geneid.selected.dna geneid_params
DATA COLLECTED: 298 Coding sequences containing 1311 introns
Intron model
Coding model 
Use of uninitialized value in numeric eq (==) at /data004/software/GIF/packages/cegma/2.4.010312/lib/geneid.pm line 264.
some values in Markov model with zero counts, use pseudocounts at /data004/software/GIF/packages/cegma/2.4.010312/lib/geneid.pm line 270.

Does anybody have any experience with geneid? How can I get past this step? My genome's estimated size is 745 MB, and the assembly has about 574K scaffolds (643 MB total length), N50=607. Any input will be greatly appreciated!

Thanks
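The warning above says that some entries of the Markov model have zero counts and suggests pseudocounts. As a minimal sketch of what that means in general (this is not geneid's actual code; the function name `markov_probs` and its parameters are hypothetical), the following estimates transition probabilities from training sequences and shows how a pseudocount removes zero-probability entries:

```python
from collections import Counter
from itertools import product


def markov_probs(seqs, order=1, pseudocount=0.0, alphabet="ACGT"):
    """Estimate P(next base | previous `order` bases) from training sequences.

    Hypothetical illustration only; geneid's own training code may differ.
    """
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - order):
            kmer = seq[i:i + order + 1]
            # Skip k-mers containing ambiguous bases such as N.
            if set(kmer) <= set(alphabet):
                counts[kmer] += 1

    probs = {}
    for ctx in map("".join, product(alphabet, repeat=order)):
        # Adding the pseudocount to every cell keeps each row normalised.
        total = sum(counts[ctx + b] for b in alphabet) + pseudocount * len(alphabet)
        for b in alphabet:
            # None marks a context that was never observed at all.
            probs[ctx + b] = (counts[ctx + b] + pseudocount) / total if total else None
    return probs


raw = markov_probs(["ACGTACGT"], order=1)
smoothed = markov_probs(["ACGTACGT"], order=1, pseudocount=1.0)
print(sum(1 for p in raw.values() if not p), "zero or undefined entries without pseudocounts")
print(sum(1 for p in smoothed.values() if not p), "zero or undefined entries with pseudocounts")
```

With a small training set (here, 298 coding sequences), higher-order models are more likely to contain k-mers that never occur, which is the situation the warning describes.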

training prediction • 3.8k views

Your N50 is 607 bp?!? Or did you mean 607 Kbp? If it really is 607 bp, then you should probably not be doing anything else with that genome assembly.
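For reference, N50 is the scaffold length at which the scaffolds of that length or longer cover at least half of the total assembly. A minimal sketch of the calculation, assuming you already have a list of scaffold lengths (the function name `n50` is just illustrative):

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold or contig lengths."""
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in ordered:
        running += length
        if running >= half:
            return length
    raise ValueError("no lengths given")


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

An N50 of 607 bp would mean that half of the assembled bases sit in scaffolds of 607 bp or shorter, which is too fragmented for most gene models to fit on a single scaffold.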


LOL, no! It was the preliminary assembly. Later it was improved and the numbers are much better now.

Still a work in progress:

10.6 years ago
arnstrm ★ 1.9k

This has been resolved in this thread: "Trouble running CEGMA on the sample dataset"!

10.6 years ago
keith ▴ 130

This hasn't actually been resolved yet... I'm still working out what might be going on.

9.4 years ago

Hi Keith and arnstrm,

I am not sure whether this issue has been sorted, but I can try to help if it hasn't. The CEGMA pipeline was developed by a group other than ours. However, it uses geneid, which is an ab initio gene prediction tool developed in our group.

I have used CEGMA quite extensively to determine the quality of the protein-coding "gene-space" of different genomes, but I don't think I have run into this sort of problem...

It sounds like some of your input fastas may have no sequence content, but I am not sure. Could you give me access to the intermediate files, including the geneid parameter file (self.param), so that I could try to figure out what's wrong?
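One quick way to test the empty-sequence hypothesis is to scan the intermediate FASTA files for records with a header but no sequence. The sketch below is only an illustration: the function name `empty_fasta_records` is hypothetical, and it assumes the intermediate file (for example local.geneid.selected.dna) is in plain FASTA format.

```python
def empty_fasta_records(lines):
    """Return the headers of FASTA records that contain no sequence characters."""
    empty = []
    header, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            # A new header closes the previous record; report it if it was empty.
            if header is not None and length == 0:
                empty.append(header)
            header, length = line[1:], 0
        else:
            length += len(line)
    # Check the final record, which has no following header to close it.
    if header is not None and length == 0:
        empty.append(header)
    return empty


# Example usage (file name taken from the question; FASTA format is an assumption):
# with open("local.geneid.selected.dna") as handle:
#     print(empty_fasta_records(handle))
```

The function takes any iterable of lines, so it works on an open file handle as well as on a list of strings.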

I assume you have the latest version of geneid installed on your system (1.4.4; check here: http://genome.crg.es/software/geneid/index.html) and that your CEGMA installation is pointing to it.

Thanks,
Francisco Camara


Thanks, Francisco! The issue has been resolved in version >2.5 (see this post for details). Keith (as in Keith Bradnam) is one of the CEGMA developers!

EDIT: I didn't realize I was talking to a geneid developer as well. I am so happy now!
