Question

Trouble running CEGMA on the sample dataset

1

Entering edit mode

11.0 years ago

arnstrm ★ 1.9k

Hi all,

I am trying to get the cegma working on the sample dataset that came along with the package. I have already installed all the pre-requistes (ncbi-blast 2.2.25+, hmmer 3.0, geneid 1.4.4 and genewise 2.4). When executed (as suggested in the README file), I get the error saying that geneid-train did not work properly. Please see the entire STDOUT (verbose) below. Any suggestions will be greatly appreciated!

$ cegma -v -ext --genome sample.dna --protein sample.prot -o sample &>sample_data.stdout
$ cat sample_data.stdout

********************************************************************************
**                    MAPPING PROTEINS TO GENOME (TBLASTN)                    **
********************************************************************************
RUNNING: genome_map  -n genome -p 6 -o 5000 -c 2000 -t 1  -v  sample.prot sample.dna 2>sample.cegma.errors

Building a new DB, current time: 05/01/2014 09:10:46
New DB name:   /tmp/genome25484.blastdb
New DB title:  sample.dna
Sequence type: Nucleotide
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1073741824B
Adding sequences from FASTA; added 1 sequences in 0.07232 seconds.
Processing KOG: KOG0292
Processing KOG: KOG0328
Processing KOG: KOG0466
Processing KOG: KOG0631
Processing KOG: KOG0659
Processing KOG: KOG1458
Processing KOG: KOG1742
Processing KOG: KOG1762
Processing KOG: KOG1769
Processing KOG: KOG1780
Processing KOG: KOG1992
Processing KOG: KOG2067
Processing KOG: KOG2803
Processing KOG: KOG2916
Processing KOG: KOG3180
Processing KOG: KOG3232
Processing KOG: KOG3418
Processing KOG: KOG3497
Found 86 candidate regions in sample.dna

********************************************************************************
**     MAKING INITIAL GENE PREDICTIONS FOR CORE GENES (GENEWISE + GENEID)     **
********************************************************************************

RUNNING: local_map -n local -f -h /data004/software/GIF/packages/cegma/2.4.010312/data/hmm_profiles -i KOG -v  genome.chunks.fa 2>sample.cegma.errors
Processing chunk: KOG0292.1
Processing chunk: KOG0292.6
Processing chunk: KOG0292.7
Processing chunk: KOG0292.2
Processing chunk: KOG0292.4
Processing chunk: KOG0328.7
Processing chunk: KOG0328.18
Processing chunk: KOG0328.16
Processing chunk: KOG0328.15
Processing chunk: KOG0328.8
Processing chunk: KOG0466.5
Processing chunk: KOG0466.4
Processing chunk: KOG0466.3
Processing chunk: KOG0466.2
Processing chunk: KOG0631.10
Processing chunk: KOG0631.6
Processing chunk: KOG0631.9
Processing chunk: KOG0631.5
Processing chunk: KOG0631.12
Processing chunk: KOG0659.19
Processing chunk: KOG0659.7
Processing chunk: KOG0659.20
Processing chunk: KOG0659.21
Processing chunk: KOG0659.10
Processing chunk: KOG1458.6
Processing chunk: KOG1458.8
Processing chunk: KOG1458.5
Processing chunk: KOG1458.3
Processing chunk: KOG1458.9
Processing chunk: KOG1742.8
Processing chunk: KOG1742.3
Processing chunk: KOG1742.9
Processing chunk: KOG1742.6
Processing chunk: KOG1742.5
Processing chunk: KOG1762.3
Processing chunk: KOG1762.2
Processing chunk: KOG1769.22
Processing chunk: KOG1769.15
Processing chunk: KOG1769.8
Processing chunk: KOG1769.5
Processing chunk: KOG1769.27
Processing chunk: KOG1780.5
Processing chunk: KOG1780.2
Processing chunk: KOG1780.7
Processing chunk: KOG1780.3
Processing chunk: KOG1780.6
Processing chunk: KOG1992.1
Processing chunk: KOG1992.8
Processing chunk: KOG1992.2
Processing chunk: KOG1992.3
Processing chunk: KOG1992.6
Processing chunk: KOG2067.6
Processing chunk: KOG2067.7
Processing chunk: KOG2067.3
Processing chunk: KOG2067.9
Processing chunk: KOG2067.8
Processing chunk: KOG2803.7
Processing chunk: KOG2803.5
Processing chunk: KOG2803.8
Processing chunk: KOG2803.4
Processing chunk: KOG2803.6
Processing chunk: KOG2916.9
Processing chunk: KOG2916.4
Processing chunk: KOG2916.11
Processing chunk: KOG2916.10
Processing chunk: KOG2916.2
Processing chunk: KOG3180.12
Processing chunk: KOG3180.10
Processing chunk: KOG3180.3
Processing chunk: KOG3180.7
Processing chunk: KOG3180.9
Processing chunk: KOG3232.6
Processing chunk: KOG3232.2
Processing chunk: KOG3232.4
Processing chunk: KOG3232.3
Processing chunk: KOG3232.5
Processing chunk: KOG3418.12
Processing chunk: KOG3418.10
Processing chunk: KOG3418.3
Processing chunk: KOG3418.9
Processing chunk: KOG3418.8
Processing chunk: KOG3497.9
Processing chunk: KOG3497.1
Processing chunk: KOG3497.10
Processing chunk: KOG3497.2
Processing chunk: KOG3497.11
NOTE: created 23 geneid predictions

********************************************************************************
**           FILTERING INITIAL PROTEINS PRODUCED BY GENEID (HMMER)            **
********************************************************************************

RUNNING: hmm_select -i KOG -o local -t 1 -v  /data004/software/GIF/packages/cegma/2.4.010312/data/hmm_profiles local.geneid.fa /data004/software/GIF/packages/cegma/2.4.010312/data/profiles_cutoff.tbl 2>sample.cegma.errors
Processing geneid prediction: KOG0292.6
Processing geneid prediction: KOG0292.7
Processing geneid prediction: KOG0328.7
Processing geneid prediction: KOG0328.18
Processing geneid prediction: KOG0328.8
Processing geneid prediction: KOG0466.5
Processing geneid prediction: KOG0631.10
Processing geneid prediction: KOG0659.19
Processing geneid prediction: KOG0659.7
Processing geneid prediction: KOG0659.20
Processing geneid prediction: KOG1742.8
Processing geneid prediction: KOG1762.3
Processing geneid prediction: KOG1769.22
Processing geneid prediction: KOG1992.1
Processing geneid prediction: KOG2067.6
Processing geneid prediction: KOG2067.7
Processing geneid prediction: KOG2803.7
Processing geneid prediction: KOG2803.8
Processing geneid prediction: KOG2916.9
Processing geneid prediction: KOG3180.12
Processing geneid prediction: KOG3232.6
Processing geneid prediction: KOG3418.12
Processing geneid prediction: KOG3497.9
NOTE: Found 15 geneid predictions with scores above threshold value

********************************************************************************
**       CALCULATING GENEID PARAMETERS FROM SELECTED GENEID PREDICTIONS       **
********************************************************************************

RUNNING: geneid-train -v local.geneid.selected.gff local.geneid.selected.dna geneid_params 2>sample.cegma.errors
DATA COLLECTED: 15 Coding sequences containing 48 introns
Intron model
geneid-train did not work properly

EDIT:

So, when I just run the final step:

$ geneid-train -v local.geneid.selected.gff local.geneid.selected.dna geneid_params

I get:

DATA COLLECTED: 15 Coding sequences containing 48 introns
Intron model
Use of uninitialized value in numeric eq (==) at /data004/software/GIF/packages/cegma/2.4.010312/lib/geneid.pm line 264.
some values in Markov model with zero counts, use pseudocounts at /data004/software/GIF/packages/cegma/2.4.010312/lib/geneid.pm line 270.

genewise geneid cegma • 6.2k views

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 11.0 years ago by arnstrm ★ 1.9k

1

Entering edit mode

11.0 years ago

keith ▴ 130

As you have run CEGMA with the --ext option to keep intermediate files, could you try running the final step separately to show what the error message from geneid-train is? i.e. run:

geneid-train -v local.geneid.selected.gff local.geneid.selected.dna geneid_params

And then report the output.

ADD COMMENT • link updated 5.4 years ago by Ram 45k • written 11.0 years ago by keith ▴ 130

0

Entering edit mode

Thanks, Dr. Bradnam! I've updated my question with just the geneid-train command output as well.

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 11.0 years ago by arnstrm ★ 1.9k

0

Entering edit mode

11.0 years ago

keith ▴ 130

Hmm, everything is looking good initially. The output you should be seeing for the gene-id step of CEGMA is something like:

RUNNING: geneid-train  local.geneid.selected.gff local.geneid.selected.dna geneid_params 2>test.cegma.errors
DATA COLLECTED: 15 Coding sequences containing 48 introns
RUNNING: make_paramfile /Korflab/Packages/CEGMA/current/data/self.param.template \
            geneid_params/coding.initial.5.logs geneid_params/coding.transition.5.logs \
            geneid_params/start.logs geneid_params/acc.logs geneid_params/don.logs \
            geneid_params/intron.max > geneid_params/self.param

And when I run the separate geneid-train step, I see the following:

geneid-train -v local.geneid.selected.gff local.geneid.selected.dna geneid_params
DATA COLLECTED: 15 Coding sequences containing 48 introns
Intron model
Coding model
don
false_don
acc
false_acc
start
false_start

This is with geneid v1.4...I wonder if they changed something in v1.4.4 that breaks CEGMA?

ADD COMMENT • link updated 5.4 years ago by Ram 45k • written 11.0 years ago by keith ▴ 130

0

Entering edit mode

Actually, ignore my last point. I was running v1.4.4 as well (geneid only reports the version as being 1.4). What platform are you running on?

ADD REPLY • link 11.0 years ago by keith ▴ 130

0

Entering edit mode

I am on Linux hpc5 2.6.32-358.11.1.el6.x86_64 GNU/Linux, Red Hat Enterprise Linux Server release 6.4 (Santiago).

ADD REPLY • link 11.0 years ago by arnstrm ★ 1.9k

0

Entering edit mode

Thanks very much for the reply. Yes, I too wondered about the differences between 1.4 (mentioned in CEGMA manual) and 1.4.4 available at geneid website. It seems like there isn't 1.4 version of geneid anywhere. Also, geneid-train is not from the geneid program, it is a part of GEGMA, right?

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 11.0 years ago by arnstrm ★ 1.9k

0

Entering edit mode

11.0 years ago

keith ▴ 130

I don't have much left to suggest, but let's check that the required input files for geneid-train look okay. Can you run the following please:

ls -l local.geneid.selected.gff local.geneid.selected.dna geneid_params
head local.geneid.selected.gff local.geneid.selected.dna

And then print the output here.

ADD COMMENT • link updated 5.4 years ago by Ram 45k • written 11.0 years ago by keith ▴ 130

0

Entering edit mode

Thank you very much for you help! Here are the outputs for the commands:

$ ls -l

output:

total 4188
drwxrwxr-x 2 arnstrm GIF      23 May  1 09:18 geneid_params
-rw-rw-r-- 1 arnstrm GIF   58468 May  1 09:13 genome.blast
-rw-rw-r-- 1 arnstrm GIF   49140 May  1 09:13 genome.blast.gff
-rw-rw-r-- 1 arnstrm GIF  437782 May  1 09:13 genome.chunks.fa
-rw-rw-r-- 1 arnstrm GIF    5232 May  1 09:13 genome.chunks.gff
-rw-rw-r-- 1 arnstrm GIF     511 May  1 09:18 local.extended.tbl
-rw-rw-r-- 1 arnstrm GIF   26461 May  1 09:17 local.geneid
-rw-rw-r-- 1 arnstrm GIF    7424 May  1 09:17 local.geneid.fa
-rw-rw-r-- 1 arnstrm GIF   16794 May  1 09:17 local.geneid.gff
-rw-rw-r-- 1 arnstrm GIF   99177 May  1 09:18 local.geneid.selected.dna
-rw-rw-r-- 1 arnstrm GIF    5283 May  1 09:18 local.geneid.selected.fa
-rw-rw-r-- 1 arnstrm GIF    8385 May  1 09:18 local.geneid.selected.gff
-rw-rw-r-- 1 arnstrm GIF     155 May  1 09:18 local.geneid.selected.id
-rw-rw-r-- 1 arnstrm GIF  820630 May  1 09:17 local.genewise
-rw-rw-r-- 1 arnstrm GIF    4575 May  1 09:17 local.genewise.gff
-rw-rw-r-- 1 arnstrm GIF  104931 May  1 09:18 local.hmm_select.aln
drwxrwxr-x 2 arnstrm GIF       6 Apr 30 10:26 sample
-rw-rw-r-- 1 arnstrm GIF     259 May  1 09:18 sample.cegma.errors
-rw-r--r-- 1 arnstrm GIF 2549974 Jan  4  2012 sample.dna
-rw-r--r-- 1 arnstrm GIF   42025 Jan  4  2012 sample.prot

First few lines:

head local.geneid.selected.gff

output:

# Sequence KOG0328.7 - Length = 7702 bps
KOG0328.7       geneid_v1.4     First   466     566     25.57   +       0       KOG0328.7_1
KOG0328.7       geneid_v1.4     Exon    466     566     25.57   +       .       KOG0328.7_1
KOG0328.7       geneid_v1.4     Internal        2003    2909    670.58  +       1       KOG0328.7_1
KOG0328.7       geneid_v1.4     Exon    2003    2909    670.58  +       .       KOG0328.7_1
KOG0328.7       geneid_v1.4     Terminal        5515    5706    145.45  +       0       KOG0328.7_1
KOG0328.7       geneid_v1.4     Exon    5515    5706    145.45  +       .       KOG0328.7_1
# Sequence KOG0466.5 - Length = 6383 bps
KOG0466.5       geneid_v1.4     First   1938    1947    -3.20   +       0       KOG0466.5_1
KOG0466.5       geneid_v1.4     Exon    1938    1947    -3.20   +       .       KOG0466.5_1

For the sequence file

head local.geneid.selected.dna
>KOG0328.7
ATGTTTTTCGAATTTCGAATTTTAGTTTCGAATCGAATCCTACGAATTATGAGCGAAAAT
AGCCGATAAATCACCATTTTTCTTGATTTTTTCGATGTAAATTCGTTTATTTTGTAAATT
TCCACCTGAAAATGCTGAATTTCTCTGCAAAAATGCTCGAAAAACAGGGTATTTTCGAAA
ACATTCGAAATCTTGCTCTTTTACTAGTTTTTTATCGATTTTACAGGTTTTTCTTAATTT
TTTCGATGTAAATTCACTTATTTTGTTAATTTCCTCCTAAAAATCTTGAATTTCTCTGCA
AAATGCCCGAAAATAATCAATTTTATCGTTTTTACACGTTTTTCTTGATTTTTTCGATGT
AAATTCATTGATTTTGATAATTTCCACCTAAAAATGCTGAATTTCTCTGCAAAAATGCTC
GAAAATATTCATTTTTATCGATTTTACAGAGTTCTTCCTTCAAAAATGGCGGAAAATAAA
AAGAAAAACGACGATATGGCGACTGTGGAGTTCGAGTCGTCTGAGGAGGTCTCAATTATC

Also, for the parameters directory:

$ ls -l geneid_params/
total 4
-rw-rw-r-- 1 arnstrm GIF 78 May  1 10:12 intron.max

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 11.0 years ago by arnstrm ★ 1.9k

0

Entering edit mode

11.0 years ago

keith ▴ 130

I'm stumped, so I can only suggest the following steps to at least try to narrow down where exactly the error is occurring.

Make a copy of the geneid-train script. E.g.
```
cp geneid-train geneid-train2
```

Add __END__ after line 92. E.g.

my $intron_initial = new geneid::SequenceModel('intron', 'FREQ', 4,
         \@intron, 10, 0);

__END__

Then test the script as before (but using your modified code):

geneid-train2 -v local.geneid.selected.gff local.geneid.selected.dna geneid_params

Do you still see the same error message? If not, move the __END__ token to after line 97. E.g.

my $intron_transition = new geneid::SequenceModel('intron', 'MM', 5,
         \@intron, 10, 0);

__END__

I'm betting that one of these two lines is producing the error message (which means digging into a very poorly documented Perl module to troubleshoot further...I didn't write this code by the way!).

I really can't imagine why everything else would work but then fail at this step. Just to double check, you didn't interrupt the code part way the first time you ran it, and then restarted?

ADD COMMENT • link updated 5.4 years ago by Ram 45k • written 11.0 years ago by keith ▴ 130

0

Entering edit mode

Thanks for the reply, I think my issue has been resolved. I grabbed a working copy of geneid-train, compiled on a different machine and did a diff between them

$ diff geneid-train geneid-train_2
1c1
< #!/bin/perl
---
> #!/data004/software/GIF/packages/perl/5.18.2/bin/perl

It seems like, when I run make (CEGMA package) it would add this non-conventional shebang line to all perl scripts. Somehow, geneid-train was the only one that got affected by this. Once I changed it back to #!/bin/perl, it is running fine!

Thanks once again for your valuable time! I greatly appreciate your help!

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 11.0 years ago by arnstrm ★ 1.9k

0

Entering edit mode

11.0 years ago

keith ▴ 130

Hmm, I'm pretty stumped by this. Not sure why everything would work until that one specific part of the geneid-train script. The only thing I can think of is to really nail down as to which part of the script is causing the problem. My suggestion:

Create a copy of the geneid-train script (e.g. cp /path/to/geneid-train geneid-train2)

Add an __END__ token around line 94. E.g.

my $intron_initial = new geneid::SequenceModel('intron', 'FREQ', 4,
         \@intron, 10, 0);

__END__

Run this new script:

./geneid-train2 -v local.geneid.selected.gff local.geneid.selected.dna geneid_params

If this doesn't produce an error, move the __END__ token down a few lines. E.g.

my $intron_transition = new geneid::SequenceModel('intron', 'MM', 5,
        \@intron, 10, 0);
__END__

Save and run script again. I'm betting that one of these two lines will report the error. Then we have to try to fathom out what is happening in the not-very-easy-to-understand geneid Perl module.

ADD COMMENT • link updated 5.4 years ago by Ram 45k • written 11.0 years ago by keith ▴ 130

0

Entering edit mode

Adding __END__ at 94th line didn't make any difference, but at line 91 made it go away.

# compositional markov models
print "Intron model\n" if ($opt_v);

__END__

my $intron_initial = new geneid::SequenceModel('intron', 'FREQ', 4,
                        \@intron, 10, 0);

# this might be the offending line!

Thanks,

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 11.0 years ago by arnstrm ★ 1.9k

0

Entering edit mode

11.0 years ago

keith ▴ 130

Hmm, not for the first time an answer I posted to this thread failed to post (or was removed?). 2nd attempt...

I think I have an answer. To test it, you'll need to change one line of code in two different Perl module (it's the same line of code). Make backups of geneid.pm and HMMstar.pm (in CEGMA/lib directory). Then change one line in the dna_word_table subroutine (this is line 41 in HMMstar.pm) as follows:

OLD:        $code .= "$tab foreach my \$c$i qw(A C G T) {\n";
NEW:        $code .= "$tab foreach my \$c$i (qw(A C G T)) {\n";

The addition of the 2nd set of parentheses might make all the difference and remove your problem. Please let me know if this works.

ADD COMMENT • link updated 5.4 years ago by Ram 45k • written 11.0 years ago by keith ▴ 130

0

Entering edit mode

Hi Dr. Bradnam, Sorry for the delay, yes this fixed the problem. I was able to complete the analyses with the sample data! Thanks once again!

Edit:

BTW, your comment wasn't deleted. The organization of biostars is quite different from traditional forum. Each reply is considered as a separate answer and each answer is moved up/down depending on up votes it gets. Someone upvoted your latest reply and it is now ranked as the second best answer!

(this entire thread would normally be a single answer with subsequent replys added as comments to it)

ADD REPLY • link updated 5.4 years ago by Ram 45k • written 11.0 years ago by arnstrm ★ 1.9k

1

Entering edit mode

Ah so that's where it went! Thanks for letting me know. I guess this merits a small release to the CEGMA code (and I was hoping to not have to touch CEGMA v2 again!).

ADD REPLY • link 11.0 years ago by keith ▴ 130

0

Entering edit mode

10.1 years ago

suzanne.mcgaugh ▴ 60

Hi,

I'm using ncbi_blast+/2.2.28, hmmer/3.0, geneid 1.4.4, genewise 2.4.1, and perl 5.10.1. Everything works fine with the sample file, but I am getting the same error as above with my datafile which is a transcriptome of 46,474 ORFs. I applied the fix above, to geneid.pm and HMMstar.pm even though I am not using perl 5.14 (though I've tried the sample file using both perl 5.14 and 5.10 and it works with both even with the fix to geneid.pm and HMMstar.pm).

Here is a sample of my input file:

>m.10939
TGTCAGGgggTGGgggAGAGGTTTCCTGGCCGCTTCTGTGGGCCACCGTCGGTGTCGTCCTGCAGCTGCCCTCCTTTGCAGCCAGAGCCACTGGCACTGTCCCCGAGAAGGCCCTCCCCTCCCGTGGGCTCCCTCCTCTGCCTGCAGCTCTTTCGTTGtgtgAGGgggAGCCCGCCcccTGTGGAAGGCATGCACCTGACCCACATATTTCCTCTGAGCCCTGCCTTGCTCCTACAGTCCCCACCTGGggggggAGCAGAGGACCTTCCTCCCTctctTATTCTTtttCCTTCCTTCCTT
>m.27653
ATGTTTGCAGGAGATGAGCTGGCAGTAGTACAGAGCTGGGAAGCCCCTTCAGTGGCAGATTCAATGGAGACAGGCAAGGTTTGCACTATGGCCATGATGGTCACAGTAGAGACTTTGCAAGTGATCTCTGTTTATCCACTACAGGCAGGTGAGAAGAGTCCACTGGCCTTCAATGATGTGAAAAGGGGAGCCTCCCTCGCGAATATGAGCCTCAGGACAGCCGATTCCAGTGTCCGGCTCCCGTCGCCCACCCCATTCGACTGCCGCCCCAACTTCTCTTCATGCTCTGTCTCAGATTCA
>m.48642
ATGTTCAACAAATCACAATACTTCTCTGCTGTTATTGCATCCACAGAATGTCACCCAGTGGCCCTGTTTAGGGTTACCTGGACAGTTCTCAAGAAAATGAGCCCAGAGGCCCATCTGCAGGGCCATGTGGAGGAATTTTCCATGCATCTGGAGGAAaaaaTCACTCTGATTCATTCCCAGTTGGACGTGAAGCATACTGTCGAGCTGTCCAGGGAGCATATTTACCCAGTTGGATCTGAGGAAGTGGACAGGGTCCTCCAGAATTTAAACATTAGATCCATACTACTCCTGTCTGGTTAA
>m.2225
AACATCTGGTACGGCTTCACAACCACCCTGCTGTCCTCCGGCGTTGACGTGGTGCTGATCGCCGCCTCTTACGCTCTGATCCTCCGGGCCGTCTTCCAGCTCTCCTGCAAGGAAGCCCGGCGGAAAGCTCTCCGCACCTGCGGCTCCCACATCTCCGTCATCCTCATGTTCTACATGCCAGCCTTCTTCTCCTTCCTCACCCATCGCTTCGGCCACAACATCCCTAGCTGGATCCATATCCTGCTGGCCGACCTCTACGTGATCGTGCCGCCCATGCTCAACCCCATTGTCTACGGAGTC
>m.21342
ATGGCAGATTCCGGAGGgggAGATAGAGTTACATCAGGGGAGACTGATACACCCCGATCCACACTGGCTACAACTAGCCATGTGGAGATTGAGCGGGAGCCCCTGAgagagGCTCACCTGTCGCTGCGGGTGGTTTGGACCATAGAGGCCTTGCACAGGAAATGCACTACGAGGATCTATGATTCCTCCTGGACTTCCTTAGTGGCCCGGTTTCAAGCAGCCCAGGTAATGCCCACCAGGCCACAGTTCCACAGGTCCTTGACTTTGTCCAGGACGAGTTGGATAAGGGGCTCTCCTTAG

Here is the command issued:

cegma -v -ext -g /panfs/roc/groups/14/mcgaughs/mcgaughs/ref.fa

********************************************************************************
**                    MAPPING PROTEINS TO GENOME (TBLASTN)                    **
********************************************************************************

RUNNING: genome_map  -n genome -p 6 -o 5000 -c 2000 -t 1  -v  /home/mcgaughs/mcgaughs/lcl/opt/cegma/current/data/kogs.fa /panfs/roc/groups/14/mcgaughs/mcgaughs/ref.fa 2>output.cegma.errors
Found 2287 candidate regions in /panfs/roc/groups/14/mcgaughs/mcgaughs/ref.fa

********************************************************************************
**     MAKING INITIAL GENE PREDICTIONS FOR CORE GENES (GENEWISE + GENEID)     **
********************************************************************************

RUNNING: local_map -n local -f -h /home/mcgaughs/mcgaughs/lcl/opt/cegma/current/data/hmm_profiles -i KOG -v  genome.chunks.fa 2>output.cegma.errors
NOTE: created 1551 geneid predictions

********************************************************************************
**           FILTERING INITIAL PROTEINS PRODUCED BY GENEID (HMMER)            **
********************************************************************************

RUNNING: hmm_select -i KOG -o local -t 1 -v  /home/mcgaughs/mcgaughs/lcl/opt/cegma/current/data/hmm_profiles local.geneid.fa /home/mcgaughs/mcgaughs/lcl/opt/cegma/current/data/profiles_cutoff.tbl 2>output.cegma.errors
NOTE: Found 417 geneid predictions with scores above threshold value

********************************************************************************
**       CALCULATING GENEID PARAMETERS FROM SELECTED GENEID PREDICTIONS       **
********************************************************************************

RUNNING: geneid-train -v local.geneid.selected.gff local.geneid.selected.dna geneid_params 2>output.cegma.errors
RUNNING: make_paramfile /home/mcgaughs/mcgaughs/lcl/opt/cegma/current/data/self.param.template \
        geneid_params/coding.initial.5.logs geneid_params/coding.transition.5.logs \
        geneid_params/start.logs geneid_params/acc.logs geneid_params/don.logs \
        geneid_params/intron.max >  geneid_params/self.param

********************************************************************************
**                           ACCURATE LOCAL MAPPING                           **
********************************************************************************

RUNNING: local_map -n local_self -g local.genewise.gff -d geneid_params/self.param -h /home/mcgaughs/mcgaughs/lcl/opt/cegma/current/data/hmm_profiles -i KOG -v  genome.chunks.fa 2>output.cegma.errors
NOTE: created 0 geneid predictions

********************************************************************************
**                              FINAL FILTERING                               **
********************************************************************************

RUNNING: hmm_select -i KOG -o local_self -t 1 -v  /home/mcgaughs/mcgaughs/lcl/opt/cegma/current/data/hmm_profiles local_self.geneid.fa /home/mcgaughs/mcgaughs/lcl/opt/cegma/current/data/profiles_cutoff.tbl 2>output.cegma.errors
NOTE: Foud 0 geneid predictions with scores above threshold value
sh: local_self.geneid.selected.gff: No such file or directory
Can't run sed

mcgaughs@labq03 [/panfs/roc/groups/14/mcgaughs/mcgaughs/reftwo] % geneid-train -v local.geneid.selected.gff local.geneid.selected.dna geneid_params 2>output.cegma.errors
DATA COLLECTED: 417 Coding sequences containing 679 introns 
Intron model
Coding model
don
false_don
acc
false_acc
start
false_start

mcgaughs@labq03 [/panfs/roc/groups/14/mcgaughs/mcgaughs/reftwo] % ls -l local.geneid.selected.gff local.geneid.selected.dna geneid_params
-rw-rw-r--. 1 mcgaughs mcgaughs 563370 Apr  4 12:12 local.geneid.selected.dna
-rw-rw-r--. 1 mcgaughs mcgaughs 147601 Apr  4 12:12 local.geneid.selected.gff

geneid_params:
total 1104
-rw-rw-r--. 1 mcgaughs mcgaughs    981 Apr  4 13:06 acc.logs
-rw-rw-r--. 1 mcgaughs mcgaughs  57501 Apr  4 13:06 coding.initial.5.logs
-rw-rw-r--. 1 mcgaughs mcgaughs 249017 Apr  4 13:06 coding.transition.5.logs
-rw-rw-r--. 1 mcgaughs mcgaughs    567 Apr  4 13:06 don.logs
-rw-rw-r--. 1 mcgaughs mcgaughs     76 Apr  4 13:06 intron.max
-rw-rw-r--. 1 mcgaughs mcgaughs 310082 Apr  4 12:12 self.param
-rw-rw-r--. 1 mcgaughs mcgaughs    290 Apr  4 13:06 start.logs

mcgaughs@labq03 [/panfs/roc/groups/14/mcgaughs/mcgaughs/reftwo] % head local.geneid.selected.gff local.geneid.selected.dna
==> local.geneid.selected.gff <==
# Sequence KOG0003.2 - Length = 471 bps
KOG0003.2    geneid_v1.4    First    1    376    159.74    +    0    KOG0003.2_1
KOG0003.2    geneid_v1.4    Exon    1    376    159.74    +    .    KOG0003.2_1
KOG0003.2    geneid_v1.4    Terminal    414    439    -10.69    +    2    KOG0003.2_1
KOG0003.2    geneid_v1.4    Exon    414    439    -10.69    +    .    KOG0003.2_1
# Sequence KOG0018.2 - Length = 3702 bps
KOG0018.2    geneid_v1.4    First    1    1113    751.23    +    0    KOG0018.2_1
KOG0018.2    geneid_v1.4    Exon    1    1113    751.23    +    .    KOG0018.2_1
KOG0018.2    geneid_v1.4    Internal    1138    2355    822.39    +    0    KOG0018.2_1
KOG0018.2    geneid_v1.4    Exon    1138    2355    822.39    +    .    KOG0018.2_1

==> local.geneid.selected.dna <==
>KOG0003.2
ATGCAGATCTTCGTGAAGACCCTGACGGGGAAAACCATCACCCTCGAGGTTGAACCATCT
GATACTATAGAAAATGTCAAAGCCAAGATCCAGGATAAAGAAGGAATTCCTCCTGACCAA
CAGCGTTTAATTTTTGCTGGCAAACAATTGGAGGATGGACGTACTCTGTCTGATTATAAC
ATCCAAAAGGAGTCAACACTTCATCTTGTGCTGAGACTTCGTGGTGGGGCAAAGAAAAGA
AAGAAGAAGTCTTACACCACTCCTAAGAAGAACAAACATAAGAGAAAGAAGGTGAAACTT
GCTGTCCTTAAATACTACAAGGTGGATGAGAATGGCAAAATCAGCCGTTTACGCCGTGAG
TGTCCCTCTGAAGAATGTGGGGCTGGAGTGTTCATGGCTAGCCACTTTGATAGACATTAT
TGTGGCAAGTGCTGCTTGACATACTGCTTCAACAAACCAGAAGACAAGTGA
>KOG0018.2

Any advice would be appreciated! Thanks!!

Suzanne

ADD COMMENT • link updated 2.9 years ago by Ram 45k • written 10.1 years ago by suzanne.mcgaugh ▴ 60

0

Entering edit mode

Hi Suzanne, It would be better if you could post this as a separate question (in a new thread). Since this is already a solved thread, it wont be visible to everyone.

To answer your question: what version CEGMA are you using? getting the most recent version might solve this issue.

ADD REPLY • link updated 2.9 years ago by Ram 45k • written 10.1 years ago by arnstrm ★ 1.9k

0

Entering edit mode

Thanks! 2.4 from LPM - I just switched to 2.5 and it fixed the problem. Thanks very much! This was a big help!

Suzanne

ADD REPLY • link 10.1 years ago by suzanne.mcgaugh ▴ 60

Ram · Accepted Answer · 2014-05-14

Hi again,

I think I (finally) might have a fix. The problem might result from people with newer versions of Perl in which an old Perl function is deprecated (in certain situations).

To test whether this is the case, you'll need to change one line in each of the geneid.pm and HMMstar.pm files (inside the lib folder of the CEGMA installation). Maybe make copies of the originals as well. In geneid.pm change line 108 and in HMMstar.pm change line 41 as follows:

BEFORE:      $code .= "$tab foreach my \$c$i qw(A C G T) {\n";
AFTER:       $code .= "$tab foreach my \$c$i (qw(A C G T)) {\n";

And let me know if that works!

Here is some more info about how this function changed in Perl 5.14: http://blogs.perl.org/users/rurban/2010/09/qw-in-list-context-deprecated.html