Hi all,
I am trying to perform self-training with GeneMarkS to improve protein calling from virus genomes and transcripts. Could someone tell me whether what I am doing is correct? First, I downloaded eukaryotic virus genomes from NCBI RefSeq to create a "matrix" using gmsn.pl:
/fs/project/PAS1117/modules/GeneMarkS/3.36/gmsn.pl -euk --name virusgroup1 --gm /virusgroup1_refseq_genomes.fasta
which generated (among many others) the following model files:
virusgroup1_gm_heuristic.mat virusgroup1_gm.mat virusgroup1_hmm_combined.mod virusgroup1_hmm_heuristic.mod virusgroup1_hmm.mod
Then I used the one named "virusgroup1_gm.mat" to run GeneMark on a single virus genome (which theoretically belongs to group 1, so GeneMark should correctly call all of its viral genes):
/fs/project/PAS1117/modules/GeneMarkS/3.36/gm -m group1_gm.mat -l o q -o p -r p -v NC_023420-2.fasta
However, I only get a file named "NC_023420-2.fasta.lst" with a few gene coordinates, but no protein file, even though I set the options for that:
List of Open reading frames predicted as CDSs, shown with alternate starts (regions from start to stop codon w/ coding function >0.50)
     Left     Right     DNA   Coding    Avg   Start
      end       end  Strand    Frame   Prob    Prob
       42      4046  direct     fr 3   0.60    ....
      195      4046  direct     fr 3   0.60    0.79
      297      4046  direct     fr 3   0.60    0.17
      333      4046  direct     fr 3   0.60    0.10
      537      4046  direct     fr 3   0.61    0.06
      570      4046  direct     fr 3   0.60    0.12
List of Regions of interest (regions from stop to stop codon w/ a signal in between)
  LEnd   REnd  Strand  Frame
    21   4046  direct   fr 3
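While the protein file is missing, the coordinates in the .lst report can be turned into protein sequences outside GeneMark. Below is a minimal Python sketch, not GeneMark's own method, under these assumptions: rows follow the whitespace-separated layout shown above; coordinates are 1-based and inclusive; the standard genetic code (NCBI table 1) applies; and reverse-strand rows carry some strand label other than "direct" (the exact label is an assumption). All function names are hypothetical. For each ORF it keeps the alternate start with the highest start probability, then slices and translates that region.

```python
import re
from typing import Dict, List, Tuple

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# One CDS row, e.g. "195 4046 direct fr 3 0.60 0.79" (layout assumed from the
# report above). "Regions of interest" rows lack probabilities, so they don't match.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\w+)\s+fr\s+(\d)\s+(\d+\.\d+)\s+(\d+\.\d+|\.+)\s*$"
)


def parse_cds_rows(lst_text: str) -> List[dict]:
    """Collect the CDS rows of a GeneMark .lst report as dictionaries."""
    rows = []
    for line in lst_text.splitlines():
        m = ROW.match(line)
        if not m:
            continue  # headers, section titles, region rows, blank lines
        left, right, strand, frame, coding, start = m.groups()
        rows.append({
            "left": int(left),
            "right": int(right),
            "strand": strand,
            "frame": int(frame),
            "coding_prob": float(coding),
            # "...." means no start probability was printed for this start
            "start_prob": None if start.startswith(".") else float(start),
        })
    return rows


def _start_score(row: dict) -> float:
    # Rows without a printed start probability rank below every real value.
    return -1.0 if row["start_prob"] is None else row["start_prob"]


def best_starts(rows: List[dict]) -> List[dict]:
    """Keep one start per ORF: the alternate start with the highest start prob.

    ORFs are grouped by stop coordinate: the right end for "direct" rows, and
    (assumed) the left end for any other strand label.
    """
    best: Dict[Tuple[int, str], dict] = {}
    for r in rows:
        stop = r["right"] if r["strand"] == "direct" else r["left"]
        key = (stop, r["strand"])
        if key not in best or _start_score(r) > _start_score(best[key]):
            best[key] = r
    return sorted(best.values(), key=lambda r: r["left"])


def extract_cds(genome: str, left: int, right: int, strand: str) -> str:
    """Slice 1-based inclusive coordinates; reverse-complement non-direct rows."""
    seq = genome[left - 1:right].upper()
    if strand != "direct":
        seq = seq.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]
    return seq


def translate(cds: str) -> str:
    """Translate codon by codon; unknown codons become X, a final stop is dropped."""
    protein = "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, len(cds) - 2, 3)
    )
    return protein.rstrip("*")
```

Applied to the CDS list above, `best_starts(parse_cds_rows(lst_text))` keeps the start at 195 (start probability 0.79) for the ORF ending at 4046; `translate(extract_cds(genome, 195, 4046, "direct"))` would then give its protein, where `genome` is the sequence read from NC_023420-2.fasta. Picking the highest start probability is only one simple heuristic for choosing among alternate starts.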
Can you guess what is wrong?
Thanks in advance, Guillermo