How to use GeneMark-ES output to identify gene function?
7.8 years ago
nut_B ▴ 10

Hello everyone,

I would like to ask whether anyone knows how to use the results from GeneMark-ES to identify gene function. My current output is shown below. It includes nucleotide sequences, but I would like to use amino acid sequences to identify function, and that is the problem I would like suggestions on. I cannot simply translate those nucleotide sequences into amino acid sequences, because they still include introns when translated directly.

Output from GeneMark-ES:

Eukaryotic GeneMark.hmm version 3.49
Sequence name: /storage/home/nuthatai.sut/CallORF_Tool/gm_et_linux_64/gmes_petap/output/data/dna.fa_15
FASTA defline: >dna.fa_15 15_dna 1 30316
Sequence length: 30316 bp
G+C content: 27.92%
Matrices file: /storage/home/nuthatai.sut/CallORF_Tool/gm_et_linux_64/gmes_petap/output/gmhmm.mod
Tue Oct 25 17:52:02 2016

Predicted genes/exons

Gene  Exon  Strand  Exon      Exon Range      Exon    Start/End
  #     #           Type                      Length  Frame
  1     2     -     Terminal   6981   7014      34    3 3   - -
  1     1     -     Initial    7171   7244      74    2 1   - -
  2     2     -     Terminal  16420  16424       5    3 2   - -
  2     1     -     Initial   16476  16509      34    1 1   - -
  3     1     +     Initial   26431  26436       6    1 3   - -
  3     2     +     Terminal  26642  26644       3    1 3   - -

nucleotide sequence of predicted genes

>gene_1|GeneMark.hmm|108_nt
ATGTCATCCCTTACTTTGCATCAACAGGCCTACTACACGATAGCACCCGCCGGAATGTCC
ATTTGGACTGAACGTAAGAAAGGCGACGTCATGACCAAGACAGTATAA
>gene_2|GeneMark.hmm|39_nt
ATGTTTCTACCAAACATCGGATTTAACTCACCAGGATGA
>gene_3|GeneMark.hmm|9_nt
ATGAATTGA

end nucleotide sequence
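The exon lengths in the table above add up to each gene's reported length (gene_1: 34 + 74 = 108 nt; gene_2: 5 + 34 = 39 nt; gene_3: 6 + 3 = 9 nt), which suggests these nucleotide sequences are already spliced CDS with the introns removed. Assuming that, a minimal Bash (4+) sketch of translating them with the standard genetic code (NCBI translation table 1) could look like this. The functions `translate_cds` and `translate_fasta` are hypothetical helpers written for illustration, not part of GeneMark-ES, and your organism may need a different translation table.

```bash
#!/usr/bin/env bash
# Minimal sketch: translate spliced CDS sequences with the standard genetic code
# (NCBI translation table 1). Assumes each sequence is already spliced, starts in
# frame 1 and is on the coding strand. Requires Bash 4+ for ${var^^}.

# One amino acid per codon, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
CODE_TABLE_1='FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
BASES='TCAG'

# translate_cds SEQ: print the protein ('*' = stop, 'X' = codon with a non-ACGT base).
translate_cds() {
  local dna=${1^^}
  dna=${dna//[[:space:]]/}            # ignore spaces/line breaks pasted from FASTA
  local protein='' i j idx base prefix
  for (( i = 0; i + 3 <= ${#dna}; i += 3 )); do   # a trailing partial codon is ignored
    idx=0
    for (( j = 0; j < 3; j++ )); do
      base=${dna:i+j:1}
      prefix=${BASES%%"$base"*}        # length of prefix = position of base in TCAG (4 if absent)
      if (( ${#prefix} == 4 )); then idx=-1; break; fi
      idx=$(( idx * 4 + ${#prefix} ))
    done
    if (( idx < 0 )); then protein+='X'; else protein+=${CODE_TABLE_1:idx:1}; fi
  done
  printf '%s\n' "$protein"
}

# translate_fasta: read nucleotide FASTA on stdin, write protein FASTA to stdout.
translate_fasta() {
  local line header='' dna=''
  while IFS= read -r line || [[ -n $line ]]; do
    if [[ $line == '>'* ]]; then
      if [[ -n $header ]]; then printf '%s\n' "$header"; translate_cds "$dna"; fi
      header=$line
      dna=''
    else
      dna+=$line
    fi
  done
  if [[ -n $header ]]; then printf '%s\n' "$header"; translate_cds "$dna"; fi
}
```

For example, `translate_cds ATGAATTGA` prints `MN*` (gene_3 above), and `translate_fasta < genes.fna > genes.faa` would convert a whole file (the file names are placeholders). Internal `*` characters in a translated gene would suggest the spliced-CDS assumption does not hold for that prediction.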

Thank you in advance, everyone.


Have you tried blastx on NCBI's site? https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastx&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome

And be aware that GeneMark only predicts CDS sequences; there's no guarantee that a prediction actually is one. If I'm reading your copied sequences correctly, GeneMark thinks it found 3 genes that are 108 nt or less (108, 39 and 9 nt, i.e. at most 35 amino acids)? I'd say it's pretty unlikely that anything that short codes for something meaningful.
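To see how many predictions fall below a length you consider plausible, a small sketch like the following could list them from the GeneMark-ES nucleotide FASTA. It assumes headers in the `>gene_1|GeneMark.hmm|108_nt` form shown in the question; `short_genes` is a hypothetical helper, and its 300 nt default is an arbitrary example rather than an established threshold.

```bash
#!/usr/bin/env bash
# Sketch: list GeneMark-ES predictions shorter than a cutoff, reading the
# nucleotide FASTA on stdin. Assumes headers like ">gene_1|GeneMark.hmm|108_nt".
# The 300 nt default (about 100 aa) is an arbitrary example, not a recommended value.
short_genes() {
  local min_nt=${1:-300}
  awk -F'|' -v min="$min_nt" '
    /^>/ {
      len = $3
      sub(/_nt.*/, "", len)              # "108_nt" -> "108"
      if (len + 0 < min + 0) print substr($1, 2), len
    }'
}
```

Run it as `short_genes 300 < genes.fna` (placeholder file name); all three genes in the question would be listed.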


I have another CDS predictor whose output includes amino acid sequences. That's the reason I would like to convert the GeneMark-ES output to amino acid sequences. Thank you for your help; if you have any other suggestions, please let me know.

Thank you


If you just need a tool that does translation for you, use ExPASy: http://web.expasy.org/translate/

Again, note that this is not an exact science for unknown organisms: you'll be relying on a generic version of whichever translation table you choose, which may not be exactly right for your organism, whatever it is.


Thank you for your suggestion.

7.8 years ago
h.mon 35k

Did you try the get_sequence_from_GTF.pl script, which comes with GeneMark-ES?

Input: gene coordinates in GTF format and sequence in FASTA format

Output: nucleotide and protein sequences of genes


I will try this Perl script. Thank you for your suggestion.

3.0 years ago
Ian 6.1k

This is an old question, but I had the same problem. So here is how I solved it:

Edit the GTF file from genemark:

cut -d" " -f1,4- genemark.gtf | sed -e 's/ /\t/' | cut -f 1,3- > genemark_corrected.gtf

Then obtain the protein sequences from the assembled contigs (mine were from MEGAHIT):

get_sequence_from_GTF.pl genemark_corrected.gtf assembled_contigs.fa

The Perl script is found in the same folder as the other genemark scripts, e.g. gmes_linux_64/, where gmes_petap.pl is found.
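The `cut`/`sed` pipeline above appears to assume that column 1 of genemark.gtf holds a MEGAHIT-style header with exactly four space-separated words (e.g. `k141_1 flag=1 multi=2.0000 len=300`): it keeps the first word, drops the next two, and the final `cut` removes the `len=` word. With headers containing a different number of spaces, such as an organism name, pieces of the header can end up in the output. A more general sketch is to cut column 1 at its first space. It assumes all nine GTF columns are tab-separated and that the contig FASTA is identified by the first word of each header (not verified against get_sequence_from_GTF.pl); `fix_gtf_seqids` is a hypothetical name.

```bash
#!/usr/bin/env bash
# Sketch: keep only the first word of the sequence ID (GTF column 1).
# Assumes tab-separated GTF columns; reads the GTF on stdin, writes to stdout.
fix_gtf_seqids() {
  awk 'BEGIN { FS = OFS = "\t" }
       /^#/ { print; next }              # pass comment lines through unchanged
       { sub(/ .*/, "", $1); print }'    # drop everything after the first space in column 1
}
```

Usage: `fix_gtf_seqids < genemark.gtf > genemark_corrected.gtf`, then run get_sequence_from_GTF.pl on the corrected file together with the contig FASTA.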


Bless your soul, I saw that the genemark.gtf file didn't work with get_sequence_from_GTF.pl and you have saved me an untold amount of time with this solution.


Hello Ian, I hope this message finds you well. I have edited the genemark.gtf file as you suggested. However, the resulting .gtf file appears to be flawed: it repeats the name of my organism throughout the file. As a result, I was unable to use the Perl script successfully. Could you advise how I might resolve this issue?


I am sorry, but I have not worked on this since the last post. Check your GTF file and whether the delimiters are the same.
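One way to check the delimiters is a short awk pass that reports lines without exactly nine tab-separated fields, or with a space still inside the sequence ID. This is a hedged sketch: `check_gtf` is a hypothetical helper, and it only assumes the standard nine-column, tab-delimited GTF layout.

```bash
#!/usr/bin/env bash
# Sketch: report GTF lines that do not have exactly nine tab-separated columns,
# or whose sequence ID (column 1) still contains a space. Reads the GTF on stdin;
# exits 0 if every line passes and 1 otherwise.
check_gtf() {
  awk 'BEGIN { FS = "\t"; bad = 0 }
       /^#/ || NF == 0 { next }          # skip comment and blank lines
       NF != 9 { printf "line %d: %d tab-separated fields (expected 9)\n", NR, NF; bad = 1 }
       $1 ~ / / { printf "line %d: space in sequence ID \"%s\"\n", NR, $1; bad = 1 }
       END { exit bad }'
}
```

`check_gtf < genemark_corrected.gtf` prints nothing and exits with status 0 when every line passes; otherwise it prints the offending line numbers and exits with status 1.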
