Why are only 2 amino acids used in gene prediction?
7.5 years ago
ssshan • 0

Dear all,

I'm a master's candidate interested in machine learning for gene prediction. I noticed that most papers pick dimers (2 amino acids) as a key feature when training on positive and negative data sets for gene prediction. However, I don't know why dimers are the only or best option. Could anyone help?

Thanks in advance!

gene alignment sequence • 1.3k views
7.5 years ago

Hexamers (6-nt words) are accepted as the most accurate k-mer frequency-based measure of coding potential. An in-frame hexamer covers two adjacent codons, which is why it corresponds to the amino acid dimers you see in those papers. In 1992, a systematic study of more than twenty compositional properties indicated that hexamer composition gave the best discrimination between coding and non-coding regions (Fickett & Tung, Nucleic Acids Research, 1992). Since then, reading frame-dependent hexamer frequencies have been the most commonly used content sensor in gene prediction programs.
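
To make the idea concrete, here is a minimal sketch of a hexamer-based content sensor in Python. It is an illustration only, not the method from Fickett & Tung or from any particular gene finder: the function names, the pseudocount of 1, and the log-odds scoring are my own assumptions. Coding hexamers are counted in frame (step of 3 nt), background hexamers at every position.

```python
from collections import Counter
from itertools import product
from math import log

K = 6  # hexamer length: two adjacent codons


def hexamer_counts(seqs, step):
    """Count hexamers in each sequence, moving the window by `step` nt."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(0, len(seq) - K + 1, step):
            counts[seq[i:i + K]] += 1
    return counts


def log_odds_table(coding, noncoding, pseudocount=1.0):
    """Build a log-odds score per hexamer (hypothetical helper).

    Coding sequences are assumed to start in frame 0, so they are
    sampled every 3 nt; non-coding sequences are sampled every 1 nt.
    """
    coding_counts = hexamer_counts(coding, step=3)
    noncoding_counts = hexamer_counts(noncoding, step=1)
    hexamers = ["".join(p) for p in product("ACGT", repeat=K)]
    # Totals over the 4096 unambiguous hexamers, with pseudocounts added
    c_total = sum(coding_counts[h] for h in hexamers) + pseudocount * len(hexamers)
    n_total = sum(noncoding_counts[h] for h in hexamers) + pseudocount * len(hexamers)
    return {
        h: log((coding_counts[h] + pseudocount) / c_total)
        - log((noncoding_counts[h] + pseudocount) / n_total)
        for h in hexamers
    }


def coding_score(seq, table, frame=0):
    """Mean log-odds of the in-frame hexamers of `seq`; > 0 suggests coding."""
    seq = seq.upper()
    scores = [
        table[seq[i:i + K]]
        for i in range(frame, len(seq) - K + 1, 3)
        if seq[i:i + K] in table  # skip hexamers containing N etc.
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

In practice you would train the table on a large set of annotated coding sequences and a matched background set, then score candidate ORFs in each of the three frames and compare. The mean log-odds could also be fed to a classifier as one feature among others.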
