Single Long Gene Predicted As Several Small Genes By Software
11.6 years ago
Raghul ▴ 200

Hi

I use the AUGUSTUS gene prediction software because my organism is a unicellular eukaryote. Within my nucleotide dataset I am focusing mainly on membrane proteins.

I trained AUGUSTUS using a genome file and a protein file as the training input. The protein file contained some incomplete transmembrane protein sequences. After training, I used the resulting parameters to predict genes. I found single genes being predicted as several separate genes. This was especially common for transmembrane proteins.

I don't have a close relative whose transmembrane protein sequences are available with annotation. Any suggestions to overcome this problem?

thank you

raghul

11.6 years ago

We have all encountered similar issues when using Glimmer, GeneMark, Galaxy and so on! I have tried MAKER [http://derringer.genetics.utah.edu/cgi-bin/MWAS/maker.cgi] as well. Honestly, unless you specify a cut-off such as a minimum ORF length of a few hundred aa, you have little chance of reducing such problems. That is the main reason why different gene prediction tools produce gene counts that are not comparable. It is also hardly meaningful to compare a very short predicted TM protein with another prediction almost 4 times longer. If those TM families could be grouped by expected length, based on the known annotations of previously reported orthologs, it would ease the job, wouldn't it? Thanks.
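To make the length cut-off idea concrete, here is a minimal Python sketch that drops predicted proteins shorter than a chosen number of amino acids. It assumes you have already exported the predicted proteins as FASTA text. The function names `read_fasta` and `filter_by_length` and the 100 aa default are illustrative choices, not part of AUGUSTUS or MAKER. Note that this only removes short predictions; it does not merge fragments of a split gene back together.

```python
def read_fasta(text):
    """Parse FASTA text into {record_id: sequence}.

    The record id is the first whitespace-delimited token of the header line.
    """
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def filter_by_length(proteins, min_aa=100):
    """Keep proteins with at least min_aa residues.

    A trailing '*' (stop symbol) is not counted as a residue.
    The 100 aa default is an assumed, illustrative threshold.
    """
    return {
        name: seq
        for name, seq in proteins.items()
        if len(seq.rstrip("*")) >= min_aa
    }
```

You would read your predicted protein file into a string, pass it to `read_fasta`, and then call `filter_by_length` with a threshold that suits your TM families. A sensible threshold probably depends on the known lengths of the orthologs you are comparing against.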


Thanks for the answer!

Is it acceptable to include protein sequences from 3 related species for training in gene prediction? Will this cause errors, or will the additional information lead to better gene prediction?

11.6 years ago

Yes, depending on the relatedness of the taxa! It depends on how you infer that relatedness; say all 3 species strictly belong to a single family or subfamily. But what if they are allopolyploid in origin? What if the relatedness of the taxa, as previously inferred, is based merely on a taxonomic/phenetic system and they are not true relatives? Gene structures in lower groups and in higher groups within a kingdom can differ significantly. Well, I agree with your approach (and that's what most of us do too!) if, for example, you use chimps, humans and gorillas as training sets for bonobos or orangutans. It is case specific and worth trying until a numerical consensus is reached.
