11 weeks ago
colindaven
Hi all,
does anyone have any experience in using gene annotation tools on oomycetes? There is very little written on this.
As oomycete genomes tend to be larger than fungal ones (ca. 100 Mb vs ca. 50 Mb from what I've seen so far), would you rather use a fungal or a general eukaryotic gene finder?
I am considering
- MAKER with RNA-seq
- codonquarry
- Helixer
- plain Augustus
Thanks!
As far as I know there is no correlation between genome size and gene prediction tool performance, so you're better off looking at other metrics.
If you're willing and able to train a gene predictor, go for a well-performing tool and optimize it for your species of interest. If not, all the ones you list are viable options. What could help is to look for one that can predict the gene structures you expect in your genome. Do you expect genes to have many or few introns, long or short introns, and so on? Those properties likely have more influence than genome size.
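If you already have an annotation from a related species (or a first-pass prediction), you can get a rough feel for the expected intron structure before choosing a tool. Below is a minimal sketch, not a validated tool: the function name `intron_summary` is made up for illustration, and it assumes GFF3 input where each `exon` feature carries a single `Parent=` attribute (exons with several comma-separated parents are treated as one group, which is a simplification).

```python
from collections import defaultdict
from statistics import median


def intron_summary(gff_lines):
    """Summarize intron counts and lengths from GFF3 exon features.

    Assumption: each exon line has one Parent=<transcript id> attribute.
    """
    exons = defaultdict(list)
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "exon":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent is not None:
            exons[parent].append((int(cols[3]), int(cols[4])))

    lengths = []
    for coords in exons.values():
        coords.sort()
        # An intron is the gap between two consecutive exons of a transcript.
        for (_, prev_end), (next_start, _) in zip(coords, coords[1:]):
            gap = next_start - prev_end - 1
            if gap > 0:
                lengths.append(gap)

    n_tx = len(exons)
    return {
        "transcripts": n_tx,
        "introns": len(lengths),
        "mean_introns_per_transcript": len(lengths) / n_tx if n_tx else 0.0,
        "median_intron_length": median(lengths) if lengths else None,
    }
```

Usage would be something like `intron_summary(open("annotation.gff3"))`, where the file name is a placeholder. Few, short introns versus many, long ones is the kind of difference that matters when picking or training a predictor.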
Unfortunately I do see a dependency between genome size and repeat content, at least for large crop genomes. This recent paper backs up my view (at least for the BRAKER2 tool, which is what they looked at).
https://pmc.ncbi.nlm.nih.gov/articles/PMC11186247/
I will definitely look more into oomycete biology and perform various analyses. I hope someone with more experience than me in oomycetes might get back to me in future though.
Oomycetes have pretty complex genomes (mostly repeats, as I recall from working with one in the distant past), so it would help a lot if you had RNA-seq data to go with this, preferably from various life stages.
There is definitely a correlation between genome size and repeat content; I'd even consider that a fact. I still don't see the link to gene prediction performance though.
Right, a questionable paper I'm afraid, in several respects, but OK.
If a gene predictor is hindered by the size of a genome (and/or by the repeats present), that much more likely reflects an underlying problem. For example, unsuccessful repeat masking will indeed lead to false predictions and lower specificity, but that is not due to the performance of the gene predictor itself.
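A quick sanity check on whether masking worked at all is to look at how much of the assembly is actually masked. Here is a rough sketch, assuming the common convention that soft-masked bases are lowercase and hard-masked bases are `N`; the function name `masked_fractions` is just illustrative, and it says nothing about whether the right regions were masked.

```python
def masked_fractions(fasta_lines):
    """Return the soft-masked (lowercase acgt) and hard-masked (N/n)
    fractions over all sequences in a FASTA file."""
    soft = hard = total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and headers
        total += len(line)
        hard += line.count("N") + line.count("n")
        soft += sum(line.count(base) for base in "acgt")
    if total == 0:
        return {"soft": 0.0, "hard": 0.0}
    return {"soft": soft / total, "hard": hard / total}
```

If a genome you expect to be repeat-rich comes back with only a few percent masked, it is worth revisiting the repeat library before blaming the gene predictor.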
If there were a dependency on genome size, that would mean it is much easier to predict genes in smaller eukaryotic genomes (which is certainly not the case).
Things like %GC will influence the performance of gene predictors much more than genome size does (and %GC can, on its own, vary independently of genome size).
I would go for a general gene predictor and optimize it for your genome (just as the fungal gene predictors are actually general eukaryotic predictors that have been optimized for fungal genomes). There are only two sorts of gene predictors, eukaryotic and prokaryotic, as they differ fundamentally in things like gene structure. All the others are simply sub-flavors of those two.
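Checking %GC for your own assembly is straightforward. The following is a minimal sketch (the name `gc_content` is made up here) that computes the GC fraction per sequence from FASTA lines, counting only A/C/G/T so that `N` runs and other ambiguity codes do not skew the result.

```python
def gc_content(fasta_lines):
    """Return {sequence_id: GC fraction}, counting only A, C, G and T."""
    counts = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            # Use the first word of the header as the id; fall back to a
            # positional name if the header is empty.
            name = parts[0] if parts else f"seq{len(counts) + 1}"
            counts[name] = [0, 0]  # [GC bases, A+C+G+T bases]
        elif name is not None:
            seq = line.upper()
            gc = seq.count("G") + seq.count("C")
            at = seq.count("A") + seq.count("T")
            counts[name][0] += gc
            counts[name][1] += gc + at
    return {n: (gc / total if total else 0.0) for n, (gc, total) in counts.items()}
```

Comparing these values with the %GC of the species a predictor was trained on gives a first hint about how well existing parameters might transfer.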