Hi, this is my first time using the MAKER genome annotation pipeline.
I recently finished MAKER's first round and was surprised by the results (I was expecting better).
I used minimap2 to align a de novo transcriptome to the reference genome, and let MAKER align known crustacean protein sequences and mRNA sequences of my species from NCBI.
Prior to running MAKER, I used BUSCO to evaluate my de novo transcriptome assembly and the genome (using metaeuk):
Transcriptome: C:99.6%[S:7.4%,D:92.2%],F:0.2%,M:0.2%,n:1013
Genome: C:88.5%[S:37.7%,D:50.8%],F:7.8%,M:3.7%,n:1013
To evaluate the results, I ran BUSCO on all the transcripts MAKER predicted:
C:64.6%[S:34.5%,D:30.1%],F:19.2%,M:16.2%,n:1013
Although this is only the first round, what might cause ~160 BUSCOs to be missing from MAKER's predictions?
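For reference, the percentages above can be converted into approximate BUSCO counts. This is a minimal Python sketch; the function name `parse_busco` is made up for illustration, and the counts are reconstructed from rounded percentages, so they may be off by one.

```python
import re

def parse_busco(summary):
    """Turn a BUSCO one-line summary into percentages and approximate counts."""
    pattern = (r"C:([\d.]+)%\[S:([\d.]+)%,D:([\d.]+)%\],"
               r"F:([\d.]+)%,M:([\d.]+)%,n:(\d+)")
    m = re.search(pattern, summary)
    if m is None:
        raise ValueError("not a BUSCO summary string: " + summary)
    n = int(m.group(6))
    keys = ["C", "S", "D", "F", "M"]
    pct = {k: float(v) for k, v in zip(keys, m.groups()[:5])}
    # Counts are estimates: BUSCO reports percentages rounded to one decimal.
    counts = {k: round(v * n / 100) for k, v in pct.items()}
    return pct, counts

# Summary strings copied from this post.
_, genome = parse_busco("C:88.5%[S:37.7%,D:50.8%],F:7.8%,M:3.7%,n:1013")
_, maker = parse_busco("C:64.6%[S:34.5%,D:30.1%],F:19.2%,M:16.2%,n:1013")

print("missing in genome:", genome["M"])        # 37
print("missing in MAKER set:", maker["M"])      # 164
print("fragmented in MAKER set:", maker["F"])   # 194
```

By this estimate the genome itself lacks about 37 BUSCOs, so not all of the ~164 can be attributed to MAKER, assuming the two missing sets overlap. The genome and transcript runs presumably use different BUSCO modes, so the comparison is only indicative.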
Can anyone share from their experience whether this is common?
Maybe my expectations were too high and these are actually good first-round results?
Regarding training ab initio annotation tools, would you use BUSCO to train Augustus? I have seen some tutorials that take training sequences from the mRNA annotations created in the first round (with 1000 bp of flanking sequence on each side), while others recommend filtering them first (as in this post: gene set filter/selection for training ab initio annotation tools ) and then training Augustus directly.
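To make the first approach concrete, here is a rough Python sketch of how the flanked training regions could be derived from the round-one GFF3. It assumes standard nine-column GFF3 with `mRNA` features; `flanked_regions`, the contig name, and the coordinates are all hypothetical, and this is not taken from any specific tutorial.

```python
def flanked_regions(gff3_lines, contig_lengths, flank=1000):
    """Return (seqid, start, end) for each mRNA, padded by `flank` bp.

    Coordinates are 1-based and inclusive, as in GFF3, and are clamped
    so that they never run off either end of the contig.
    """
    regions = []
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        # Skip anything that is not a nine-column mRNA record
        # (this also skips an embedded FASTA section).
        if len(cols) != 9 or cols[2] != "mRNA":
            continue
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        padded_start = max(1, start - flank)
        padded_end = min(contig_lengths[seqid], end + flank)
        regions.append((seqid, padded_start, padded_end))
    return regions

# Hypothetical toy input: one contig of 5000 bp with two mRNAs.
toy_gff3 = [
    "##gff-version 3",
    "\t".join(["ctg1", "maker", "mRNA", "500", "1500", ".", "+", ".", "ID=m1"]),
    "\t".join(["ctg1", "maker", "mRNA", "3000", "4800", ".", "-", ".", "ID=m2"]),
]
regions = flanked_regions(toy_gff3, {"ctg1": 5000})
print(regions)
```

The clamping matters for genes near contig ends, where a full 1000 bp of flank is not available.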
Thanks for your consideration and help.
As far as I know, the accuracy of MAKER depends on how well the ab initio predictors (Augustus, SNAP, etc.) are trained. In your case, it looks like the predictors are doing a poor job. Using BUSCO to train Augustus is a good start if it has not already been trained for your species (I am assuming this is the case). By the way, BUSCO is a good estimate, but you should also pay attention to other metrics, such as the total number of predicted genes (does it make sense for your species?), the average size of predicted proteins, intron lengths, etc.
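As a rough illustration of those extra checks, the sketch below tallies gene count, mean intron length, and mean CDS length in codons (a proxy for protein size) from GFF3 records. It assumes standard GFF3 with `gene`, `exon`, and `CDS` features linked by `Parent` attributes; the function name `annotation_stats` and the toy records are hypothetical.

```python
from collections import defaultdict
from statistics import mean

def annotation_stats(gff3_lines):
    """Summarise gene count, mean intron length and mean CDS length."""
    n_genes = 0
    exons = defaultdict(list)   # transcript ID -> list of (start, end)
    cds_bp = defaultdict(int)   # transcript ID -> total CDS length in bp
    for line in gff3_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        ftype, start, end = cols[2], int(cols[3]), int(cols[4])
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parents = attrs["Parent"].split(",") if "Parent" in attrs else []
        if ftype == "gene":
            n_genes += 1
        elif ftype == "exon":
            for p in parents:
                exons[p].append((start, end))
        elif ftype == "CDS":
            for p in parents:
                cds_bp[p] += end - start + 1
    # An intron is the gap between two consecutive exons of one transcript.
    introns = []
    for spans in exons.values():
        spans.sort()
        for (_, prev_end), (next_start, _) in zip(spans, spans[1:]):
            introns.append(next_start - prev_end - 1)
    codons = [bp // 3 for bp in cds_bp.values()]
    return {
        "genes": n_genes,
        "mean_intron_bp": mean(introns) if introns else 0.0,
        "mean_cds_codons": mean(codons) if codons else 0.0,
    }

# Hypothetical toy annotation: one gene, one transcript, two exons.
def rec(ftype, start, end, attrs):
    return "\t".join(["ctg1", "maker", ftype, str(start), str(end),
                      ".", "+", ".", attrs])

toy = [
    rec("gene", 1, 300, "ID=g1"),
    rec("mRNA", 1, 300, "ID=m1;Parent=g1"),
    rec("exon", 1, 100, "ID=e1;Parent=m1"),
    rec("exon", 201, 300, "ID=e2;Parent=m1"),
    rec("CDS", 11, 100, "ID=c1;Parent=m1"),
    rec("CDS", 201, 290, "ID=c1;Parent=m1"),
]
stats = annotation_stats(toy)
print(stats)
```

Comparing these numbers against a well-annotated related species is one way to judge whether the predictions are plausible.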