After a MAKER run with 3 ab initio predictors, I ran fasta_merge -d on the resulting log file and got 4 output files: one for each ab initio annotator, and one called "Genome.maker.proteins.fasta", which looks like the "union" of the three ab initio predictors. However, the output of at least one of the ab initio annotation programs has many more proteins than the final "Genome.maker.proteins.fasta" output.
At first I thought the final output only contained proteins with AED != 1, but proteins with AED=1 are still abundant in it. Other filtering options like min_protein etc. are set to 0 (standard maker_opts.ctl), so these shouldn't be filtering anything out either. It looks like relatively short proteins (<10 aa) were filtered from my ab initio predictions, but there's no indication of this in my options.
I can't find anything on this in the devel lists or the wiki. Is there any other filtering step done by MAKER that I'm not seeing right now?
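For anyone who wants to make the same comparison, here is a minimal sketch in plain Python (standard library only) that counts the proteins in a FASTA file and how many fall below a length cutoff. The function names and the 10 aa default cutoff are just illustrative assumptions, not anything MAKER itself provides; trailing "*" stop symbols are stripped before measuring length.

```python
import sys


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_summary(lines, short_cutoff=10):
    """Count all sequences and those shorter than short_cutoff residues.

    short_cutoff=10 is an assumed threshold for illustration only.
    """
    lengths = [len(seq.rstrip("*")) for _, seq in read_fasta(lines)]
    return {
        "total": len(lengths),
        "short": sum(1 for n in lengths if n < short_cutoff),
        "min": min(lengths) if lengths else 0,
    }


def main(paths):
    """Print a summary line for every FASTA file given on the command line."""
    for path in paths:
        with open(path) as handle:
            print(path, length_summary(handle))


if __name__ == "__main__":
    main(sys.argv[1:])
```

Running it with the final "Genome.maker.proteins.fasta" and one of the per-predictor protein files as arguments prints the totals side by side, which makes it easy to see whether the missing models are mostly the short ones.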
These are some good ideas!
It's always_complete=0 (and I get about 10% of proteins without a starting M; this is harder to check for transcripts since these contain UTRs).
I think this is the best explanation, and it would explain why so many of the smaller GeneMark-ES models in particular "disappeared": they were just merged into bigger models!
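One way to check a figure like the roughly 10% of proteins without M is sketched below. It is plain Python with a hypothetical helper name, and it only looks at the first residue of each record in the proteins FASTA, so it says nothing about the transcripts.

```python
import sys


def fraction_without_start_met(lines):
    """Return the fraction of FASTA records whose sequence does not start with M."""
    sequences, current = [], None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if current is not None:
                sequences.append("".join(current))
            current = []
        elif current is not None:
            current.append(line)
    if current is not None:
        sequences.append("".join(current))

    # Ignore empty records so they do not count as "missing M".
    sequences = [seq for seq in sequences if seq]
    if not sequences:
        return 0.0
    missing = sum(1 for seq in sequences if not seq.upper().startswith("M"))
    return missing / len(sequences)


if __name__ == "__main__":
    # Each command-line argument is treated as a path to a protein FASTA file.
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, fraction_without_start_met(handle))
```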