Question

How does MAKER decide which proteins go into the final output?

10.2 years ago

Philipp Bayer 8.8k

After a MAKER run with 3 ab initio predictors, I ran fasta_merge -d on the resulting log file and got 4 output files: one for each ab initio predictor, and one called "Genome.maker.proteins.fasta", which looks like the "union" of the three ab initio predictors. However, the output of at least one of the ab initio predictors contains many more proteins than the final "Genome.maker.proteins.fasta".

I first thought the final output only kept proteins with AED != 1, but proteins with AED=1 are still abundant in it. Other filtering options such as min_protein are set to 0 (standard maker_opts.ctl), so those shouldn't be filtering anything out either. It looks like relatively short proteins (<10 AA) from my ab initio predictions were filtered out, but nothing in my options indicates this.

I can't find anything on this in the devel lists or the wiki. Is there any other filtering step done by MAKER that I'm not seeing right now?
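One way to test the short-protein hunch is to count, for each protein FASTA file, how many records fall under a length cutoff. The following is a minimal sketch, not part of MAKER: the helper names `read_fasta` and `summarize_lengths` are made up for illustration, and it assumes that a trailing `*` is a stop-codon marker that should not count towards protein length.

```python
from io import StringIO

def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA file handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def summarize_lengths(handle, short_cutoff=10):
    """Return (total, n_short): record count and count shorter than short_cutoff."""
    total = n_short = 0
    for _, seq in read_fasta(handle):
        # Assumption: a trailing '*' marks the stop codon and is not a residue.
        length = len(seq.rstrip("*"))
        total += 1
        if length < short_cutoff:
            n_short += 1
    return total, n_short

# Tiny in-memory example; real input would be an opened protein FASTA file.
demo = StringIO(">a\nMKT*\n>b\nMKTAYIAKQRQISFVK\nSHFSRQ\n")
print(summarize_lengths(demo))  # (2, 1)
```

Running this on "Genome.maker.proteins.fasta" and on each per-predictor file (for example via `with open(path) as fh: summarize_lengths(fh)`) would show whether the short models are really absent from the final set.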

maker annotation • 3.7k views

updated 2.7 years ago by Ram 45k • written 10.2 years ago by Philipp Bayer 8.8k

Ram · Accepted Answer · 2015-06-29

10.2 years ago

Lesley Sitter ▴ 610

Have you viewed those ab initio predictions as separate tracks in something like IGV and compared them to the final gene model track? If you are only looking at the numbers, maybe you're not getting the entire picture. The reason you might have fewer final proteins than you have ab initio predictions is because MAKER tries to create a consensus gene model based on all the evidence, so multiple smaller evidence models can still result in one final gene model.

Did you use the option always_complete=1? Maybe if the ab initio model does not contain a start / stop codon it might be discarded in the final product.

It could also be that MAKER had conflicting evidence for some models, for example three tracks that predict structure A and one track that predicts a different structure. MAKER will then pick the best gene model, which will be the one best supported by the evidence.
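The merging idea can be checked outside a genome browser by grouping ab initio models under the final models they overlap. The sketch below is only a way to test that hypothesis and is not how MAKER itself builds or selects models. The function names, the tuple layout `(model_id, seqid, strand, start, end)`, and the example IDs are all assumptions for illustration; coordinates are taken to be 1-based and inclusive, as in GFF3.

```python
from collections import defaultdict

def overlaps(a_start, a_end, b_start, b_end):
    """True if two closed intervals share at least one base."""
    return a_start <= b_end and b_start <= a_end

def group_by_final_model(final_models, ab_initio_models):
    """Map each final model ID to the IDs of ab initio models overlapping it.

    Both inputs are lists of (model_id, seqid, strand, start, end) tuples.
    Only models on the same sequence and strand are compared.
    """
    by_location = defaultdict(list)
    for model_id, seqid, strand, start, end in ab_initio_models:
        by_location[(seqid, strand)].append((model_id, start, end))

    grouped = {}
    for final_id, seqid, strand, start, end in final_models:
        grouped[final_id] = [
            model_id
            for model_id, s, e in by_location[(seqid, strand)]
            if overlaps(start, end, s, e)
        ]
    return grouped

# Hypothetical coordinates: two small predictions fall inside one final model.
final = [("final1", "chr1", "+", 100, 900)]
ab_initio = [
    ("gm1", "chr1", "+", 100, 300),
    ("gm2", "chr1", "+", 500, 900),
    ("gm3", "chr1", "-", 100, 300),    # opposite strand, ignored
    ("gm4", "chr1", "+", 1000, 1200),  # no overlap
]
print(group_by_final_model(final, ab_initio))  # {'final1': ['gm1', 'gm2']}
```

Final models with two or more ab initio IDs in their list are candidates for the "several small predictions became one gene" explanation; ab initio models that appear in no list were dropped rather than merged.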

updated 2.7 years ago by Ram 45k • written 10.2 years ago by Lesley Sitter ▴ 610

These are some good ideas!

"Did you use the option always_complete=1? Maybe if the ab initio model does not contain a start / stop codon it might be discarded in the final product."

It's always_complete=0, and about 10% of my proteins don't start with M. This is harder to check for transcripts, since these contain UTRs.
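A figure like the 10% above can be reproduced with a few lines of Python. This is a minimal sketch with a made-up function name, `fraction_without_start_met`; it takes any iterable of FASTA lines, such as an open file, and counts records with no sequence at all as lacking a start methionine.

```python
def fraction_without_start_met(fasta_lines):
    """Return the fraction of FASTA records whose sequence does not begin with 'M'."""
    total = missing = 0
    expect_first_residue = False
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if expect_first_residue:
                # Previous record had a header but no sequence lines.
                missing += 1
            total += 1
            expect_first_residue = True
        elif line and expect_first_residue:
            # Only the first sequence line of a record decides the outcome.
            if not line.upper().startswith("M"):
                missing += 1
            expect_first_residue = False
    if expect_first_residue:
        missing += 1  # last record had no sequence
    return missing / total if total else 0.0

# Tiny in-memory example: one of three proteins lacks a leading M.
records = [">a", "MKT", ">b", "AKT", "LLL", ">c", "mst"]
print(round(fraction_without_start_met(records), 3))  # 0.333
```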

"The reason you might have fewer final proteins than you have ab initio predictions is because MAKER tries to create a consensus gene model based on all the evidence, so multiple smaller evidence models can still result in one final gene model."

I think this is the best explanation, and it would explain why so many of the smaller GeneMark-ES models in particular "disappeared": they were just merged into bigger models!

updated 2.7 years ago by Ram 45k • written 10.1 years ago by Philipp Bayer 8.8k