Question

Braker output into maker (Annotation)

3

8.5 years ago

mafireyi ▴ 80

Good day. I am new to genome annotation. I am running MAKER on a new genome and plan to use BRAKER to train AUGUSTUS and GeneMark. I have already run BRAKER using the TopHat alignment of my mRNA-seq data to the genome. My question is: how do I then use the BRAKER output in MAKER? Is there a MAKER script that converts the GTF files produced by BRAKER into a format that MAKER accepts in the CTL files? Thank you, I will appreciate any help.

Assembly • 10k views

ADD COMMENT • link 8.5 years ago by mafireyi ▴ 80

0

8.5 years ago

mafireyi ▴ 80

Thank you very much, this has been very helpful

ADD COMMENT • link 8.5 years ago by mafireyi ▴ 80

score 11 · Accepted Answer · 2016-05-30

8.5 years ago

Philipp Bayer 8.7k

BRAKER1 uses GeneMark-ET to train and evaluate AUGUSTUS, and the final gene predictions are from AUGUSTUS.

You can do two things. First, you can use the model that BRAKER creates in the AUGUSTUS config dir in MAKER's AUGUSTUS run. All you have to do is specify the species name you gave to BRAKER in MAKER's maker_opts.ctl. For example, say you ran BRAKER like this:

braker.pl --genome=genome.fasta --bam=reads.bam --species=my_species

This will create a folder named my_species in the config dir of AUGUSTUS. If MAKER uses the same AUGUSTUS installation, all you have to do is enter your species name in MAKER's maker_opts.ctl, and AUGUSTUS will find it again:

augustus_species=my_species #Augustus gene prediction species model
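Before relying on that setting, it may be worth confirming that the folder really sits where MAKER's AUGUSTUS will look. A minimal sketch, assuming AUGUSTUS_CONFIG_PATH is set in the shell that launches MAKER and that my_species is the name you passed to BRAKER; species_dir_exists is a hypothetical helper name, not part of AUGUSTUS, BRAKER or MAKER:

```shell
# Hypothetical helper: succeed if a species folder exists under an AUGUSTUS config dir.
# $1 = species name given to BRAKER, $2 = config dir (defaults to $AUGUSTUS_CONFIG_PATH)
species_dir_exists() {
    species="$1"
    config_dir="${2:-${AUGUSTUS_CONFIG_PATH:-}}"
    [ -n "$species" ] && [ -n "$config_dir" ] && [ -d "$config_dir/species/$species" ]
}

# Usage: report whether the trained parameters are where AUGUSTUS expects them.
if species_dir_exists my_species; then
    echo "found: ${AUGUSTUS_CONFIG_PATH}/species/my_species"
else
    echo "missing: no species/my_species under '${AUGUSTUS_CONFIG_PATH:-<unset>}'"
fi
```

If this reports "missing" even though BRAKER finished, BRAKER and MAKER are probably using different AUGUSTUS config directories.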

Alternatively, you can treat the BRAKER annotation as "legacy annotation" in MAKER. In that case, you can use the final .gff from BRAKER1 as pred_gff in MAKER's maker_opts.ctl, and MAKER will edit the exons and introns based on the external evidence you give it. There is an example for this in this protocol.
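For this second route, the change to maker_opts.ctl is a single line. The following is only a sketch: set_maker_opt is a hypothetical helper, and braker_out/augustus.gff3 is a placeholder path, because the name and format of BRAKER's final file depend on the version and options used. MAKER's pred_gff option expects GFF3, so a GTF file would need converting first.

```shell
# Hypothetical helper: set KEY=VALUE in a MAKER control file, keeping the trailing "#comment".
# $1 = control file, $2 = option name, $3 = new value
set_maker_opt() {
    opt_file="$1"
    opt_key="$2"
    opt_value="$3"
    opt_tmp="$opt_file.tmp.$$"
    # Rewrite only the line that starts with "key="; all other lines pass through unchanged.
    awk -v k="$opt_key" -v v="$opt_value" '
        index($0, k "=") == 1 {
            hash = index($0, "#")
            if (hash > 0) {
                print k "=" v " " substr($0, hash)
            } else {
                print k "=" v
            }
            next
        }
        { print }
    ' "$opt_file" > "$opt_tmp" && mv "$opt_tmp" "$opt_file"
}

# Usage (placeholder path; point it at the GFF3 derived from your own BRAKER run):
# set_maker_opt maker_opts.ctl pred_gff braker_out/augustus.gff3
```

Editing the file by hand works just as well; a helper like this is mainly useful when the same setting has to be applied across several MAKER runs.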

I have no idea which of the two approaches is "better". If you need more help with running MAKER/BRAKER, the Supplementary Materials from the BRAKER publication have many useful commands: http://bioinformatics.oxfordjournals.org/content/suppl/2015/11/09/btv661.DC1/supplementary.pdf

Later edit: One caveat you will run into is that MAKER performs repeat masking while BRAKER does not, so you're (especially with plant species) bound to see more false-positive transposon-related genes with BRAKER if you don't run repeat masking first.

ADD COMMENT • link 8.5 years ago by Philipp Bayer 8.7k

2

If you use the first approach (simply using the species parameters produced by BRAKER in MAKER), you are going to lose the RNA-Seq information in the gene prediction step with AUGUSTUS. MAKER can incorporate RNA-Seq, too, but it does so in a different way that is not optimal for AUGUSTUS (at least as far as I am aware, unless there was a serious update that escaped my attention). I therefore recommend the legacy annotation path. This way, you can also pass the GeneMark-ET predictions from BRAKER to MAKER.

From one of the BRAKER developers.

ADD REPLY • link 7.0 years ago by katharina.hoff ▴ 70

0

Hi Katharina, are there any arguments against doing both?

MAKER does apparently use hints to improve on the predictions, and I think I have seen numbers (old ones, though) stating that AUGUSTUS within MAKER performs better than AUGUSTUS outside it. This was likely without RNA-seq data involved.

If you use both augustus_species and pred_gff, the two sets of predictions will in a way compete against each other, and MAKER will choose the one that best fits the evidence.

ADD REPLY • link 6.9 years ago by o.k.torresen • 0

0

Please check with the MAKER developers whether MAKER would be able to select the best prediction based on evidence if you do this. It might be problematic if MAKER cannot correctly weight the evidence that was used by BRAKER.

ADD REPLY • link 6.8 years ago by katharina.hoff ▴ 70

0

Hi, AUGUSTUS was not able to use the new species produced by BRAKER. Do you have any idea why?

Cheers Luigi

ADD REPLY • link 8.2 years ago by luigi.faino ▴ 20

1

Maybe you have two different AUGUSTUS_CONFIG_PATHs?

BRAKER ran with AUGUSTUS_CONFIG_PATH1, and MAKER runs AUGUSTUS with AUGUSTUS_CONFIG_PATH2?

Either make those two paths identical before running BRAKER, or copy the parameters to the correct location:

cp -r $AUGUSTUS_CONFIG_PATH1/species/yourspecies $AUGUSTUS_CONFIG_PATH2/species

ADD REPLY • link 6.8 years ago by katharina.hoff ▴ 70

0

Just noticed that now too. Have you found a solution for that?

ADD REPLY • link 8.0 years ago by mafireyi ▴ 80

1

Two ideas:

1) check whether the environment variable that MAKER's AUGUSTUS uses points to the path that BRAKER wrote to:

echo $AUGUSTUS_CONFIG_PATH

2) check whether you're allowed to write to the AUGUSTUS_CONFIG_PATH
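Both checks can be run in one go. A small sketch, assuming it is run in the same shell environment that launches BRAKER or MAKER; check_augustus_config is a hypothetical helper name:

```shell
# Hypothetical helper: print the AUGUSTUS config dir and say whether its species/ folder is writable.
# $1 = config dir (defaults to $AUGUSTUS_CONFIG_PATH)
check_augustus_config() {
    config_dir="${1:-${AUGUSTUS_CONFIG_PATH:-}}"
    if [ -z "$config_dir" ]; then
        echo "AUGUSTUS_CONFIG_PATH is not set"
        return 1
    fi
    echo "AUGUSTUS config dir: $config_dir"
    if [ -w "$config_dir/species" ]; then
        echo "species/ is writable"
    else
        echo "species/ is missing or not writable"
        return 1
    fi
}

# Usage: run once in the environment used for BRAKER and once in the one used for MAKER,
# then compare the printed paths.
check_augustus_config || true
```

If the two environments print different paths, that would match the two-config-path explanation given earlier in the thread.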

ADD REPLY • link 7.7 years ago by Philipp Bayer 8.7k

0

Thanks for the explanation. I read the paper "Four arguments for not masking your genome before annotation". Would you advise going in that direction?

ADD REPLY • link 4.2 years ago by eennadi ▴ 40

0

This would be a good question of its own, but yes, right now I lean towards not running RepeatMasker via MAKER, and instead later removing proteins that are X% covered by known repeats or that contain transposase-related domains.

ADD REPLY • link 4.2 years ago by Philipp Bayer 8.7k