Genome de novo annotation with Maker
6.1 years ago
alslonik ▴ 320

Hi all,

I am annotating a new plant genome with MAKER, following its very detailed tutorial (http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial_for_WGS_Assembly_and_Annotation_Winter_School_2018). I have also read a few really helpful posts about MAKER here, but I still have some questions.

  1. SNAP training. How do you actually know when SNAP has been trained enough and you can do the final MAKER run? I have run the training several times and the number of genes differs every time. It actually looks like a kind of sinusoidal graph: the number of genes goes up and down. So when do you stop? How do you know that SNAP is trained? Do you wait for a plateau? How many training rounds did you do, and why?

  2. My genome has an unusually high repeat content, which is why I decided to build a custom repeat library for it with RepeatModeler. Where in the options file do I add this repeat library?

THANKS a lot for your help,

Alex

Maker SNAP Genome annotation • 3.3k views
6.0 years ago
jean.elbers ★ 1.7k

You can specify a custom repeat library (in FASTA format) with the rmlib option in the Repeat Masking section of the maker_opts.ctl file:

#-----Repeat Masking (leave values blank to skip repeat masking)
model_org=all #select a model organism for RepBase masking in RepeatMasker
rmlib=repeatlibrary.fa #provide an organism specific repeat library in fasta format for RepeatMasker
repeat_protein=/opt/maker/data/te_proteins.fasta #provide a fasta file of transposable element proteins
rm_gff= #pre-identified repeat elements from an external GFF3 file
prok_rm=0 #forces MAKER to repeatmask prokaryotes (no reason to change this), 1 = yes, 0 = no
softmask=1 #use soft-masking rather than hard-masking in BLAST (i.e. seg and dust filtering)

This is an example Repeat Masking section
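If you would rather set the option from the command line than in an editor, here is a minimal sketch. The helper name set_maker_opt is made up for illustration and is not part of MAKER; it assumes the control file uses the key=value #comment layout shown above and that the value contains no | or & characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MAKER): set KEY=VALUE in a MAKER control
# file while keeping the trailing "#comment" on that line.
set_maker_opt() {
    local key="$1" value="$2" file="$3"
    # Replace everything between "KEY=" and the first "#" (or end of line).
    # -i.bak edits in place and leaves a backup copy at FILE.bak.
    sed -i.bak "s|^${key}=[^#]*|${key}=${value} |" "$file"
}
```

For example, `set_maker_opt rmlib repeatlibrary.fa maker_opts.ctl` would fill in the rmlib line and leave the other options untouched.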

You might also consider running ProtExcluder on the output of RepeatModeler to remove library entries that match known proteins (see the wiki page below):

http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Basic

# Run blastx, then ProtExcluder, to exclude known protein sequences from the RepeatModeler library
/usr/bin/blastx -num_threads 75 -db /genetics/elbers/maker/uniprot_sprot.fasta -evalue 1e-6 \
-query repeatlibrary.fa -out repeatlibrary.fa.blast

/opt/ProtExcluder1.1/ProtExcluder.pl -f 50 repeatlibrary.fa.blast repeatlibrary.fa
# output of ProtExcluder is "temp"
# rename temp to whatever you desire
mv temp repeatlibrary.fa2
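
One thing the commands above take for granted: blastx needs -db to point at a formatted BLAST protein database, not just a FASTA file. Here is a sketch of that preparation step, assuming BLAST+ (makeblastdb) is on your PATH. The function name ensure_protein_db is only for illustration.

```shell
#!/usr/bin/env bash
# Illustrative helper: build a BLAST protein database next to a FASTA file
# unless index files from a previous makeblastdb run are already there.
ensure_protein_db() {
    local fasta="$1"
    # makeblastdb writes FASTA.pin for a single-volume protein database and
    # FASTA.pal when the database is split into several volumes.
    if [ -f "${fasta}.pin" ] || [ -f "${fasta}.pal" ]; then
        echo "BLAST database for ${fasta} already exists"
        return 0
    fi
    makeblastdb -in "$fasta" -dbtype prot
}
```

You would call it once before the blastx line, e.g. `ensure_protein_db /genetics/elbers/maker/uniprot_sprot.fasta`.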

Many thanks for your help, Jean,

I am following your advice and excluding the protein sequences. My question now is: which protein database did you use? Only UniProt, or UniProt combined with RefSeq? Wouldn't that be redundant? Do you also exclude the transposon sequences, as described in the MAKER wiki? Do you do that by aligning against the transposon library? It sounds like a really simple step, but somehow I keep getting stuck...

The library that is provided in the manual is old (2011) and also appears to be corrupt...


I would use the most up-to-date Swiss-Prot database

wget ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/complete/uniprot_sprot.fasta.gz

and not worry about combining RefSeq or transposon sequences. Someone with more experience might have better advice to give, but I think this is sufficient.
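
Since a corrupt library came up earlier in the thread, it may be worth checking the download before building anything from it. A small sketch follows; check_fasta_gz is a made-up name, and all it does is test gzip integrity and count FASTA header lines.

```shell
#!/usr/bin/env bash
# Illustrative helper: verify a gzipped FASTA and print its record count.
check_fasta_gz() {
    local gz="$1"
    # gzip -t exits non-zero if the archive is truncated or corrupt.
    gzip -t "$gz" || { echo "corrupt or truncated: $gz" >&2; return 1; }
    # Count header lines; "|| true" because grep -c exits 1 on zero matches.
    gzip -dc "$gz" | grep -c '^>' || true
}
```

Run `check_fasta_gz uniprot_sprot.fasta.gz` after the wget, then `gunzip uniprot_sprot.fasta.gz` if the count looks sensible.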


Got you. Thanks again!


Hi, I've been looking for ProtExcluder but couldn't find it. Could you please share the download link?

5.0 years ago
alslonik ▴ 320

Hi, the page below describes the repeat library construction. Clicking the ProtExcluder link there downloads a tar.gz containing the script, and there is also a link to the manual. Look for the link in section 4, "Exclusion of gene fragments". Good luck!

https://weatherby.genetics.utah.edu/MAKER/wiki/index.php/Repeat_Library_Construction-Advanced
