Hi everyone !
I'm trying to assemble the genome of D. suzukii, a close relative of D. melanogaster with a slightly bigger genome (D. mel: 150 Mb, D. suzukii: about 220 Mb). After the assembly, I want to use MAKER, but it's my first time using it.
I spent a lot of time reading the manual, which is really detailed: http://weatherby.genetics.utah.edu/MAKER/wiki/index.php/MAKER_Tutorial#Genome_Options_.28Required.29
I read that MAKER runs some ab initio predictors (like Augustus) and also uses external evidence (EST evidence from the species or a related species, and the same for protein evidence). After that, it tries to build a kind of consensus to annotate the genome (if I'm right).
About the input for EST and protein evidence: the manual says that you can give a FASTA file or (and?) a GFF file, but I don't know if both are necessary. (I don't think so, actually, because the small example in the tutorial only uses a FASTA file.) Do we need to add a GFF, or is a FASTA alone enough? (Or is a GFF alone enough, even without a FASTA?)
Also, D. suzukii is not really an emerging model organism, but it is closely related to D. melanogaster, so do I need to do the training step for the ab initio gene predictors?
Thanks for your answers !
Cheers,
Roxane
Dear Roxane, I am facing the same problem that you had with MAKER. I have already read all the posts, and although some of my doubts have been resolved, I still have a couple of questions. I was wondering if you could help me, please!
Ivan, Institute of Ecology, UNAM, Mexico
Hello imda !
It's been a while since I used MAKER, but perhaps I can help you with that! What are your questions?
Cheers,
Roxane
Sorry for the very delayed response, I was trying to fix some bugs in my assembly. I have two questions.
First, if you do not have ESTs for your species, what did you do? Second, about the running time of MAKER: I have seen that it is very, very slow! How can I speed up the annotation? Did you split your genome into small chunks?
Thank you very much
Hello Imda !
1) If you don't have any RNA-seq data from the species you want to annotate, you can still use protein evidence from several closely related species. But I wouldn't advise doing so; I think the best way to annotate a genome accurately is to use ESTs from the same species. Maybe it depends on what you need to do, though. MAKER will still work without ESTs and try to make the best predictions from what it has (gene structure predictors such as SNAP, and proteins from a closely related species).
2) And yes, MAKER can take a very long time. For me it was about 3-4 days per iteration (and you need at least 2 or 3 iterations for the full MAKER pipeline, to train SNAP etc.). At some point I was thinking of launching MAKER contig by contig, but that would mean splitting the evidence as well... I'm not sure how this would affect the whole annotation process. Perhaps someone else has tried such a method? Maybe MAKER now has an option to run multithreaded on a cluster or something? Sadly, I really don't know :/
Cheers,
Roxane
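For the protein-only case in point 1, the evidence is declared in MAKER's options control file, which is a list of `key=value #comment` lines. Here is a minimal Python sketch for filling in such lines from a script. It assumes the keys are named `genome`, `est` and `protein` (please check them against the maker_opts.ctl generated by your own MAKER version); the file names and the helper `set_ctl_options` are hypothetical, not part of MAKER.

```python
import re


def set_ctl_options(ctl_text, updates):
    """Return ctl_text with the given `key=value` lines updated.

    Trailing `#comment` text on each line is preserved. Raises KeyError
    if a requested key does not appear in the control file text.
    """
    remaining = dict(updates)
    out = []
    for line in ctl_text.splitlines():
        match = re.match(r"^(\w+)=([^#]*)(#.*)?$", line)
        if match and match.group(1) in remaining:
            key = match.group(1)
            comment = match.group(3) or ""
            value = remaining.pop(key)
            line = f"{key}={value} {comment}".rstrip()
        out.append(line)
    if remaining:
        raise KeyError(f"options not found in control file: {sorted(remaining)}")
    return "\n".join(out) + "\n"


# Assumed excerpt of a control file; key names should be verified locally.
example_ctl = (
    "genome= #genome sequence (fasta file)\n"
    "est= #ESTs or assembled mRNA-seq in fasta format\n"
    "protein= #protein sequences in fasta format\n"
)

# Protein-only run: `est` is left empty, `protein` points at a
# (hypothetical) FASTA of proteins from closely related species.
protein_only_ctl = set_ctl_options(
    example_ctl,
    {"genome": "d_suzukii_assembly.fasta",
     "protein": "related_species_proteins.fasta"},
)
```

The same helper could be reused to write one control file per genome chunk if you run MAKER chunk by chunk.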
Dear Roxane, I think I have fixed the problem with MAKER by splitting the genome into many FASTA files. I used the tool from MAKER called fasta_tool. This script splits the genome into many chunks; you can then run MAKER on each chunk, and everything should run fine. You can apply this method if you do not have MPI.
Cheers
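To illustrate the splitting idea, here is a minimal Python sketch that writes a multi-FASTA genome into chunk files with a fixed number of sequences each. This is not how fasta_tool itself is implemented or invoked; the function names, the `chunk_0001.fasta` naming scheme and the `records_per_chunk` parameter are assumptions made for the example.

```python
from pathlib import Path


def read_fasta(path):
    """Yield (header, sequence) tuples from a FASTA file."""
    header, seq = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(seq)
                header, seq = line[1:], []
            elif line:
                seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_fasta(path, out_dir, records_per_chunk=1):
    """Write the sequences of `path` into chunk files under `out_dir`.

    Each chunk holds up to `records_per_chunk` sequences, each written
    on a single line. Returns the list of chunk paths in order.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    handle = None
    try:
        for index, (header, seq) in enumerate(read_fasta(path)):
            if index % records_per_chunk == 0:
                if handle is not None:
                    handle.close()
                chunk_path = out_dir / f"chunk_{len(chunk_paths) + 1:04d}.fasta"
                handle = open(chunk_path, "w")
                chunk_paths.append(chunk_path)
            handle.write(f">{header}\n{seq}\n")
    finally:
        if handle is not None:
            handle.close()
    return chunk_paths
```

Each chunk can then be annotated in a separate MAKER run, for example with `split_fasta("genome.fasta", "chunks", records_per_chunk=50)`, where the input name and chunk size are placeholders to adapt to your assembly.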
Very nice to know! So how did it go in the end? Was the annotation good with the splitting process?
Yes, it all resulted in a good annotation, and it took about 7 days to finish a 1.6 Gb genome.