Nash,
EvidentialGene works on transcript sequences, and will take as input those from any method, whether de novo assembled, or chromosome-modelled or chromosome-assembled. Just mix together your multiple transcript assemblies into one input.fasta sequence file, ensuring each sequence has a unique ID. Evigene then filters the best gene assemblies out of any large set of coding-gene transcripts.
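The "mix together with unique IDs" step can be sketched in a few lines of Python. This is only an illustration, not part of Evigene: the function name `merge_fasta`, the per-assembly tags, and the file names in the usage comment are all hypothetical. It prefixes each sequence ID with a tag for its source assembly and stops if two IDs would still collide.

```python
def merge_fasta(inputs, output):
    """Concatenate FASTA files into one, prefixing each ID with a per-file tag.

    inputs: list of (tag, path) pairs; tags and paths are chosen by the user.
    Returns the number of sequences written.
    """
    seen = set()
    with open(output, "w") as out:
        for tag, path in inputs:
            with open(path) as handle:
                for line in handle:
                    if not line.strip():
                        continue  # skip blank lines
                    if line.startswith(">"):
                        # Split the header into ID and optional description.
                        fields = line[1:].strip().split(None, 1)
                        if not fields:
                            raise ValueError(f"empty FASTA header in {path}")
                        new_id = f"{tag}_{fields[0]}"
                        if new_id in seen:
                            raise ValueError(f"duplicate sequence ID: {new_id}")
                        seen.add(new_id)
                        desc = f" {fields[1]}" if len(fields) > 1 else ""
                        out.write(f">{new_id}{desc}\n")
                    else:
                        # Sequence line: copy as-is, ensuring a trailing newline.
                        out.write(line if line.endswith("\n") else line + "\n")
    return len(seen)


# Example call (hypothetical file names):
# merge_fasta([("trin", "trinity.fasta"), ("velv", "velvet.fasta")], "input.fasta")
```

Prefixing by assembly also keeps it easy to see later which assembler each retained transcript came from.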
The locus calls Evigene makes are based on coding sequence alignments. These are in general agreement with locus calls made by mapping transcripts to chromosomes and measuring coding exon overlaps there. Mapping software introduces artifacts from (a) imperfect chromosome assemblies, and (b) high-identity paralogs, so one gets some ambiguous locus or alternate-transcript calls. I recommend NCBI Splign, based on BLASTn, as a good transcript => chromosome mapper that makes fewer mistakes than GMAP, BLAT, or some other such mappers.
There is a hybrid Evigene (RNA assembly + chromosome modeled) in progress, but not ready yet for general use.
As for EvidentialGene not being easy enough or well documented enough, yes I'm aware of that, and hope that funding agencies will see value in supporting what is, by objective measures, a gene reconstruction tool that is more accurate than the others available. The basic reason for this, as with your case, is that using several gene reconstruction methods gets more accurate genes, as each gene has individual characters that one or another method is superior for. What Evigene adds is a biological ruler for measuring which of the many alternate models are best: the protein orthology ruler. If you want a recent overview of that accuracy for animal and plant gene reconstruction, including comparison to the popular Trinity and MAKER methods, please see this slide talk set:
http://arthropods.eugenes.org/EvidentialGene/about/evigene_bothgalmod1606iu.pdf
Hi Don,
Thank you for getting back to me.
I am sold on the algorithm for sure. It looks really useful and does exactly what I'm looking for. The problem I'm having is figuring out how to use the tool on the terminal. What commands do I need to use, and in what order? I should probably mention that I'm not a bioinformatician, but I do have some experience running programs on the terminal. Is there some sort of guide, walkthrough, or example run, starting from multiple fasta files, that I could follow? And I'm guessing I would use Splign once I have that magic file of good transcripts that Evigene kicks out. Also, what are the recommended hardware requirements for running Evigene, in terms of RAM etc.?
Hi Nash,
I'm trying to use Evigene. I just ran the script and fed it my transcriptome. It works and gives me the expected outputs, but I'm not an expert (in fact, I'm a newbie) and such a smooth run scares me (it looks too easy to be OK :P). Did you end up using this tool?
I don't know if I missed an important step. I obtained a 90% reduction in redundancy, so I'm excited but also doubtful.
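One quick sanity check on a reduction figure like this is to count FASTA headers in the input and in the retained output set. A minimal Python sketch, where the function names and the file names in the usage comment are placeholders (the thread does not give Evigene's actual output file names):

```python
def count_sequences(path):
    """Count FASTA records by counting header lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def percent_reduction(before, after):
    """Percentage of input sequences that were removed."""
    if before <= 0:
        raise ValueError("input set must contain at least one sequence")
    return 100.0 * (before - after) / before


# Example call (hypothetical file names):
# before = count_sequences("input.fasta")
# after = count_sequences("retained_transcripts.fasta")
# print(f"{percent_reduction(before, after):.1f}% of sequences removed")
```

This only confirms the size of the reduction. It says nothing about whether the right transcripts were kept.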