Question

Running EvidentialGene with both genome-guided and de novo assemblies

8.1 years ago

nash.claire ▴ 510

Hi,

I was hoping to get some help on how to run EvidentialGene with multiple transcript assemblies generated from both genome-guided and de novo assembly tools.

I have read posts here and papers that refer to EvidentialGene, but these generally deal with assemblies for which there is no known genome. I have a well-annotated genome and I simply want to use EvidentialGene to give me a common set of transcripts, identified by multiple algorithms, that I can then map back to my genome. I find the documentation for this tool really hard to follow, so I am looking for a guide on how to use it for my purpose. I will start with multiple FASTA files generated by the different assembly tools, and I'm looking to see where I go from there.

I realize this is a software specific question but I don't know where else to go to find someone who may have used this software before.

RNA-Seq Assembly • 2.8k views

updated 8.1 years ago by gilbert.bionet ▴ 160 • written 8.1 years ago by nash.claire ▴ 510

score 0 · Answer 1 · 2016-10-05

Nash,

EvidentialGene works on transcript sequences, and will take as input those from any method, whether de-novo assembly or chromosome-modelled or assembled. Just mix together your multiple transcript assemblies, ensuring each sequence has a unique ID, into one input.fasta sequence file. Evigene works to filter the best gene assemblies from any large set of coding-gene transcripts.
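As a rough illustration of the "mix together with unique IDs" step, here is a minimal Python sketch. It is not part of Evigene. The function name `merge_assemblies`, the tag-prefix scheme, and the file names in the usage comment are all assumptions made for this example; any scheme that yields unique IDs should serve the same purpose.

```python
def merge_assemblies(inputs, out_path):
    """Concatenate FASTA files into one, prefixing each ID with an assembly tag.

    inputs: dict mapping a short tag (hypothetical, e.g. "trinity") to a FASTA path.
    Returns the number of sequences written.
    Raises ValueError on an empty header or a duplicate ID after prefixing.
    """
    seen = set()
    count = 0
    with open(out_path, "w") as out:
        for tag, path in inputs.items():
            with open(path) as fh:
                for line in fh:
                    if line.startswith(">"):
                        header = line[1:].strip()
                        if not header:
                            raise ValueError(f"empty FASTA header in {path}")
                        # Split the ID from any free-text description
                        parts = header.split(None, 1)
                        new_id = f"{tag}_{parts[0]}"
                        if new_id in seen:
                            raise ValueError(f"duplicate sequence ID: {new_id}")
                        seen.add(new_id)
                        desc = f" {parts[1]}" if len(parts) > 1 else ""
                        out.write(f">{new_id}{desc}\n")
                        count += 1
                    elif line.strip():
                        # Copy sequence lines unchanged, skipping blank lines
                        out.write(line.rstrip("\n") + "\n")
    return count

# Example call (file names are placeholders):
# merge_assemblies({"trinity": "trinity.fasta", "stringtie": "stringtie.fasta"},
#                  "input.fasta")
```

Prefixing with the assembler name also keeps track of which method produced each transcript, which can be handy when inspecting the filtered set later.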

The locus calls Evigene makes are based on coding sequence alignments. These are in general agreement with locus calls based on mapping to chromosomes and measuring coding exon overlaps there. Mapping software is affected by (a) imperfect chromosome assemblies and (b) high-identity paralogs, so one gets some ambiguous loci or alternate transcript calls. I recommend NCBI Splign, based on BLASTn, as a good transcript-to-chromosome mapper that makes fewer mistakes than GMAP, BLAT, or some other such mappers.

There is a hybrid Evigene (RNA assembly + chromosome modeled) in progress, but not ready yet for general use.

As for EvidentialGene not being easy enough or well documented enough: yes, I'm aware of that, and hope that funding agencies will see value in supporting what is, by objective measures, a gene reconstruction tool that is more accurate than the others available. The basic reason for this, as in your case, is that using several gene reconstruction methods yields more accurate genes, since each gene has individual characteristics that one method or another handles better. What Evigene adds is a biological ruler for measuring which of the many alternate models are best: the protein orthology ruler. If you want a recent overview of that accuracy for animal and plant gene reconstruction, including a comparison to the popular Trinity and MAKER methods, please see this set of talk slides: http://arthropods.eugenes.org/EvidentialGene/about/evigene_bothgalmod1606iu.pdf

Don Gilbert