How To Compare Gene Models
14.2 years ago
Gvj ▴ 470

Is there a script or program for comparing two sets of gene models (in GFF and FASTA format) produced by gene prediction tools? As output I want comparative statistics: the genes common to both sets, the genes unique to each set, and the genes that overlap between two or more sets.

Thank you in advance

gene model comparison • 12k views

Welcome to (almost) the same boat... Do you have proper gene models, or a rather messy mixture of GFFs from, e.g., exonerate, GMAP, PASA, and AUGUSTUS?

Thanks. I have models from Augustus and exonerate.

14.2 years ago
Darked89 4.7k

There is a small utility for reporting feature overlaps between gff files.

It requires a bit of editing (sed-ing rather) of typical gff files:

gff file: it should contain at least 8 fields separated by tabs, followed by a list of attributes of the form [key space value] separated by spaces. There should not be any space at the end

sed 's/[;=]/ /g' filename.gff

does the job. I have just started testing it (it works on my test files) but have not figured out the complete usage yet.
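
For anyone who prefers to leave columns 1 to 8 untouched, here is a minimal Python sketch of the same reformatting, restricted to the attribute column. The function name `flatten_attributes` is made up for illustration, and the assumption is standard GFF3 input where column 9 looks like `key=value;key=value`. The sed command above rewrites `;` and `=` anywhere on the line, so the two only differ when those characters appear outside column 9.

```python
import re

def flatten_attributes(gff_line: str) -> str:
    """Rewrite column 9 of one GFF3 line as space-separated 'key value' pairs."""
    line = gff_line.rstrip("\n")
    # Leave comments, directives and blank lines untouched.
    if not line or line.startswith("#"):
        return line
    # Split into at most 9 fields so the attribute column stays in one piece.
    fields = line.split("\t", 8)
    if len(fields) < 9:
        return line  # no attribute column to rewrite
    # Same substitution as the sed command, but only on the attributes.
    attrs = re.sub(r"[;=]+", " ", fields[8])
    # Collapse repeated whitespace and drop any space at the end of the line.
    attrs = re.sub(r"\s+", " ", attrs).strip()
    return "\t".join(fields[:8] + [attrs])

# Hypothetical example record, not taken from a real annotation.
example = "chr1\tAUGUSTUS\tgene\t100\t900\t0.95\t+\t.\tID=g1;Name=geneA;\n"
print(flatten_attributes(example))
```

Trimming the trailing space matters here because the format description above says there should be no space at the end of the line, and a GFF3 line ending in `;` would otherwise leave one.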

It works well, thank you. I did not fully understand the mode flag (apart from boolean overlap). Could you please give some examples? I am using some sed commands to work out things like how many gene structures are identical between the two sets. Is that also possible with a flag?

As with the 'st' flag, it would be nice to have a flag to ignore the score.

My understanding is that it has no notion of what is being compared. This means that if one GFF file contains multiple feature types (column 3) and the other contains, e.g., just blast_match, the overlaps will still be reported. If one has two files with at least partially matching types, I guess one can get overlaps between the desired types only after grepping and creating temporary files.
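
As a rough illustration of that pre-filtering step, the sketch below keeps only records of the wanted feature types before any overlap tool sees them. The helper name `select_features` and the example records are hypothetical; the only assumption about the input is tab-separated GFF with the feature type in column 3.

```python
def select_features(gff_lines, wanted_types):
    """Yield only the GFF records whose type (column 3) is in wanted_types."""
    wanted = set(wanted_types)
    for line in gff_lines:
        # Skip comments, directives and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3 and fields[2] in wanted:
            yield line

# Hypothetical records mixing several feature types.
records = [
    "##gff-version 3\n",
    "chr1\tAUGUSTUS\tgene\t100\t900\t.\t+\t.\tID=g1\n",
    "chr1\tAUGUSTUS\texon\t100\t200\t.\t+\t.\tParent=g1\n",
    "chr1\tblast\tblast_match\t150\t400\t.\t+\t.\tID=hit1\n",
]
exons_only = list(select_features(records, ["exon"]))
print(len(exons_only))
```

Writing the filtered lines to a temporary file for each input gives the same result as the grep approach, with the advantage that the match is on column 3 only and not on any text that happens to contain the type name.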

14.1 years ago
Darked89 4.7k

Yet another gene prediction comparison program, Eval from the Brent lab:

http://mblab.wustl.edu/software/eval/

Works OK on the provided .gtf example files, giving very detailed output. It gets confused by GFF3 files from Augustus and SNAP, and does not seem to understand exonerate output reformatted to GFF3. It takes a long time to load an Augustus GFF file of 1.4M lines.

10.5 years ago
djinnome ▴ 50

I believe the answer you are looking for is ParsEval, which is now part of the AEGeAn software toolkit. It is by some of the same people who made GAEVAL.

14.2 years ago
Darked89 4.7k

There is a rather old Java program for comparing prediction accuracy: http://bioinformatics.oxfordjournals.org/content/19/13/1712.abstract

ftp://iubio.bio.indiana.edu/molbio/genefind/

I have not had time to test it yet.

Thanks for the link, though it is not useful in my case since I don't have a reliable annotation to use as a reference.

GFPE runs OK with the examples, but it will require at least some GFF file changes to compare AUGUSTUS to exonerate. Regarding a reliable annotation: not exactly. What we need is some kind of estimate of how set A compares to set B. Getting numbers like "number of correct / missing / wrong exons" plus "Correlation Coefficient" gives clues about how many exons predicted by A are also predicted by B, etc.
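
To make those numbers concrete, here is a small Python sketch of one way to compute them when set A is simply treated as the reference and set B as the prediction. It is not GFPE's implementation. The definitions are assumptions based on common gene-finding conventions: a correct exon matches exactly, a missing exon is an A exon that no B exon overlaps, and a wrong exon is a B exon that no A exon overlaps. All function names are made up, and `total_length` (the number of bases considered) has to be supplied by the user.

```python
from math import sqrt

# An exon is a tuple (seqid, start, end, strand), 1-based inclusive as in GFF.

def overlaps(a, b):
    """True if two exons share at least one base on the same sequence and strand."""
    return a[0] == b[0] and a[3] == b[3] and a[1] <= b[2] and b[1] <= a[2]

def exon_level_stats(set_a, set_b):
    """Exon-level counts with set A as the reference and set B as the prediction."""
    a, b = set(set_a), set(set_b)
    correct = a & b  # exact coordinate matches
    missing = {x for x in a if not any(overlaps(x, y) for y in b)}
    wrong = {y for y in b if not any(overlaps(x, y) for x in a)}
    return {
        "correct": len(correct),
        "missing": len(missing),
        "wrong": len(wrong),
        "sensitivity": len(correct) / len(a) if a else 0.0,
        # Gene-finding papers often call this ratio "specificity".
        "precision": len(correct) / len(b) if b else 0.0,
    }

def covered_positions(exons):
    """Every (seqid, position) covered by any exon; strand is ignored here."""
    return {(seqid, pos)
            for seqid, start, end, _strand in exons
            for pos in range(start, end + 1)}

def nucleotide_cc(set_a, set_b, total_length):
    """Nucleotide-level correlation coefficient between two exon sets."""
    pos_a, pos_b = covered_positions(set_a), covered_positions(set_b)
    tp = len(pos_a & pos_b)
    fn = len(pos_a - pos_b)
    fp = len(pos_b - pos_a)
    tn = total_length - tp - fn - fp
    denom = sqrt((tp + fn) * (tn + fp) * (tp + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical exons for illustration only.
set_a = [("chr1", 100, 200, "+"), ("chr1", 300, 400, "+"), ("chr1", 600, 700, "+")]
set_b = [("chr1", 100, 200, "+"), ("chr1", 310, 400, "+"), ("chr1", 900, 950, "+")]
print(exon_level_stats(set_a, set_b))
print(nucleotide_cc(set_a, set_b, total_length=1000))
```

Swapping the two arguments gives the view from the other side, which is useful when neither set is trusted as a reference. The position sets hold every covered base in memory, so this is only suitable for small test regions, and the pairwise overlap check is quadratic; an interval-based approach would be needed for whole-genome files.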

It doesn't work with my sample set. I guess it only runs on GFF version 1?

14.2 years ago

The developers of PlantGDB developed a GAEVAL tool (Gene Annotation Evaluation), but this basically scores gene models based on aligned evidence.

I am currently running a genome annotation pipeline and I would like to do the same thing--compare the predictions from the annotation pipeline with the annotations provided by the group that did the sequencing and assembly. I wasn't really interested in Jigsaw, since it takes the two (possibly) disparate gene models and builds a consensus gene model. I'm more interested in what you mentioned--summary statistics about the differences between the two sets of annotations.

I am currently working on a script to do this comparison. I'm still trying to figure out what comparisons/statistics are interesting (borders of gene model, exon agreement, etc). Will share when I have something reliable.
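
In the meantime, a rough sketch of the kind of gene-level comparison I have in mind is below. It is only a starting point under simple assumptions: each gene model is already parsed into `(seqid, strand, exons)`, two models are "identical" when their exon coordinates match exactly, and "overlapping" when their gene spans overlap on the same sequence and strand. The function names and example IDs are placeholders.

```python
def span(model):
    """Return (seqid, strand, start, end) for a gene model's outer borders."""
    seqid, strand, exons = model
    return seqid, strand, min(s for s, _ in exons), max(e for _, e in exons)

def classify_genes(set_a, set_b):
    """Compare two dicts of gene_id -> (seqid, strand, exons).

    exons is a sequence of (start, end) pairs, 1-based inclusive.
    """
    identical, overlapping = [], []
    matched_a, matched_b = set(), set()
    for id_a, model_a in set_a.items():
        seq_a, strand_a, start_a, end_a = span(model_a)
        for id_b, model_b in set_b.items():
            seq_b, strand_b, start_b, end_b = span(model_b)
            if (seq_a, strand_a) != (seq_b, strand_b):
                continue
            if start_a > end_b or start_b > end_a:
                continue  # gene spans do not touch
            matched_a.add(id_a)
            matched_b.add(id_b)
            # Same exon coordinates means the same structure.
            if sorted(model_a[2]) == sorted(model_b[2]):
                identical.append((id_a, id_b))
            else:
                overlapping.append((id_a, id_b))
    return {
        "identical": identical,
        "overlapping": overlapping,
        "unique_a": sorted(set(set_a) - matched_a),
        "unique_b": sorted(set(set_b) - matched_b),
    }

# Hypothetical models for illustration only.
set_a = {
    "a1": ("chr1", "+", [(100, 200), (300, 400)]),
    "a2": ("chr1", "+", [(1000, 1200)]),
    "a3": ("chr2", "-", [(50, 150)]),
}
set_b = {
    "b1": ("chr1", "+", [(100, 200), (300, 400)]),
    "b2": ("chr1", "+", [(1100, 1300)]),
    "b3": ("chr1", "-", [(5000, 5100)]),
}
for category, members in classify_genes(set_a, set_b).items():
    print(category, members)
```

The nested loop compares every pair, which is fine for a test contig but slow for a full genome; sorting by position or using an interval tree would be the obvious next step. Whether span overlap is the right criterion (as opposed to exon or CDS overlap) is exactly the kind of decision I am still weighing.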

Looking forward to your script. By the way have you used/established xGDB for centralised manual annotation?

I know that xGDB can utilize a community annotation tool called yrGATE, but I have not set up one of those myself yet.

14.2 years ago

There is JIGSAW, which reads GFF. It does somewhat more than you're asking and combines the input into new gene models.

Yes, JIGSAW and other combiners produce gene predictions by combining the results of different gene prediction tools. As I mentioned, I need some comparative figures, not a new gene model.

The reason I mentioned it is that it does the hard work of comparing the gene models. That's not easy, except in the simple case when two models are identical. It does produce some comparative stats on the input gene finders too, but they are totals of genes and exons found/missed etc.

8.0 years ago

I did not fully understand the mode flag. Could you please give some examples? I am using some sed commands to work out things like how many gene structures are identical between the two sets. Is that also possible with a flag?
