What tools can be used to assign EC numbers to a protein data set, apart from BLAST2GO? I have complete proteome of an organism. I assigned EC numbers with BLAST2GO but I am also looking for other solutions.
My strategy for doing this is to find an existing database that associates reactions/EC numbers with sequences, run BLASTP of your proteins against the protein sequences from that database, and transfer the annotations from the best hit to the query. Keep in mind that there are many reactions that don't have an EC number, or that have an ambiguous EC number (alcohol dehydrogenase, 1.1.1.1, is a good example of an ambiguous EC number), so if your goal is to associate catalyzed reactions with sequences, it may be worth your while to go a bit beyond using just EC numbers.
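A minimal Python sketch of the hit-transfer step, assuming BLAST+ tabular output (-outfmt 6 with the default 12 columns) and an accession-to-EC dictionary you have already built from one of the databases below. The function names, the 1e-5 e-value cutoff and the UniProt-style "sp|ACC|NAME" subject IDs are my own assumptions, not part of any tool.

```python
import csv
from typing import Dict, Iterable, Set


def accession(seq_id: str) -> str:
    """Extract the accession from a UniProt-style ID such as 'sp|P07327|ADH1A_HUMAN'."""
    parts = seq_id.split("|")
    return parts[1] if len(parts) >= 3 else seq_id


def best_hits(blast_lines: Iterable[str], max_evalue: float = 1e-5) -> Dict[str, str]:
    """Return query -> subject accession for the best hit of each query.

    Assumes BLAST+ -outfmt 6 with the default 12 columns:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
    "Best" means lowest e-value, ties broken by highest bit score.
    """
    best: Dict[str, tuple] = {}
    for row in csv.reader(blast_lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], accession(row[1])
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hypothetical cutoff; tune for your data
        rank = (evalue, -bitscore)
        if query not in best or rank < best[query][0]:
            best[query] = (rank, subject)
    return {query: subject for query, (_, subject) in best.items()}


def transfer_ec(hits: Dict[str, str], acc_to_ec: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Copy the EC numbers of each query's best hit onto the query; unannotated hits are skipped."""
    return {q: set(acc_to_ec[s]) for q, s in hits.items() if s in acc_to_ec}
```

Usage would be something like opening your BLAST output file and passing the file handle to best_hits. Only the single best hit is used, so EC numbers carried by lower-ranked hits are ignored; that is a deliberate simplification.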
Here are some ideas of databases to work from (to actually get the peptide sequences and EC-number/reaction associations requires a bit of coding):
Expasy Enzyme: Associates EC numbers with UniProt IDs. In their downloads section is a text file called "enzyme.dat" which contains these associations, as well as some other useful data. The number of proteins referenced by Enzyme is considerably smaller than the total number of proteins in UniProt (or SwissProt). So to speed up searches, and to ensure that your top hits are annotated, you may want to BLAST against the subset of UniProt containing only those proteins referenced by Enzyme. PRIAM is a command-line tool that is fairly easy to use, and automates the process of annotating a proteome based on Expasy Enzyme (the only hiccup I ran into when getting it to work is that it requires an out-of-date version of BLAST to be installed).
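A rough sketch of pulling accession-to-EC associations out of enzyme.dat and using them to cut a UniProt FASTA file down to the Enzyme-referenced subset. It assumes the usual ENZYME flat-file layout ("ID" lines carry the EC number, "DR" lines carry "ACCESSION, ENTRY_NAME;" pairs, "//" ends an entry) and UniProt-style FASTA headers; both function names are hypothetical.

```python
from typing import Dict, Iterable, Iterator, Set


def parse_enzyme_dat(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map UniProt accession -> set of EC numbers from enzyme.dat (assumed layout)."""
    acc_to_ec: Dict[str, Set[str]] = {}
    ec = None
    for line in lines:
        tag, body = line[:2], line[5:].strip()  # 2-letter tag, 3 spaces, then data
        if tag == "ID":
            ec = body
        elif tag == "DR" and ec is not None:
            for ref in body.split(";"):
                ref = ref.strip()
                if ref:
                    acc = ref.split(",")[0].strip()  # keep the accession, drop the entry name
                    acc_to_ec.setdefault(acc, set()).add(ec)
        elif line.startswith("//"):
            ec = None  # end of entry
    return acc_to_ec


def filter_fasta(lines: Iterable[str], keep) -> Iterator[str]:
    """Yield only the FASTA records whose accession is in `keep`.

    Assumes headers like '>sp|P07327|ADH1A_HUMAN ...'; other IDs are used whole.
    """
    writing = False
    for line in lines:
        if line.startswith(">"):
            seq_id = line[1:].split()[0]
            parts = seq_id.split("|")
            acc = parts[1] if len(parts) >= 3 else seq_id
            writing = acc in keep
        if writing:
            yield line
```

The filtered FASTA can then be turned into a BLAST database, so every hit you get back is guaranteed to have at least one EC number in the dictionary.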
Rhea: An extensive database of metabolic reactions. Many of the reactions (I think including some that don't have an EC number) are associated with UniProt accessions. They have lots of different options for downloading the data. The most convenient for your purposes may be to get "rhea2ec.tsv" and "rhea2uniprot.tsv" from the TSV section of their downloads page. Like with Enzyme, it would probably be best to search against a subset of UniProt.
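A sketch of joining the two Rhea TSV files into an accession-to-EC map. I am assuming both files are tab-separated with a header row that includes MASTER_ID (the undirected master reaction) and ID (the EC number or UniProt accession); please check the headers of the files you actually download. Joining on MASTER_ID is my choice, so that directional and undirected reaction IDs line up.

```python
import csv
from typing import Dict, Iterable, Set


def load_rhea_map(lines: Iterable[str], key_col: str = "MASTER_ID",
                  value_col: str = "ID") -> Dict[str, Set[str]]:
    """Group the value column by the key column (column names are assumptions)."""
    mapping: Dict[str, Set[str]] = {}
    for row in csv.DictReader(lines, delimiter="\t"):
        mapping.setdefault(row[key_col], set()).add(row[value_col])
    return mapping


def uniprot_to_ec(rhea2ec_lines: Iterable[str],
                  rhea2uniprot_lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Combine rhea2ec and rhea2uniprot into accession -> EC numbers."""
    master_to_ec = load_rhea_map(rhea2ec_lines)
    master_to_acc = load_rhea_map(rhea2uniprot_lines)
    acc_to_ec: Dict[str, Set[str]] = {}
    for master, accs in master_to_acc.items():
        ecs = master_to_ec.get(master)
        if not ecs:
            continue  # reaction has no EC number; keep master_to_acc if you want these too
        for acc in accs:
            acc_to_ec.setdefault(acc, set()).update(ecs)
    return acc_to_ec
```

Reactions without an EC number are dropped by this join; if you care about reactions rather than EC numbers, master_to_acc on its own already gives you accession-reaction pairs.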
MetaCyc: MetaCyc is similar to Rhea in a lot of ways. It has lots of reactions and lots of annotations. It's a great database to use, but there are a couple of caveats: (1) you need to register to download the files, and (2) the data are organized hierarchically and split into multiple files, so it takes a while to understand how they are organized, and writing a program to extract sequence-reaction associations is not trivial.
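To give a feel for the extraction work, here is a sketch of a reader for the Pathway Tools attribute-value flat files that MetaCyc distributes, plus a join from enzymes to EC numbers. The format rules ("#" comments, "ATTR - value" lines, "/" continuation lines, "//" record separators) and the file/attribute names (enzrxns.dat with ENZYME and REACTION, reactions.dat with UNIQUE-ID and EC-NUMBER) are assumptions on my part; verify them against the files you download.

```python
from typing import Dict, Iterable, Iterator, List, Set

Record = Dict[str, List[str]]


def parse_attribute_value(lines: Iterable[str]) -> Iterator[Record]:
    """Yield one dict (attribute -> list of values) per record."""
    record: Record = {}
    last_key = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("#") or not line.strip():
            continue  # comment or blank line
        if line == "//":
            if record:
                yield record
            record, last_key = {}, None
        elif line.startswith("/") and last_key is not None:
            record[last_key][-1] += " " + line[1:]  # continuation of the previous value
        elif " - " in line:
            key, value = line.split(" - ", 1)
            record.setdefault(key, []).append(value)
            last_key = key
    if record:
        yield record


def reaction_ec_by_enzyme(enzrxns: Iterable[Record],
                          reactions: Iterable[Record]) -> Dict[str, Set[str]]:
    """Map enzyme IDs to EC numbers via enzymatic-reaction records (assumed attribute names)."""
    rxn_to_ec = {r["UNIQUE-ID"][0]: set(r.get("EC-NUMBER", []))
                 for r in reactions if "UNIQUE-ID" in r}
    enzyme_to_ec: Dict[str, Set[str]] = {}
    for er in enzrxns:
        for enzyme in er.get("ENZYME", []):
            for rxn in er.get("REACTION", []):
                enzyme_to_ec.setdefault(enzyme, set()).update(rxn_to_ec.get(rxn, set()))
    return enzyme_to_ec
```

Getting from enzyme IDs to actual sequences takes yet another join (through the protein records' database links), which is part of why this route is more work than Enzyme or Rhea.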
I've used all of these databases and find that the EC numbers they assign are usually the same as those that Blast2GO assigns. So, depending on the goals of your project, the effort required to get significantly better annotations than those provided by Blast2GO may not be worth it.
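If you want to check how much another source actually adds over Blast2GO, a simple comparison is the fraction of proteins annotated by both that share at least one EC number. This sketch compares EC numbers on their first few levels (EC numbers have four dot-separated levels, with "-" for an unassigned level), so partial numbers like 2.7.11.- can still count as agreeing at depth 3. Stripping an "EC:" prefix is an assumption about how your exports look.

```python
from typing import Dict, Set, Tuple


def ec_prefix(ec: str, depth: int) -> Tuple[str, ...]:
    """Return the first `depth` levels of an EC number, e.g. ('1', '1', '1')."""
    ec = ec.strip()
    if ec.upper().startswith("EC:"):
        ec = ec[3:]  # assumed export prefix
    return tuple(ec.split(".")[:depth])


def agreement(a: Dict[str, Set[str]], b: Dict[str, Set[str]], depth: int = 4) -> float:
    """Fraction of proteins annotated in both sources that share an EC number at `depth`."""
    shared = a.keys() & b.keys()
    if not shared:
        return 0.0
    matches = 0
    for protein in shared:
        pa = {ec_prefix(e, depth) for e in a[protein]}
        pb = {ec_prefix(e, depth) for e in b[protein]}
        if pa & pb:
            matches += 1
    return matches / len(shared)
```

Proteins annotated by only one of the two sources are not counted here; it is worth reporting those separately, since they are where the sources actually differ in coverage.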
I just posted a tutorial on my blog showing how to do this using BLAST, ExPASy Enzyme, and SwissProt. I hope it's helpful to someone.
Just a word of caution. EC numbers are supposed to be assigned to enzymes that have been characterised in vitro, i.e. the mechanistic classification is supported by experimental data. Obviously, experimentally characterised enzymes make up a continuously shrinking proportion of the EC numbers produced by transitive, automated annotation of the type the questioner wants to do. The concern is that many bioinformaticians can overlook what should be a clearly understood (and evidence-tagged) boundary between data-supported function and homology-based extrapolation. One manifestation is that GO assigns catalytic function to many "dead" enzymes (e.g. anywhere from 10% to 15% of all mammalian kinases and proteases).
I totally agree with you here. Any EC assignments made based solely on homology would have to be understood to be "putative" assignments until there was some kind of experimental data to back them up. In addition to the concern you mention about assigning annotations to dead enzymes, I've also seen examples where changes to one or just a few amino acids can change the substrate or product of a reaction. So you can have two enzymes that are 99% identical at the amino acid level, but catalyze different reactions.
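One lightweight way to keep that boundary visible in your own results is to tag every EC number with where it came from and mark homology transfers as putative unless you have experimental support for that specific protein and EC number. The record layout and the two labels below are a hypothetical convention, not a standard; percent identity is kept only for later review, since even very high identity does not guarantee the same reaction.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class ECAnnotation:
    protein: str
    ec: str
    source_hit: str  # database entry the EC number was copied from
    pident: float    # percent identity to that entry, kept for review only
    evidence: str    # "experimental" or "putative" (hypothetical labels)


def tag_evidence(transfers: Iterable[Tuple[str, str, str, float]],
                 experimental: Set[Tuple[str, str]]) -> List[ECAnnotation]:
    """Label each (protein, ec, hit, pident) transfer; homology alone stays putative."""
    out = []
    for protein, ec, hit, pident in transfers:
        evidence = "experimental" if (protein, ec) in experimental else "putative"
        out.append(ECAnnotation(protein, ec, hit, pident, evidence))
    return out
```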
Try Seq2EC: http://www.ebi.ac.uk/thornton-srv/software/rblec/