6.0 years ago
Learner
I am interested in retrieving information for proteins and understanding which ones are enzymes and which ones are not. Is there an easy way to do this for a large number of genes, for example using UniProt?
How about retrieving all UniProtKB entries with an EC number? You can use a query like
ec:* AND organism:"Homo sapiens (Human) [9606]"
for that.
@vkkodali I don't mind doing that, but it does not give anything :-) I am looking for something programmatic rather than trial and error.
What do you mean by 'it does not give anything'? I see over 15000 hits returned for that query. Once you have figured out the query that you want to use, you can then proceed to use the REST API to programmatically retrieve the info you want. This earlier post from Biostars is relevant: UniProtKB - mapping gene name to ID (*_HUMAN ) using python2
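As a minimal sketch of the programmatic route described above, the search query can be turned into a request URL with the standard library, which takes care of encoding spaces, colons and quotes. The endpoint, the parameter names (`fields`, `format`, `size`) and the `organism_id` and `reviewed` query fields are assumptions based on the current UniProt REST service. The thread itself refers to an older interface with a `columns` parameter, so check the API documentation before relying on any of these names. `build_search_url` is a hypothetical helper, not part of any UniProt library.

```python
from urllib.parse import urlencode

# Assumed endpoint of the UniProt REST service; verify against the API docs.
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query, fields, fmt="tsv", size=500, base_url=BASE_URL):
    """Return a search URL with every parameter percent-encoded.

    The parameter names used here are assumptions, not confirmed by the thread.
    """
    params = {
        "query": query,
        "fields": ",".join(fields),  # columns to return, comma-separated
        "format": fmt,               # tab-separated output by default
        "size": size,                # number of results per page
    }
    return base_url + "?" + urlencode(params)


# Reviewed human entries that carry an EC number (assumed query field names).
url = build_search_url(
    "ec:* AND organism_id:9606 AND reviewed:true",
    ["accession", "id", "ec"],
)
print(url)
```

The resulting URL can be passed to any HTTP client. Building it with `urlencode` rather than by string concatenation avoids unescaped spaces and quotes, which are a common cause of URL errors.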
@vkkodali So you mean those 15000 hits are the enzyme proteins? If so, it would be easy to match against them and find the ones that are not. Let me know if that is the right approach :-)
The assumption I am making here is that a protein is an enzyme if there is an EC number assigned to it. Out of the ~15000 hits, fewer than 5000 are UniProt Reviewed records, which would be the only ones I'd bother to look at. Beyond that, it depends on your use case. If you need to include absolutely all enzymes, then this will surely miss a few. Along the same lines, this list may include a few that are actually not enzymes. But that's some QC work you will have to do.
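The rule stated above (an entry counts as an enzyme if it has an EC number) could be applied to a downloaded tab-separated table roughly as follows. This is only a sketch: the header names `Entry` and `EC number`, the `;` separator between multiple EC numbers, and the sample rows are all assumptions made up for illustration, and `split_by_ec` is a hypothetical helper.

```python
import csv
import io


def split_by_ec(tsv_text, ec_column="EC number", id_column="Entry"):
    """Split rows of a tab-separated table into (enzymes, non_enzymes).

    enzymes maps each entry ID to its list of EC numbers;
    non_enzymes lists the entry IDs whose EC column is empty.
    Column names are assumptions and can be overridden.
    """
    enzymes, non_enzymes = {}, []
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        ec = (row.get(ec_column) or "").strip()
        if ec:
            # Assumes several EC numbers are separated by semicolons.
            enzymes[row[id_column]] = [e.strip() for e in ec.split(";") if e.strip()]
        else:
            non_enzymes.append(row[id_column])
    return enzymes, non_enzymes


# Invented placeholder rows, not real UniProt records.
sample = (
    "Entry\tEntry Name\tEC number\n"
    "X00001\tAAA_HUMAN\t1.1.1.1\n"
    "X00002\tBBB_HUMAN\t\n"
    "X00003\tCCC_HUMAN\t2.7.11.1; 3.6.4.12\n"
)
enzymes, non_enzymes = split_by_ec(sample)
print(enzymes)
print(non_enzymes)
```

Keeping the EC numbers alongside each entry makes the later QC step easier, since entries with partial or unexpected EC assignments can be inspected by hand.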
@vkkodali Thank you. How can I extract the EC number too? I want to know how to QC it!
Have you used the UniProt REST API before? One of the options there is to specify which columns you want. In your case, you should have the ec column in addition to other columns such as id and entry name. Read up on their API at the link shown above and post your code here so that I can review it.
@vkkodali This is how I am trying to do it, but I get an error with the URL