Hello!
I have a large list of kinase (gene) names extracted from the UniProtKB modified residue section. (e.g. MAPK1, CDK1, SRC, ATM etc.)
I am trying to convert these names to their entry names (ID) to get:
MAPK1: MK01_HUMAN
CDK1: CDK1_HUMAN
SRC: SRC_HUMAN
etc...
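For a large list, the per-gene lookups can be collected into a dictionary. This is a minimal sketch (Python 3). It assumes a hypothetical `fetch_entry_names(gene)` callable that returns the entry names UniProt reports for one gene, for example only the reviewed human hits. Genes with zero hits or with several hits are set aside for manual checking rather than guessed. The lookup table in the demo is a stand-in, not real query output.

```python
from typing import Callable, Dict, List, Tuple


def map_genes(
    genes: List[str],
    fetch_entry_names: Callable[[str], List[str]],
) -> Tuple[Dict[str, str], Dict[str, List[str]]]:
    """Map gene names to a single UniProt entry name each.

    fetch_entry_names is a hypothetical lookup supplied by the caller.
    Returns (resolved, unresolved): resolved maps gene -> entry name when
    exactly one hit came back; unresolved keeps the raw hit list otherwise.
    """
    resolved: Dict[str, str] = {}
    unresolved: Dict[str, List[str]] = {}
    for gene in genes:
        hits = fetch_entry_names(gene)
        if len(hits) == 1:
            resolved[gene] = hits[0]
        else:
            # Zero or several hits: flag for manual review instead of guessing.
            unresolved[gene] = hits
    return resolved, unresolved


if __name__ == "__main__":
    # Stand-in lookup table for illustration only; no network access.
    fake = {"MAPK1": ["MK01_HUMAN"], "CDK1": ["CDK1_HUMAN"], "SRC": ["SRC_HUMAN"]}
    resolved, unresolved = map_genes(["MAPK1", "CDK1", "SRC", "ATM"],
                                     lambda g: fake.get(g, []))
    print(resolved)
    print(unresolved)
```

Keeping ambiguous genes separate means that a lookup returning several IDs, as in the MAPK1 example below, surfaces as a problem instead of silently taking the first hit.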
The problem is that for every gene name I get many IDs, and I only want the official one (see below), which always seems to be the only reviewed one.
I tried adding 'columns': 'reviewed' and 'organism': 'human' to the params below, but it has no effect. I am basically lost!
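In UniProt's query syntax, review status and organism can be expressed as search terms inside the query rather than as output columns. That may be why adding them as `columns` or `organism` parameters had no effect. This minimal sketch (Python 3) builds such a search URL. It assumes the current UniProt REST endpoint `https://rest.uniprot.org/uniprotkb/search` and its `gene_exact`, `organism_id` and `reviewed` query fields. The older `uploadlists` service used in the question may behave differently, so please check the current UniProt API documentation before relying on this.

```python
from urllib.parse import quote, urlencode

# Assumed endpoint of the current UniProt REST API.
SEARCH_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_query_url(gene: str, taxon_id: int = 9606) -> str:
    """Build a search URL for reviewed entries of one gene in one organism.

    9606 is the NCBI taxonomy ID for Homo sapiens.
    """
    query = f"gene_exact:{gene} AND organism_id:{taxon_id} AND reviewed:true"
    params = {
        "query": query,
        "fields": "accession,id",  # 'id' is the entry name, e.g. MK01_HUMAN
        "format": "tsv",
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return SEARCH_URL + "?" + urlencode(params, quote_via=quote)


if __name__ == "__main__":
    print(build_query_url("MAPK1"))
```

Fetching that URL, for example with `urllib.request.urlopen`, should return a header line plus one row per matching reviewed entry. For MAPK1 in human, that would be expected to be a single row.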
An example using MAPK1 kinase:
# Python 2 (urllib2); query the UniProt ID-mapping service
import urllib, urllib2
url = 'https://www.uniprot.org/uploadlists/'
params = {
    'from': 'GENENAME',
    'to': 'ID',
    'format': 'tab',
    'query': 'MAPK1',
    'columns': 'reviewed'
}
data = urllib.urlencode(params)
request = urllib2.Request(url, data)
contact = "xxxx@outlook.com"
request.add_header('User-Agent', 'Python %s' % contact)
response = urllib2.urlopen(request)
header = response.readline()  # skip the header line
entries = response.read()
# keep the second column (the mapped ID) when it is a human entry
id_list = []
new_entries = entries.split("\n")
for element in new_entries:
    if element == "":
        continue
    else:
        element = element.split("\t")
        if "_HUMAN" in element[1]:
            id_list.append(element[1])
The final id_list is: ['MK01_HUMAN', 'Q1HBJ4_HUMAN', 'Q499G7_HUMAN']
I am only interested in extracting the 'main' identifier, MK01_HUMAN. Can anyone please help?
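If the results come back as tab-separated text that includes a review-status column, the 'main' identifier can be picked out by filtering on that column instead of on the `_HUMAN` suffix. This is a minimal sketch (Python 3). It assumes a header row with columns named `Entry Name` and `Reviewed`, and the status values `reviewed`/`unreviewed`. The column names and sample rows are illustrative assumptions, so please adjust them to whatever header your query actually returns.

```python
import csv
import io
from typing import List


def reviewed_entry_names(tsv_text: str) -> List[str]:
    """Return entry names from rows whose Reviewed column equals 'reviewed'.

    Assumes the first line is a header containing 'Entry Name' and 'Reviewed'.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [
        row["Entry Name"]
        for row in reader
        # 'or ""' guards against short rows, where DictReader fills in None
        if (row.get("Reviewed") or "").strip().lower() == "reviewed"
    ]


if __name__ == "__main__":
    # Illustrative sample, not actual server output.
    sample = (
        "Entry Name\tReviewed\n"
        "MK01_HUMAN\treviewed\n"
        "Q1HBJ4_HUMAN\tunreviewed\n"
        "Q499G7_HUMAN\tunreviewed\n"
    )
    print(reviewed_entry_names(sample))  # ['MK01_HUMAN']
```

Reading by column name rather than by position (`element[1]`) keeps the filter working if the column order of the response changes.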
Thank you so much - it worked. Now I can sleep peacefully after hours of staring at this :)
@vkkodali I get an error like
Python seems to be complaining about indentation. If you copied and pasted the code from above, make sure the indentation is correct. I don't think you need to indent the entire code block after the import statement.
@vkkodali I just corrected the indentation; it runs but does not give any output.
Without more detailed information from you, I cannot be of much help. What have you tried? How did you run the code? Did you run it as a script? Or did you run it at the Python interpreter? Did you use the example posted here or did you use your own example? I am not sure what you mean by 'does not give any output'. What were you expecting? The code, as written in the first post, does not output anything. It populates a list called id_list. Did you see anything in that list?
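To see the result when running the question's code as a script, the list has to be printed explicitly. This minimal sketch (Python 3) wraps the parsing loop from the first post in a function and prints what it collects. It uses a hard-coded sample in place of the server response. `parse_human_ids` and the sample text are hypothetical and for illustration only.

```python
from typing import List


def parse_human_ids(entries: str) -> List[str]:
    """Collect the second tab-separated column when it contains '_HUMAN'.

    Mirrors the loop in the question: blank lines are skipped, and lines
    with fewer than two columns are ignored instead of raising IndexError.
    """
    id_list = []
    for line in entries.split("\n"):
        if not line:
            continue
        fields = line.split("\t")
        if len(fields) > 1 and "_HUMAN" in fields[1]:
            id_list.append(fields[1])
    return id_list


if __name__ == "__main__":
    # Stand-in for response.read() after the header line was consumed.
    sample = "MAPK1\tMK01_HUMAN\nMAPK1\tQ1HBJ4_HUMAN\nMAPK1\tMK01_MOUSE\n"
    id_list = parse_human_ids(sample)
    print(id_list)  # ['MK01_HUMAN', 'Q1HBJ4_HUMAN']
```

At the interactive interpreter, typing `id_list` displays the list. When the code is run as a script, nothing is shown unless it is printed.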