I have just completed a blastx run on my samples and have obtained the following result (example):
$head blastx_result.txt
NS500162:172:HG5CJBGXX:1:11101:25222 Y052L_FRG3G 52.500 40 19 0 2 121 25 64 8.26e-07 44.3
The second column has a UniProtKB AC/ID that I need to change to its respective KO number. I am aware of the Retrieve/ID mapping tool, where I can manually select From: UniProtKB AC/ID, To: KO and get the KO associated with an ID. This tool also lets you upload a text file with many AC/IDs, but it has a limit of 100,000 IDs per upload. I have 16 files in total, each with several million AC/IDs that I need converted to KOs. Splitting these 16 files into chunks of 100,000 IDs gives me over 2,000 files to submit to this tool by hand, which is overwhelming and impractical.
UniProt also has a help page, "How can I access resources on this web site programmatically?", with sample scripts for accessing the site programmatically. I am not a coder, but I chose the Perl script they provided (under "Mapping database identifiers" on that page) in an attempt to do the ID conversion. Here is the bit of code I am trying to work with:
$cat uniprot.py
import urllib,urllib2
url = 'http://www.uniprot.org/uploadlists/'
params = {
'from':'ACC+ID',
'to':'KO_ID',
'format':'tab',
'query':'052L_FRG3G 14332_ORYSJ 1A111_ARATH 1A13_SOLLC 1A16_ARATH'
}
data = urllib.urlencode(params)
request = urllib2.Request(url, data)
contact = "myemail@gmail.com" # Please set your email address here to help us debug in case of problems.
request.add_header('User-Agent', 'Python %s' % contact)
response = urllib2.urlopen(request)
page = response.read(200000)
Under 'query' I used test AC/IDs that I know give back KO numbers; however, running this script in my terminal:
perl ./uniprot.py
produced zero results.
My inquiry is this:
1) What am I doing wrong with this code?
2) How can I put in a .txt with millions of AC/IDs (one for each line) within this code so that it returns the KO numbers for those IDs?
A million thanks!
UniProt provides ID mappings in a single text file (you can download it from here: ftp://ftp.uniprot.org/pub/databases/uniprot/current_release/knowledgebase/idmapping/).
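With that file downloaded, the conversion can be done locally instead of through the web tool. Below is a minimal Python 3 sketch of one way to do it, not an official UniProt script. It assumes the file is `idmapping.dat.gz` with three tab-separated columns (accession, ID type, value), that `UniProtKB-ID` and `KO` are among the ID types, and that all rows for one accession sit next to each other; please check these against the README in the same FTP directory. The function names are my own, and I chose to replace the second BLAST column with the KO number(s), or `NA` when none is found.

```python
import gzip
from itertools import groupby


def iter_ko_rows(lines):
    """Yield (accession, entry_name, ko) for each KO row in idmapping.dat.

    Assumes rows for one accession are consecutive, so the file can be
    streamed without holding it in memory.
    """
    rows = (line.rstrip("\n").split("\t") for line in lines if line.strip())
    # Group consecutive rows by accession, ignoring an isoform suffix ("-2").
    for acc, group in groupby(rows, key=lambda r: r[0].split("-")[0]):
        entry_name, kos = None, []
        for row in group:
            if len(row) != 3:
                continue  # skip malformed rows
            _, id_type, value = row
            if id_type == "UniProtKB-ID":
                entry_name = value
            elif id_type == "KO":
                kos.append(value)
        for ko in kos:
            yield acc, entry_name, ko


def build_ko_map(lines, wanted):
    """Return {id: [KO, ...]} for the accessions or entry names in `wanted`."""
    ko_map = {}
    for acc, entry_name, ko in iter_ko_rows(lines):
        for key in (acc, entry_name):
            if key in wanted:
                ko_map.setdefault(key, []).append(ko)
    return ko_map


def annotate_blast(blast_lines, ko_map):
    """Yield BLAST rows with column 2 replaced by KO number(s), or 'NA'."""
    for line in blast_lines:
        fields = line.split()
        if len(fields) < 2:
            continue
        kos = ko_map.get(fields[1])
        fields[1] = ",".join(kos) if kos else "NA"
        yield "\t".join(fields)


def convert_file(blast_path, idmapping_path, out_path):
    """Convert one BLAST result file (paths are supplied by the caller)."""
    # Collect only the IDs that occur in this BLAST file.
    with open(blast_path) as fh:
        wanted = {f[1] for f in (line.split() for line in fh) if len(f) > 1}
    # Stream the compressed mapping file once.
    with gzip.open(idmapping_path, "rt") as fh:
        ko_map = build_ko_map(fh, wanted)
    with open(blast_path) as fh, open(out_path, "w") as out:
        for row in annotate_blast(fh, ko_map):
            out.write(row + "\n")
```

You would call it once per result file, for example `convert_file("blastx_result.txt", "idmapping.dat.gz", "blastx_result.ko.txt")`, where the output name is just a placeholder. Since all 16 files can share one pass over the mapping file, it may be faster to collect the `wanted` IDs from every file first and build `ko_map` a single time.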