I want to use a Gene Ontology term to get the related sequences from UniProt. It is simple to do manually, but I would like to do it with Python. Does anybody have ideas on how to do this? For example, given GO:0070337, I want to download all the search results as a FASTA file. Thanks.
#!/usr/bin/env python
"""Fetch UniProt entries for the given GO terms."""
import sys
from Bio import SwissProt

# Load the GO terms given on the command line
gos = set(sys.argv[1:])
sys.stderr.write("Looking for %s GO term(s): %s\n" % (len(gos), " ".join(gos)))

# Parse the Swiss-Prot dump read from stdin
k = 0
sys.stderr.write("Parsing...\n")
for i, r in enumerate(SwissProt.parse(sys.stdin)):
    # Progress counter on stderr
    sys.stderr.write(" %9i\r" % (i + 1,))
    # Check every cross-reference of the entry
    for ex_db_data in r.cross_references:
        extdb, extid = ex_db_data[:2]
        if extdb == "GO" and extid in gos:
            k += 1
            # Write the match as FASTA: primary accession and GO id in the header
            sys.stdout.write(">%s %s\n%s\n" % (r.accessions[0], extid, r.sequence))
sys.stderr.write("Reported %s entries\n" % k)
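One thing to be aware of: the loop above writes an entry once per matching GO term, so an entry annotated with two of the requested terms ends up in the output twice. Here is a minimal sketch of a helper that collects the matches for one entry instead. It assumes, as the script does, that each cross-reference is a tuple whose first two fields are the database name and the identifier. The name `matching_go_terms` is made up for this example and is not part of Biopython, and the cross-references below are hand-written placeholders, not real UniProt data.

```python
def matching_go_terms(cross_references, gos):
    """Return the sorted GO ids of one entry that are in the requested set."""
    found = set()
    for ref in cross_references:
        # Each reference is assumed to look like ("GO", "GO:0070337", ...)
        extdb, extid = ref[:2]
        if extdb == "GO" and extid in gos:
            found.add(extid)
    return sorted(found)


# Hand-written placeholder cross-references (hypothetical, for illustration only)
refs = [
    ("EMBL", "X00000"),
    ("GO", "GO:0070337", "F:placeholder description", "IEA:Placeholder"),
    ("GO", "GO:0000001", "P:placeholder description", "IEA:Placeholder"),
]
hits = matching_go_terms(refs, {"GO:0070337"})
print(hits)
```

In the main loop you could call this once per record and, if the result is not empty, write a single FASTA entry with `",".join(hits)` in the header.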
For me, the script takes less than 6 minutes to parse the latest Swiss-Prot dump when it is piped in directly from the download (so the time depends on your internet connection).
Of course, if you are going to run it multiple times, it is better to download the dump once and run the script on the local copy.
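For the local-copy route, a small sketch of how the dump could be opened from disk instead of stdin. It assumes the dump was saved either as a plain flat file or gzip-compressed, and it opens the file in text mode, which is what I would expect `SwissProt.parse` to need. The helper name `open_dump` and the file name in the comment are only examples.

```python
import gzip


def open_dump(path):
    """Open a Swiss-Prot flat-file dump in text mode, gzip-compressed or not."""
    if path.endswith(".gz"):
        # "rt" makes gzip decompress and decode to text on the fly
        return gzip.open(path, "rt")
    return open(path, "r")


# Intended use (assumption: the dump was saved as uniprot_sprot.dat.gz):
#     with open_dump("uniprot_sprot.dat.gz") as handle:
#         for record in SwissProt.parse(handle):
#             ...
```

The returned handle then takes the place of `sys.stdin` in the script, and nothing else has to change.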
That's the same one I linked.