Question

How do I get FASTA sequences if I have a list of protein IDs (more than 10,000)?

2

5.4 years ago

sunnykevin97 ▴ 1000

Hi,

I have more than 10,000 protein IDs, and I want to extract the FASTA sequences for all of them from UniProt.

What I have done so far: I have already downloaded all the FASTA sequences for the organism I'm interested in.

How can I do this? Suggestions are welcome.
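
Because the organism's FASTA file is already on disk, one option is to filter it locally rather than query UniProt again. The sketch below is a minimal example, assuming the file uses the usual UniProt header layout (`>sp|ACCESSION|ENTRY_NAME ...` or `>tr|...`) and that the ID list holds one accession per line. The function names and the file names `organism.fasta`, `protein_ids.txt`, and `selected.fasta` are placeholders, not anything from the original post.

```python
def accession_from_header(header):
    """Return the accession from a FASTA header line.

    Assumes UniProt-style headers such as '>sp|P12345|NAME_HUMAN ...';
    otherwise falls back to the first whitespace-delimited token.
    """
    tokens = header[1:].split()
    if not tokens:
        return ""
    parts = tokens[0].split("|")
    return parts[1] if len(parts) >= 3 else tokens[0]


def filter_fasta(lines, wanted_ids):
    """Yield the lines of every record whose accession is in wanted_ids."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            # A new record starts here; decide whether to keep it.
            keep = accession_from_header(line) in wanted_ids
        if keep:
            yield line


def extract_to_file(fasta_path, ids_path, out_path):
    """Write the records listed in ids_path from fasta_path to out_path."""
    with open(ids_path) as handle:
        wanted_ids = {line.strip() for line in handle if line.strip()}
    with open(fasta_path) as src, open(out_path, "w") as dst:
        dst.writelines(filter_fasta(src, wanted_ids))
```

Calling `extract_to_file('organism.fasta', 'protein_ids.txt', 'selected.fasta')` would write all matching records into a single file. IDs that are absent from the local file are skipped silently, so it is worth comparing the number of records written against the number of IDs requested.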

uniprot sequence gene • 4.5k views

ADD COMMENT • link updated 3.6 years ago by devhimd ▴ 10 • written 5.4 years ago by sunnykevin97 ▴ 1000

0

Use the BLAST+ preformatted nr database along with the blastdbcmd utility. Use the -entry_batch option to retrieve a large number of accessions in one run.

-entry_batch <File_In>
   Input file for batch processing (Format: one entry per line, seq id 
   followed by optional space-delimited specifier(s)

An example for a single accession is shown below.

$ blastdbcmd -db /path_to/blastv5/nr_v5 -entry Q9I7U4 -outfmt %f
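
For the batch case, the same command can take a file of accessions through -entry_batch. Here is a minimal Python sketch, assuming blastdbcmd is on the PATH and the database path is valid on your system. The helper names and the file names `ids.txt` and `sequences.fasta` are hypothetical.

```python
import subprocess


def build_blastdbcmd_args(db_path, ids_file, out_file):
    """Build the blastdbcmd argument list for a batch FASTA lookup."""
    return [
        "blastdbcmd",
        "-db", db_path,
        "-entry_batch", ids_file,  # input file, one accession per line
        "-outfmt", "%f",           # %f requests FASTA-formatted output
        "-out", out_file,          # all retrieved sequences go to this file
    ]


def fetch_batch(db_path, ids_file, out_file):
    """Run blastdbcmd and return its exit code."""
    completed = subprocess.run(build_blastdbcmd_args(db_path, ids_file, out_file))
    return completed.returncode
```

Something like `fetch_batch('/path_to/blastv5/nr_v5', 'ids.txt', 'sequences.fasta')` would then collect every accession found in the database into one FASTA file.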

ADD REPLY • link 5.4 years ago by GenoMax 151k

0

Moving this to a comment, since nr may not contain all UniProt IDs. If UniProt IDs are all you have, this approach would not be sufficient.

ADD REPLY • link 5.4 years ago by GenoMax 151k

score 3 · Answer 1 · 2020-01-29

3

5.4 years ago

JC 13k

You can fetch them directly from UniProt. If you know the UniProt ID, the FASTA sequence can be retrieved from the URL https://www.uniprot.org/uniprot/{UNIPROT_ID}.fasta

So, if you have a file with the IDs (one per line):

for ID in $(cat file_with_ids.txt); do wget https://www.uniprot.org/uniprot/$ID.fasta; done

ADD COMMENT • link 5.4 years ago by JC 13k

0

I am getting a syntax error after (cat file_with_ids.txt); the error is reported at the semicolon (;).

ADD REPLY • link 4.3 years ago by devhimd ▴ 10

0

I want all the FASTA sequences in one output file.

ADD REPLY • link 3.6 years ago by devhimd ▴ 10

score 1 · Answer 2 · 2020-02-17

1

5.3 years ago

Elisabeth Gasteiger ★ 2.4k

You can upload your list of identifiers to the UniProt batch retrieval tool at https://www.uniprot.org/uploadlists

Please don't hesitate to contact the UniProt helpdesk if you have any additional questions.

ADD COMMENT • link 5.3 years ago by Elisabeth Gasteiger ★ 2.4k

score 0 · Answer 3 · 2021-09-23

import urllib.error
import urllib.request

# Read the IDs (whitespace-separated); the set drops duplicates
with open('my_uniprot_IDs') as id_file:
    my_uniprot_IDs = set(id_file.read().split())
template = 'https://www.uniprot.org/uniprot/%s.fasta'

outfile_fasta    = open('my_uniprot_IDs.fasta', 'w')
outfile_obsolete = open('obsolete', 'w')
outfile_success  = open('success', 'w')

for anID in my_uniprot_IDs:
    try:
        this_url = template % anID
        fasta_text = urllib.request.urlopen(this_url).read().decode('utf-8')
        # Append every sequence to the same combined FASTA file
        outfile_fasta.write(fasta_text)
        outfile_success.write(anID + '\n')
    except urllib.error.HTTPError:
        # IDs whose request fails with an HTTP error are logged as obsolete
        outfile_obsolete.write(anID + '\n')

outfile_fasta.close()
outfile_obsolete.close()
outfile_success.close()