How do I get FASTA sequences if I have protein IDs (in the 10,000s)?
4.8 years ago
sunnykevin97 ▴ 990

Hi,

I have more than 10,000 protein IDs, and I want to extract the FASTA sequences for all of these IDs from UniProt.

What I have done so far: I have already downloaded all the FASTA sequences of the organism I'm interested in.

How can I do this? Any suggestions are welcome.

uniprot sequence gene • 4.1k views

Use the BLAST+ preformatted nr database along with the blastdbcmd utility. Use the -entry_batch option to retrieve a large number of accessions in one run.

-entry_batch <File_In>
   Input file for batch processing (Format: one entry per line, seq id 
   followed by optional space-delimited specifier(s)

An example for a single accession below.

$ blastdbcmd -db /path_to/blastv5/nr_v5 -entry Q9I7U4 -outfmt %f
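
For the batch case, here is a minimal Python sketch of how the ID list could be turned into an -entry_batch file and passed to blastdbcmd. It assumes BLAST+ is installed and on the PATH and that a local database exists at whatever path you pass in. The function names (write_entry_batch, blastdbcmd_batch_args, run_batch) are made up for illustration; only the blastdbcmd flags -db, -entry_batch, -outfmt and -out come from BLAST+ itself.

```python
import subprocess
from pathlib import Path


def write_entry_batch(accessions, batch_path):
    """Write one accession per line, dropping blanks and duplicates (order kept)."""
    seen = set()
    unique = []
    for acc in accessions:
        acc = acc.strip()
        if acc and acc not in seen:
            seen.add(acc)
            unique.append(acc)
    Path(batch_path).write_text("".join(a + "\n" for a in unique))
    return unique


def blastdbcmd_batch_args(db, batch_path, out_fasta):
    """Build the blastdbcmd argument list for a batch FASTA extraction."""
    return [
        "blastdbcmd",
        "-db", str(db),                    # database path/prefix (assumed to exist)
        "-entry_batch", str(batch_path),   # file with one accession per line
        "-outfmt", "%f",                   # FASTA output, as in the example above
        "-out", str(out_fasta),            # all sequences go into this one file
    ]


def run_batch(db, batch_path, out_fasta):
    """Run blastdbcmd and return the CompletedProcess for inspection."""
    # check=False so the caller can inspect returncode and stderr
    # (for example, messages about entries that are not in the database)
    # instead of getting an exception.
    return subprocess.run(
        blastdbcmd_batch_args(db, batch_path, out_fasta),
        capture_output=True,
        text=True,
        check=False,
    )
```

Hypothetical usage: call write_entry_batch(open("file_with_ids.txt"), "ids.batch"), then run_batch("/path_to/blastv5/nr_v5", "ids.batch", "proteins.fasta"), and look at the returned stderr for accessions that were not found.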

Moving this to a comment, since nr may not contain all UniProt IDs; if UniProt IDs are all you have, this would not be sufficient.

4.8 years ago
JC 13k

You can fetch them directly from UniProt. If you know the UniProt ID, the FASTA sequence can be retrieved from the URL https://www.uniprot.org/uniprot/{UNIPROT_ID}.fasta

So, if you have a file with the IDs (one per line):

for ID in $(cat file_with_ids.txt); do wget https://www.uniprot.org/uniprot/$ID.fasta; done


I am getting a syntax error after (cat file_with_ids.txt); the error is reported at the (;).


I want all the FASTA sequences in one output file.
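
Since the original post says the organism's FASTA file is already downloaded, one offline way to get everything into a single output file is to filter that local file by the ID list. The sketch below is only an illustration under a few assumptions: the headers follow the usual UniProt layout (>sp|ACCESSION|ENTRY_NAME ...), the IDs are accessions, and the function names are invented for this example.

```python
def accession_from_header(header):
    """Return the accession from a FASTA header (text after the leading '>')."""
    tokens = header.split(None, 1)
    first_token = tokens[0] if tokens else ""
    parts = first_token.split("|")
    # UniProt-style headers look like 'sp|P12345|NAME_SPECIES description';
    # the accession is the second pipe-separated field.
    if len(parts) >= 3:
        return parts[1]
    # Otherwise assume the first word of the header is the ID itself.
    return first_token


def filter_fasta(fasta_in, ids, fasta_out):
    """Copy records whose accession is in `ids` to fasta_out.

    Returns a sorted list of the requested IDs that were not found.
    """
    wanted = set(ids)
    found = set()
    keep = False
    with open(fasta_in) as src, open(fasta_out, "w") as dst:
        for line in src:
            if line.startswith(">"):
                acc = accession_from_header(line[1:])
                keep = acc in wanted
                if keep:
                    found.add(acc)
            # Sequence lines follow the decision made for their header.
            if keep:
                dst.write(line)
    return sorted(wanted - found)
```

Hypothetical usage: ids = open("file_with_ids.txt").read().split(), then missing = filter_fasta("organism.fasta", ids, "selected.fasta"). Anything returned in missing is not in the local file and would still have to be fetched from UniProt.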

4.8 years ago

You can upload your list of identifiers to the UniProt batch retrieval tool at https://www.uniprot.org/uploadlists. Please don't hesitate to contact the UniProt helpdesk if you have any additional questions.

3.2 years ago
devhimd ▴ 10
import urllib.error
import urllib.request

# Read the IDs (whitespace-separated) and drop duplicates.
my_uniprot_IDs = set(open('my_uniprot_IDs').read().split())
template = 'https://www.uniprot.org/uniprot/%s.fasta'

outfile_fasta    = open('my_uniprot_IDs.fasta', 'w')  # all sequences in one file
outfile_obsolete = open('obsolete', 'w')              # IDs that could not be fetched
outfile_success  = open('success', 'w')               # IDs that were fetched

for anID in my_uniprot_IDs:
    try:
        this_url = template % anID
        fasta_text = urllib.request.urlopen(this_url).read().decode('utf-8')
        outfile_fasta.write(fasta_text)
        outfile_success.write(anID + '\n')
    except urllib.error.HTTPError:
        # The server returned an error status for this ID; record it.
        outfile_obsolete.write(anID + '\n')

outfile_fasta.close()
outfile_obsolete.close()
outfile_success.close()

Please post actual code not screenshots of code. You will also want to describe what the code does and how to use it.

Use the 101010 button to format code when in edit mode.
