Hello,
I have a list of 3000 PDB IDs, for which I need (1) the FASTA sequences from PDB and (2) the UniProt sequences. Is there a simpler way to download the sequences than fetching them manually for each ID?
Thanks in advance!
Let's say you have a text file, pdb_list.txt, with one PDB ID per line:
2AID
4RLB
You can do something like:
parallel -a pdb_list.txt curl -o {}.fasta http://www.rcsb.org/pdb/files/fasta.txt?structureIdList={}
If you do not have parallel:
while read -r line; do curl -o "${line}.fasta" "http://www.rcsb.org/pdb/files/fasta.txt?structureIdList=${line}"; done < pdb_list.txt
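For 3000 IDs it may help to make the loop restartable. The following is only a sketch of the same idea, not a tested recipe: the function names read_ids and download_all and the BASE_URL variable are made up for illustration, and the default URL is simply the one from the command above, which may not be served anymore, so override BASE_URL if needed.

```shell
#!/bin/sh
# Sketch: download one FASTA file per PDB ID listed in a text file.

# Assumption: URL prefix copied from the answer above; set BASE_URL in the
# environment to point at whichever endpoint currently serves FASTA by PDB ID.
BASE_URL="${BASE_URL:-http://www.rcsb.org/pdb/files/fasta.txt?structureIdList=}"

# Print the IDs from a list file, one per line. IDs may be separated by
# spaces or newlines; blank lines and Windows carriage returns are dropped.
read_ids() {
    tr -s ' \t\r\n' '\n' < "$1" | grep -v '^$'
}

# Download every ID in the list file ($1) into the output directory ($2).
# Non-empty files that already exist are skipped, so the script can be
# re-run after an interruption without fetching everything again.
download_all() {
    list_file=$1
    out_dir=${2:-.}
    mkdir -p "$out_dir"
    read_ids "$list_file" | while read -r id; do
        if [ -s "$out_dir/$id.fasta" ]; then
            continue
        fi
        # -f: fail on HTTP errors instead of saving an error page
        # -sS: no progress meter, but still print error messages
        curl -fsS -o "$out_dir/$id.fasta" "${BASE_URL}${id}" \
            || echo "failed: $id" >&2
    done
}
```

After saving this as, say, fetch_fasta.sh, you could run `. ./fetch_fasta.sh` and then `download_all pdb_list.txt fasta_out` from the same terminal. Any ID that fails is reported on stderr, so you can collect those and retry them later.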
Try a similar approach for UniProt.
Have a look at this: UniProt provides a bulk download facility through its website.
http://www.uniprot.org/uploadlists/
Through this feature you can get any sequence contained in the UniParc archive, including PDB sequences.
Thanks for your reply,
Yes, I have a text file with 3000 PDB IDs, and I want the 3000 corresponding FASTA files downloaded. Do you mean I need to go to this website (which is not working), http://www.rcsb.org/pdb/files/fasta.txt?structureIdList={}, and enter the following?
parallel -a pdb_list.txt curl -o {}.fasta
No. Please run the command from your terminal on Mac or Linux.
How can I do this on Windows, please?
The UniProt facility that I linked to in my answer is web-based and therefore works on any platform. The limit on the number of IDs you can submit (in a file) is in the tens of thousands, so you shouldn't have a problem. If you do, contact UniProt help.