Hi, I have a text file with 1000 different proteins (EDIT: proteins, not PDB-IDs), and I would like to gather all the structural information related to them from RCSB. I have never done any high-throughput work with RCSB, and the list is a bit too long to manually curate.
Does anyone know any existing methods/packages that could help me out? Ideally I'd be able to download all potentially relevant PDBs, but anything would help as a starting point.
EDIT: Clarified that I have gene names for the proteins, not PDB IDs. Many PDB entries in RCSB can contain the same protein, and my goal is to find all of them in one go.
Thanks so much, Joe
You can download various types of bulk data files from RCSB here: https://www.rcsb.org/docs/programmatic-access/file-download-services
You can batch download data as shown here: https://www.rcsb.org/docs/programmatic-access/batch-downloads-with-shell-script
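If you would rather stay in Python than use the shell script, here is a minimal sketch of the same by-ID download. It assumes the `https://files.rcsb.org/download/<ID>.<ext>` URL pattern described on the file download services page; the function names and the `structures` output directory are just placeholders I made up.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Assumed base URL, taken from the RCSB file download services page.
DOWNLOAD_BASE = "https://files.rcsb.org/download"


def download_url(pdb_id, fmt="cif"):
    """Build the download URL for one entry, e.g. 4HHB.cif or 4HHB.pdb."""
    return f"{DOWNLOAD_BASE}/{pdb_id.strip().upper()}.{fmt}"


def fetch_entries(pdb_ids, out_dir="structures", fmt="cif"):
    """Download each entry into out_dir and return the saved file paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for pdb_id in pdb_ids:
        url = download_url(pdb_id, fmt)
        target = out / url.rsplit("/", 1)[-1]
        urlretrieve(url, str(target))  # raises HTTPError if the entry is missing
        saved.append(target)
    return saved
```

Calling `fetch_entries(["4HHB", "1TUP"])` would then write `4HHB.cif` and `1TUP.cif` into `structures/`. For a thousand-plus files it is probably worth adding a short pause between requests.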
Hey, this is so close to what I'm looking for; I clarified in my edit. This is a great way to batch download lots of PDB entries from RCSB by PDB ID. However, the same protein can appear in many entries. I'm looking for a method to download all entries given a specific protein name or sequence.
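One possible route for the gene-name case is the RCSB Search API, which can return every PDB ID matching an attribute query. The sketch below is a rough starting point, not a tested pipeline: the endpoint, the `rcsb_entity_source_organism.rcsb_gene_name.value` attribute, and the `exact_match` operator are my understanding of the Search API and should be checked against RCSB's search attribute documentation. The function names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

# Assumed endpoint and attribute name; verify both in the RCSB Search API docs.
SEARCH_URL = "https://search.rcsb.org/rcsbsearch/v2/query"
GENE_ATTRIBUTE = "rcsb_entity_source_organism.rcsb_gene_name.value"


def build_gene_query(gene_name):
    """Build a text-service query that matches entries by gene name."""
    return {
        "query": {
            "type": "terminal",
            "service": "text",
            "parameters": {
                "attribute": GENE_ATTRIBUTE,
                "operator": "exact_match",
                "value": gene_name,
            },
        },
        "return_type": "entry",  # return PDB IDs rather than entities
        "request_options": {"return_all_hits": True},
    }


def parse_identifiers(body):
    """Extract PDB IDs from a response body; an empty body means no hits."""
    if not body.strip():
        return []
    data = json.loads(body)
    return [hit["identifier"] for hit in data.get("result_set", [])]


def search_gene(gene_name):
    """POST the query and return the list of matching PDB IDs."""
    payload = json.dumps(build_gene_query(gene_name)).encode("utf-8")
    request = Request(
        SEARCH_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urlopen(request) as response:
        return parse_identifiers(response.read().decode("utf-8"))
```

Looping `search_gene` over the gene names in your text file gives a list of PDB IDs per gene, which can then go into the batch download from the answer above. Gene names are ambiguous across organisms, so you may also want to restrict the query by species. For the sequence case, the same API has a separate sequence search service.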