Question

Downloading PDB files from rcsb.org

7 months ago

iamsmor • 0

Hello everyone

I am working on molecular docking and I want to download structures in PDB format from rcsb.org based on a search (protein name, organism name). Is there a way to do this, and if so, how?

Thanks for any help

rcsb pdb • 1.6k views

ADD COMMENT • link updated 10 weeks ago by Ram 44k • written 7 months ago by iamsmor • 0

Download services for PDB are described on this page: https://www.rcsb.org/docs/programmatic-access/file-download-services

It could be as simple as grabbing a file with curl/wget, e.g. https://files.rcsb.org/view/4hhb.pdb for the PDB accession 4HHB.
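A minimal sketch of that single-file download, assuming the URL pattern of the example above holds for other accessions. The function names `pdb_url` and `fetch_pdb` are made up for illustration.

```shell
#!/usr/bin/env bash
# Build the file URL for a PDB accession, following the pattern of the example above.
pdb_url() {
    printf 'https://files.rcsb.org/view/%s.pdb\n' "$1"
}

# Download one entry to <ID>.pdb in the current directory.
# -f makes curl fail on HTTP errors instead of saving the error page,
# -sS hides the progress bar but still prints errors.
fetch_pdb() {
    curl -fsS -o "$1.pdb" "$(pdb_url "$1")"
}

# Example (needs network access):
# fetch_pdb 4hhb
```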

ADD REPLY • link 7 months ago by GenoMax 147k

Thank you very much. I did look there, but what I want is to download files based on a search query such as QUERY: Gene Name = "AHR" AND Scientific Name of the Source Organism = "Homo sapiens", using something like Biopython or a custom script to automate the download process.

ADD REPLY • link updated 7 months ago by Ram 44k • written 7 months ago by iamsmor • 0

PDB has a search API: https://search.rcsb.org/#search-example-1

Here's the JSON from your search query:

{
    "query": {
        "type": "group",
        "nodes": [
            {
                "type": "group",
                "nodes": [
                    {
                        "type": "terminal",
                        "service": "text",
                        "parameters": {
                            "attribute": "rcsb_entity_source_organism.rcsb_gene_name.value",
                            "negation": false,
                            "operator": "exact_match",
                            "value": "AHR"
                        }
                    },
                    {
                        "type": "group",
                        "nodes": [
                            {
                                "type": "group",
                                "nodes": [
                                    {
                                        "type": "terminal",
                                        "service": "text",
                                        "parameters": {
                                            "attribute": "rcsb_entity_source_organism.ncbi_scientific_name",
                                            "value": "Homo sapiens",
                                            "operator": "exact_match"
                                        }
                                    }
                                ],
                                "logical_operator": "or",
                                "label": "rcsb_entity_source_organism.ncbi_scientific_name"
                            }
                        ],
                        "logical_operator": "and"
                    }
                ],
                "logical_operator": "and",
                "label": "text"
            }
        ],
        "logical_operator": "and"
    },
    "return_type": "entry",
    "request_options": {
        "paginate": {
            "start": 0,
            "rows": 25
        },
        "results_content_type": [
            "experimental"
        ],
        "sort": [
            {
                "sort_by": "score",
                "direction": "desc"
            }
        ],
        "scoring_strategy": "combined"
    },
    "request_info": {
        "query_id": "80f5cb00127713554e0dd5ce36ae71bd"
    }
}

Compare the example JSON on that page with the JSON from your query to construct a custom JSON, then call the API with it.
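As a rough sketch of what a hand-built query could look like: the function below prints a flattened version of the JSON above, with the two terminal nodes joined directly by "and". That this flattened form returns the same hits as the nested one, and that the endpoint accepts the JSON as a POST body, are both assumptions to check against the API documentation. The name `build_query` is hypothetical; the attribute names are copied from the JSON above.

```shell
#!/usr/bin/env bash
# Print a Search API query for one gene name and one source organism.
# Arguments are inserted as-is, so they must not contain double quotes or backslashes.
build_query() {
    local gene="$1" organism="$2"
    cat <<EOF
{
  "query": {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_entity_source_organism.rcsb_gene_name.value",
          "operator": "exact_match",
          "value": "$gene"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_entity_source_organism.ncbi_scientific_name",
          "operator": "exact_match",
          "value": "$organism"
        }
      }
    ]
  },
  "return_type": "entry",
  "request_options": { "paginate": { "start": 0, "rows": 25 } }
}
EOF
}

# Example (needs network access; sending the JSON as a POST body is an assumption):
# build_query AHR "Homo sapiens" |
#     curl -s -X POST -H 'Content-Type: application/json' -d @- \
#         https://search.rcsb.org/rcsbsearch/v2/query
```

In a JSON body the organism is written with a plain space; percent-encoding is only needed when the JSON is placed inside a URL.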

ADD REPLY • link 7 months ago by Ram 44k

You can use the "Advanced query" builder (https://www.rcsb.org/search/advanced) to create a query like:

https://www.rcsb.org/search?request=%7B%22query%22%3A%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22logical_operator%22%3A%22and%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22rcsb_entity_source_organism.taxonomy_lineage.name%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22Homo%20sapiens%22%7D%7D%2C%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22rcsb_entity_source_organism.rcsb_gene_name.value%22%2C%22operator%22%3A%22exact_match%22%2C%22negation%22%3Afalse%2C%22value%22%3A%22AHR%22%7D%7D%5D%2C%22logical_operator%22%3A%22and%22%7D%5D%2C%22label%22%3A%22text%22%7D%5D%7D%2C%22return_type%22%3A%22entry%22%2C%22request_options%22%3A%7B%22paginate%22%3A%7B%22start%22%3A0%2C%22rows%22%3A25%7D%2C%22results_content_type%22%3A%5B%22experimental%22%5D%2C%22sort%22%3A%5B%7B%22sort_by%22%3A%22score%22%2C%22direction%22%3A%22desc%22%7D%5D%2C%22scoring_strategy%22%3A%22combined%22%7D%2C%22request_info%22%3A%7B%22query_id%22%3A%2296ab84f1e1ba146fc2d50034b746143e%22%7D%7D

ADD REPLY • link 7 months ago by GenoMax 147k

That's how they seem to have written their query - automating that is a bit of a pain though as it takes a crazy JSON as input.

ADD REPLY • link 7 months ago by Ram 44k

For a non-programmer, using the search builder link included above may be the best option, although even that is not very user-friendly.

ADD REPLY • link 7 months ago by GenoMax 147k

score 3 · Accepted Answer · 2024-04-18

7 months ago

Ram 44k

I'm going to build off of OP's query and give them a simple script:

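# Usage: bash get_my_data.bash "<organism>" <gene>
# URL-encode spaces in the organism name so it can go into the query URL.
organism=$(printf '%s' "$1" | sed 's/ /%20/g')
gene=$2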

curl -s https://search.rcsb.org/rcsbsearch/v2/query\?json\=%7B%22query%22%3A%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22rcsb_entity_source_organism.rcsb_gene_name.value%22%2C%22negation%22%3Afalse%2C%22operator%22%3A%22exact_match%22%2C%22value%22%3A%22$gene%22%7D%7D%2C%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22rcsb_entity_source_organism.ncbi_scientific_name%22%2C%22value%22%3A%22$organism%22%2C%22operator%22%3A%22exact_match%22%7D%7D%5D%2C%22logical_operator%22%3A%22or%22%2C%22label%22%3A%22rcsb_entity_source_organism.ncbi_scientific_name%22%7D%5D%2C%22logical_operator%22%3A%22and%22%7D%5D%2C%22logical_operator%22%3A%22and%22%2C%22label%22%3A%22text%22%7D%5D%2C%22logical_operator%22%3A%22and%22%7D%2C%22return_type%22%3A%22entry%22%2C%22request_options%22%3A%7B%22paginate%22%3A%7B%22start%22%3A0%2C%22rows%22%3A250%7D%2C%22results_content_type%22%3A%5B%22experimental%22%5D%2C%22sort%22%3A%5B%7B%22sort_by%22%3A%22score%22%2C%22direction%22%3A%22desc%22%7D%5D%2C%22scoring_strategy%22%3A%22combined%22%7D%2C%22request_info%22%3A%7B%22query_id%22%3A%2280f5cb00127713554e0dd5ce36ae71bd%22%7D%7D | grep identifier | cut -d: -f2 | tr -d ' ",'

Save it as get_my_data.bash and then run it as

bash get_my_data.bash "Homo sapiens" AHR

Remember to provide the species in double quotes as it is a multi-word argument.

Sample runs:

$ bash get_my_data.bash "Homo sapiens" AHR
5NJ8
5V0L
7ZUB
8QMO

$ bash get_my_data.bash "Homo sapiens" TP53
1DT7
1JSP
1KZY
1MA3
1XQH
1YC5
1YCQ
1YCR
2B3G
2FEJ
2FOJ
2FOO
2GS0
2H2D
2H2F
2H4F
2H4H
2H4J
2H59
2K8F
2LY4
2MEJ
2MZD
2PCX
2RUK
..
..
..
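The `grep identifier | cut | tr` tail of the script depends on the response being pretty-printed with one "identifier" field per line. Here is a sketch of an extraction that also copes with single-line JSON, run on a made-up response fragment. The `result_set` / `identifier` shape is assumed from what the script greps for, and `extract_ids` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Extract every "identifier" value from Search API JSON read on stdin.
# Works whether the JSON is pretty-printed or on a single line.
extract_ids() {
    grep -o '"identifier"[[:space:]]*:[[:space:]]*"[^"]*"' | cut -d'"' -f4
}

# Made-up response fragment for illustration; real responses carry more fields.
sample='{"total_count":2,"result_set":[{"identifier":"5NJ8","score":1.0},{"identifier":"5V0L","score":0.9}]}'
printf '%s\n' "$sample" | extract_ids
```

If `jq` is installed, a JSON-aware filter would be more robust than pattern matching.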

ADD COMMENT • link 7 months ago by Ram 44k

This may be the best option.

Get the PDB IDs:

$ ./get.sh "Mus musculus" AHR
4M4X
5NJ8
5V0L
8H77

Then use curl to get the actual files

$ curl -o 4M4X.pdb  https://files.rcsb.org/view/4M4X.pdb

@Ram you could modify your script to grab the PDB files directly.

ADD REPLY • link 7 months ago by GenoMax 147k

Sure. Or, one could do:

bash get_my_data.bash "Homo sapiens" AHR | xargs -I v_pdb curl -s -o v_pdb.pdb  https://files.rcsb.org/view/v_pdb.pdb
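A slightly more defensive variant of the same idea, as a sketch: it skips lines that do not look like a four-character ID and reports downloads that fail. The function name `download_pdbs` and the `DRY_RUN` switch are made up for this example.

```shell
#!/usr/bin/env bash
# Read PDB IDs from stdin (one per line) and download each as <ID>.pdb.
# Set DRY_RUN=1 to print the URLs instead of downloading.
download_pdbs() {
    local id url
    while IFS= read -r id; do
        # Skip anything that is not a 4-character alphanumeric ID (blank lines, stray text).
        case $id in
            [0-9A-Za-z][0-9A-Za-z][0-9A-Za-z][0-9A-Za-z]) ;;
            *) continue ;;
        esac
        url="https://files.rcsb.org/view/$id.pdb"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s\n' "$url"
        else
            # -f: treat HTTP errors as failures instead of saving the error page.
            curl -fsS -o "$id.pdb" "$url" || printf 'failed: %s\n' "$id" >&2
        fi
    done
}

# Example (needs network access):
# bash get_my_data.bash "Homo sapiens" AHR | download_pdbs
```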

ADD REPLY • link 10 weeks ago by Ram 44k

thank you very much

ADD REPLY • link 7 months ago by iamsmor • 0

It gives you the first 25 results though. I'll see if I can change that.

EDIT: I've updated that number to 250. I'm hoping you won't need more than that. Removing that number is a pain though, so change 250 to 2500 if you need even more results.

EDIT-2: I tried removing the max results parameter - it then only returns the top 10 results. I'd stick with the current version.
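An alternative to raising the row count would be to page through the results by stepping the "start" value in the "paginate" block. Below is a sketch of the offset arithmetic only. It assumes the response reports the total number of hits (a field such as `total_count`, which I have not verified here), and `page_starts` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the "start" offset of every page needed to cover <total> results
# when each request returns at most <rows> results.
page_starts() {
    local total="$1" rows="$2" start=0
    while [ "$start" -lt "$total" ]; do
        printf '%s\n' "$start"
        start=$((start + rows))
    done
}

# 600 results at 250 per page need three requests.
page_starts 600 250
```

Each printed offset would replace the `"start": 0` value in one request, with "rows" kept at the page size.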