Downloading PDB files from rcsb.org
7 months ago
iamsmor • 0

Hello everyone

I am working on molecular docking and want to download structures from rcsb.org in PDB format, selected by a search (protein name, organism name). Is there a way to do this, and if so, how?

Thanks for any help

rcsb pdb • 1.6k views

Download services for PDB are described on this page: https://www.rcsb.org/docs/programmatic-access/file-download-services

It could be as simple as grabbing a file with curl/wget using https://files.rcsb.org/view/4hhb.pdb as an example PDB accession.
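
As a minimal sketch of that approach, the snippet below wraps the curl call in two small shell functions. The names `pdb_url` and `fetch_pdb` are made up for illustration; only the `https://files.rcsb.org/view/<ID>.pdb` pattern comes from the example above.

```shell
# Hypothetical helpers; only the URL pattern is taken from the example above.

# Print the URL for a PDB accession (e.g. 4hhb).
pdb_url() {
    printf 'https://files.rcsb.org/view/%s.pdb\n' "$1"
}

# Download one entry to <ID>.pdb in the current directory.
# -f makes curl return a non-zero exit status on HTTP errors
# instead of saving the error page; -sS hides the progress bar
# but still prints error messages.
fetch_pdb() {
    curl -fsS -o "$1.pdb" "$(pdb_url "$1")"
}

# Usage: fetch_pdb 4hhb
```

Nothing is downloaded until `fetch_pdb` is called, so the functions can be pasted into a larger script and reused.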


Thank you very much. I did look there, but what I want is to download files according to a search like this: QUERY: Gene Name = "AHR" AND Scientific Name of the Source Organism = "Homo sapiens", using something like Biopython or a custom script to automate the download process.


PDB has a search API: https://search.rcsb.org/#search-example-1

Here's the JSON from your search query:

{
    "query": {
        "type": "group",
        "nodes": [
            {
                "type": "group",
                "nodes": [
                    {
                        "type": "terminal",
                        "service": "text",
                        "parameters": {
                            "attribute": "rcsb_entity_source_organism.rcsb_gene_name.value",
                            "negation": false,
                            "operator": "exact_match",
                            "value": "AHR"
                        }
                    },
                    {
                        "type": "group",
                        "nodes": [
                            {
                                "type": "group",
                                "nodes": [
                                    {
                                        "type": "terminal",
                                        "service": "text",
                                        "parameters": {
                                            "attribute": "rcsb_entity_source_organism.ncbi_scientific_name",
                                            "value": "Homo sapiens",
                                            "operator": "exact_match"
                                        }
                                    }
                                ],
                                "logical_operator": "or",
                                "label": "rcsb_entity_source_organism.ncbi_scientific_name"
                            }
                        ],
                        "logical_operator": "and"
                    }
                ],
                "logical_operator": "and",
                "label": "text"
            }
        ],
        "logical_operator": "and"
    },
    "return_type": "entry",
    "request_options": {
        "paginate": {
            "start": 0,
            "rows": 25
        },
        "results_content_type": [
            "experimental"
        ],
        "sort": [
            {
                "sort_by": "score",
                "direction": "desc"
            }
        ],
        "scoring_strategy": "combined"
    },
    "request_info": {
        "query_id": "80f5cb00127713554e0dd5ce36ae71bd"
    }
}

Compare the JSON in the examples there with the one from your query to build a custom JSON, then call the API with it.
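
One way to build such a custom JSON from a script is sketched below. The function names `build_query` and `run_query` are hypothetical. The attribute names, operators and options are copied from the JSON above, but the nested groups are flattened into a single "and" group, on the assumption that this is equivalent for a two-condition query. The endpoint is the one used by the curl script later in this thread.

```shell
# Hypothetical helper: print a search request for one gene name ($1)
# and one source organism ($2). The values are inserted verbatim, so
# they must not contain double quotes or backslashes.
build_query() {
    cat <<EOF
{
  "query": {
    "type": "group",
    "logical_operator": "and",
    "nodes": [
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_entity_source_organism.rcsb_gene_name.value",
          "operator": "exact_match",
          "value": "$1"
        }
      },
      {
        "type": "terminal",
        "service": "text",
        "parameters": {
          "attribute": "rcsb_entity_source_organism.ncbi_scientific_name",
          "operator": "exact_match",
          "value": "$2"
        }
      }
    ]
  },
  "return_type": "entry",
  "request_options": {
    "paginate": { "start": 0, "rows": 25 }
  }
}
EOF
}

# Send the request as a GET with the JSON in the "json" parameter.
# --get together with --data-urlencode lets curl do the percent-encoding.
run_query() {
    curl -fsS --get --data-urlencode "json=$(build_query "$1" "$2")" \
        https://search.rcsb.org/rcsbsearch/v2/query
}

# Usage: run_query AHR "Homo sapiens"
```

Because curl does the encoding, the organism name is passed with a literal space rather than `%20`.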


That does seem to be how their query is written. Automating it is a bit of a pain, though, since the API takes a large, deeply nested JSON as input.


For a non-programmer, the search builder linked above may be the best option, though even that is not very user friendly.

7 months ago
Ram 44k

I'm going to build off of OP's query and give them a simple script:

# URL-encode spaces in the organism name; "$1" is quoted so it is passed intact.
organism=$(echo "$1" | sed 's/ /%20/g')
gene=$2

curl -s https://search.rcsb.org/rcsbsearch/v2/query\?json\=%7B%22query%22%3A%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22rcsb_entity_source_organism.rcsb_gene_name.value%22%2C%22negation%22%3Afalse%2C%22operator%22%3A%22exact_match%22%2C%22value%22%3A%22$gene%22%7D%7D%2C%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22group%22%2C%22nodes%22%3A%5B%7B%22type%22%3A%22terminal%22%2C%22service%22%3A%22text%22%2C%22parameters%22%3A%7B%22attribute%22%3A%22rcsb_entity_source_organism.ncbi_scientific_name%22%2C%22value%22%3A%22$organism%22%2C%22operator%22%3A%22exact_match%22%7D%7D%5D%2C%22logical_operator%22%3A%22or%22%2C%22label%22%3A%22rcsb_entity_source_organism.ncbi_scientific_name%22%7D%5D%2C%22logical_operator%22%3A%22and%22%7D%5D%2C%22logical_operator%22%3A%22and%22%2C%22label%22%3A%22text%22%7D%5D%2C%22logical_operator%22%3A%22and%22%7D%2C%22return_type%22%3A%22entry%22%2C%22request_options%22%3A%7B%22paginate%22%3A%7B%22start%22%3A0%2C%22rows%22%3A250%7D%2C%22results_content_type%22%3A%5B%22experimental%22%5D%2C%22sort%22%3A%5B%7B%22sort_by%22%3A%22score%22%2C%22direction%22%3A%22desc%22%7D%5D%2C%22scoring_strategy%22%3A%22combined%22%7D%2C%22request_info%22%3A%7B%22query_id%22%3A%2280f5cb00127713554e0dd5ce36ae71bd%22%7D%7D | grep identifier | cut -d: -f2 | tr -d ' ",'

Save it as get_my_data.bash and then run it as

bash get_my_data.bash "Homo sapiens" AHR

Remember to provide the species in double quotes as it is a multi-word argument.

Sample runs:

$ bash get_my_data.bash "Homo sapiens" AHR
5NJ8
5V0L
7ZUB
8QMO

$ bash get_my_data.bash "Homo sapiens" TP53
1DT7
1JSP
1KZY
1MA3
1XQH
1YC5
1YCQ
1YCR
2B3G
2FEJ
2FOJ
2FOO
2GS0
2H2D
2H2F
2H4F
2H4H
2H4J
2H59
2K8F
2LY4
2MEJ
2MZD
2PCX
2RUK
..
..
..

This may be the best option.

Get the PDB IDs:

$ ./get.sh "Mus musculus" AHR
4M4X
5NJ8
5V0L
8H77

Then use curl to get the actual files

$ curl -o 4M4X.pdb  https://files.rcsb.org/view/4M4X.pdb

@Ram you could modify your script to grab the PDB files directly.


Sure. Or, one could do:

bash get_my_data.bash "Homo sapiens" AHR | xargs -I v_pdb curl -s -o v_pdb.pdb  https://files.rcsb.org/view/v_pdb.pdb
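
If it matters which downloads failed, a loop can replace xargs. This is only a sketch: `fetch_one` and `download_ids` are made-up names, and the URL pattern is the one used above. Failures are possible because, for example, entries too large for the legacy PDB format are distributed only as mmCIF.

```shell
# Hypothetical helper: download a single entry to <ID>.pdb.
# -f makes curl return a non-zero status on HTTP errors.
fetch_one() {
    curl -fsS -o "$1.pdb" "https://files.rcsb.org/view/$1.pdb"
}

# Read one PDB ID per line on stdin and download each one,
# reporting failures on stderr instead of stopping.
download_ids() {
    while IFS= read -r id; do
        [ -n "$id" ] || continue      # skip blank lines
        if fetch_one "$id"; then
            echo "saved $id.pdb"
        else
            echo "failed $id" >&2
        fi
    done
}

# Usage: bash get_my_data.bash "Homo sapiens" AHR | download_ids
```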

thank you very much


It gives you the first 25 results though. I'll see if I can change that.

EDIT: I've updated that number to 250. I'm hoping you won't need more than that. Removing that number is a pain though, so change 250 to 2500 if you need even more results.

EDIT-2: I tried removing the max results parameter - it then only returns the top 10 results. I'd stick with the current version.
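
An alternative to raising the number is to page through the results using the `start` and `rows` fields of `paginate`. The sketch below is untested against the live service and rests on a few assumptions: the function names are hypothetical, the query is a flattened two-condition version of the one above, and each hit is assumed to appear in the response as an `"identifier"` field, as the grep in the script above suggests.

```shell
ROWS=250   # page size, matching the value used in the script above

# Hypothetical: print the IDs on one page for gene $1, organism $2,
# starting at result offset $3.
fetch_page() {
    query=$(cat <<EOF
{"query":{"type":"group","logical_operator":"and","nodes":[
 {"type":"terminal","service":"text","parameters":{"attribute":"rcsb_entity_source_organism.rcsb_gene_name.value","operator":"exact_match","value":"$1"}},
 {"type":"terminal","service":"text","parameters":{"attribute":"rcsb_entity_source_organism.ncbi_scientific_name","operator":"exact_match","value":"$2"}}]},
 "return_type":"entry",
 "request_options":{"paginate":{"start":$3,"rows":$ROWS}}}
EOF
)
    curl -fsS --get --data-urlencode "json=$query" \
        https://search.rcsb.org/rcsbsearch/v2/query |
        grep -o '"identifier" *: *"[^"]*"' | cut -d'"' -f4
}

# Request pages until one comes back empty or shorter than ROWS.
fetch_all() {
    start=0
    while :; do
        page=$(fetch_page "$1" "$2" "$start")
        [ -n "$page" ] || break
        printf '%s\n' "$page"
        n=$(printf '%s\n' "$page" | grep -c .)   # IDs on this page
        if [ "$n" -lt "$ROWS" ]; then
            break
        fi
        start=$((start + ROWS))
    done
}

# Usage: fetch_all AHR "Homo sapiens"
```

The loop stops on the first short page, so it makes one extra request only when the total is an exact multiple of `ROWS`.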


thank you so much
