I need a small protein that has multiple chains but no more than fifty atoms in total.
I need it as a test case for my software.
Can you suggest a PDB entry for such a protein?
Using the UniProt SPARQL endpoint (https://sparql.uniprot.org/):
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
PREFIX wikibase: <http://wikiba.se/ontology#>
PREFIX wdt: <http://www.wikidata.org/prop/direct/>
PREFIX wd: <http://www.wikidata.org/entity/>
PREFIX vg: <http://biohackathon.org/resource/vg#>
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX uniprotkb: <http://purl.uniprot.org/uniprot/>
PREFIX uberon: <http://purl.obolibrary.org/obo/uo#>
PREFIX taxon: <http://purl.uniprot.org/taxonomy/>
PREFIX sp: <http://spinrdf.org/sp#>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX sh: <http://www.w3.org/ns/shacl#>
PREFIX sd: <http://www.w3.org/ns/sparql-service-description#>
PREFIX schema: <http://schema.org/>
PREFIX sachem: <http://bioinfo.uochb.cas.cz/rdf/v1.0/sachem#>
PREFIX rh: <http://rdf.rhea-db.org/>
PREFIX pubmed: <http://rdf.ncbi.nlm.nih.gov/pubmed/>
PREFIX ps: <http://www.wikidata.org/prop/statement/>
PREFIX pq: <http://www.wikidata.org/prop/qualifier/>
PREFIX patent: <http://data.epo.org/linked-data/def/patent/>
PREFIX p: <http://www.wikidata.org/prop/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX orthodbGroup: <http://purl.orthodb.org/odbgroup/>
PREFIX orthodb: <http://purl.orthodb.org/>
PREFIX orth: <http://purl.org/net/orth#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
PREFIX np: <http://nextprot.org/rdf#>
PREFIX nextprot_cv: <http://nextprot.org/rdf/terminology/>
PREFIX nextprot: <http://nextprot.org/rdf/entry/>
PREFIX mnx: <https://rdf.metanetx.org/schema/>
PREFIX mnet: <https://rdf.metanetx.org/mnet/>
PREFIX mesh: <http://id.nlm.nih.gov/mesh/>
PREFIX lscr: <http://purl.org/lscr#>
PREFIX lipidmaps: <https://www.lipidmaps.org/rdf/>
PREFIX keywords: <http://purl.uniprot.org/keywords/>
PREFIX insdcschema: <http://ddbj.nig.ac.jp/ontologies/nucleotide/>
PREFIX insdc: <http://identifiers.org/insdc/>
PREFIX identifiers: <http://identifiers.org/>
PREFIX glyconnect: <https://purl.org/glyconnect/>
PREFIX glycan: <http://purl.jp/bio/12/glyco/glycan#>
PREFIX genex: <http://purl.org/genex#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX faldo: <http://biohackathon.org/resource/faldo#>
PREFIX eunisSpecies: <http://eunis.eea.europa.eu/rdf/species-schema.rdf#>
PREFIX ensembltranscript: <http://rdf.ebi.ac.uk/resource/ensembl.transcript/>
PREFIX ensemblterms: <http://rdf.ebi.ac.uk/terms/ensembl/>
PREFIX ensemblprotein: <http://rdf.ebi.ac.uk/resource/ensembl.protein/>
PREFIX ensemblexon: <http://rdf.ebi.ac.uk/resource/ensembl.exon/>
PREFIX ensembl: <http://rdf.ebi.ac.uk/resource/ensembl/>
PREFIX ec: <http://purl.uniprot.org/enzyme/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX dc: <http://purl.org/dc/terms/>
PREFIX chebislash: <http://purl.obolibrary.org/obo/chebi/>
PREFIX chebihash: <http://purl.obolibrary.org/obo/chebi#>
PREFIX cco: <http://rdf.ebi.ac.uk/terms/chembl#>
PREFIX busco: <http://busco.ezlab.org/schema#>
PREFIX bibo: <http://purl.org/ontology/bibo/>
PREFIX allie: <http://allie.dbcls.jp/>
PREFIX SWISSLIPID: <https://swisslipids.org/rdf/SLM_>
PREFIX GO: <http://purl.obolibrary.org/obo/GO_>
PREFIX ECO: <http://purl.obolibrary.org/obo/ECO_>
PREFIX CHEBI: <http://purl.obolibrary.org/obo/CHEBI_>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?prot ?protName (COUNT(DISTINCT ?chain) AS ?n_chains)
WHERE
{
  ?prot <http://purl.uniprot.org/core/mnemonic> ?protName .
  ?prot <http://purl.uniprot.org/core/annotation> ?chain .
  ?prot a <http://purl.uniprot.org/core/Protein> .
  ?chain a <http://purl.uniprot.org/core/Peptide_Annotation> .
  ?prot <http://purl.uniprot.org/core/sequence> ?sequence .
  ?chain rdfs:comment ?chainTitle .
  ?sequence rdf:value ?pep .
  FILTER( strlen(str(?pep)) > 60 ) .
  FILTER( regex(str(?chainTitle), "chain") ) .
}
GROUP BY ?prot ?protName
HAVING( (COUNT(DISTINCT ?chain)) = 2 )
ORDER BY ?prot ?n_chains
prot n_chains
http://purl.uniprot.org/uniprot/A0A0B5A7M7 "2"
http://purl.uniprot.org/uniprot/A0A0B5A7N1 "2"
http://purl.uniprot.org/uniprot/A0A0B5A7N5 "2"
http://purl.uniprot.org/uniprot/A0A0B5A7N8 "2"
http://purl.uniprot.org/uniprot/A0A0B5A7P2 "2"
http://purl.uniprot.org/uniprot/A0A0B5A8P4 "2"
http://purl.uniprot.org/uniprot/A0A0B5A8P8 "2"
http://purl.uniprot.org/uniprot/A0A0B5A8Q2 "2"
http://purl.uniprot.org/uniprot/A0A0B5A8Q6 "2"
....
Pierre Lindenbaum, an additional requirement is that the protein be < 50 AA in total (including both chains). Can you add that limit to this query?
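One way the length limit might be added is sketched below in Python. This is not the answerer's own revision, only a guess at what was asked for: it swaps the `> 60` sequence filter for `< max_len` and leaves the rest of the query as posted. The names `build_query`, `build_url` and `run_query`, and the `LIMIT` clause, are made up for this sketch. Note that the filter acts on the length of the whole UniProt sequence (the precursor), which is assumed here to be an upper bound on the combined length of the two chains.

```python
import json
import urllib.parse
import urllib.request

ENDPOINT = "https://sparql.uniprot.org/sparql"  # public UniProt SPARQL endpoint

# Same graph pattern as the query above; only the length filter is changed
# and a LIMIT is added (both are assumptions of this sketch).
QUERY_TEMPLATE = """
PREFIX up: <http://purl.uniprot.org/core/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

SELECT ?prot ?protName (COUNT(DISTINCT ?chain) AS ?n_chains)
WHERE {{
  ?prot a up:Protein ;
        up:mnemonic ?protName ;
        up:annotation ?chain ;
        up:sequence ?sequence .
  ?chain a up:Peptide_Annotation ;
         rdfs:comment ?chainTitle .
  ?sequence rdf:value ?pep .
  FILTER( strlen(str(?pep)) < {max_len} )
  FILTER( regex(str(?chainTitle), "chain") )
}}
GROUP BY ?prot ?protName
HAVING( COUNT(DISTINCT ?chain) = 2 )
ORDER BY ?prot
LIMIT {limit}
"""


def build_query(max_len=50, limit=20):
    """Return the SPARQL text with the sequence-length cap filled in."""
    return QUERY_TEMPLATE.format(max_len=int(max_len), limit=int(limit))


def build_url(query):
    """Return a GET URL that submits the query to the endpoint."""
    return ENDPOINT + "?" + urllib.parse.urlencode({"query": query})


def run_query(query, timeout=60):
    """Send the query and return the JSON result bindings (needs network)."""
    request = urllib.request.Request(
        build_url(query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["results"]["bindings"]


if __name__ == "__main__":
    print(build_query(max_len=50))
```

Calling `run_query(build_query(50))` would send the request; whether any entries come back under that limit has not been checked here.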
I am really impressed by this answer and will bookmark it, in case I ever need to search PDB. However, the original post asks for 50 atoms and not 50 amino acids.
You are probably correct to assume that 50 amino acids were intended, since a glycine dipeptide (C4 H8 N2 O3) already has 4 carbon atoms, 2 nitrogen atoms, 3 oxygen atoms and 8 hydrogen atoms = 17 atoms in total. An H-bonded dicysteine, C6 H14 N2 O4 S2 = 28 atoms; when oxidized to cystine, 26 atoms. Glutathione already has C10 H17 N3 O6 S = 37 atoms.
But nothing in that range would be considered a protein or peptide hormone...
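The atom totals in this comment can be checked with a few lines of Python. This is only a minimal sketch: `count_atoms` is a made-up helper that handles flat formulas (element symbols followed by optional counts, no brackets or charges), and the cystine formula C6 H12 N2 O4 S2 is the standard one rather than something quoted in the comment.

```python
import re


def count_atoms(formula):
    """Total number of atoms in a flat molecular formula like 'C10 H17 N3 O6 S'."""
    compact = formula.replace(" ", "")
    # Reject anything that is not a plain run of element symbols and counts.
    if not re.fullmatch(r"(?:[A-Z][a-z]?\d*)+", compact):
        raise ValueError(f"unsupported formula: {formula!r}")
    tokens = re.findall(r"([A-Z][a-z]?)(\d*)", compact)
    # A missing count means a single atom of that element.
    return sum(int(n) if n else 1 for _, n in tokens)


if __name__ == "__main__":
    examples = {
        "two cysteines": "C6 H14 N2 O4 S2",
        "cystine": "C6 H12 N2 O4 S2",
        "glutathione": "C10 H17 N3 O6 S",
    }
    for name, formula in examples.items():
        print(name, formula, count_atoms(formula))
```

With hydrogens included, even a tripeptide such as glutathione uses up most of a 50-atom budget, which supports the point made above.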
This is a perfect example of a question that you can ask one of the AI chatbots. Insulin may come close, but it has 51 AA (a 21-residue A chain and a 30-residue B chain).
I asked GPT4. There was no name given.
There's a 'Chain size' entry on the OCA search form, under 'Additional searches' towards the bottom of the Search page, that would help you explore the many options. It allows setting ranges and 'less than' limits.
Unfortunately, it doesn't also have an option for the number of chains; however, it may make a starting point for further exploration.
OCA makes it easy to browse PDB entries and summarizes reports regularly.
REFERENCE: Prilusky, J. (1996), "OCA, a browser-database for protein structure/function." URL http://oca.weizmann.ac.il and mirrors worldwide.
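Since the OCA form has no filter for the number of chains, candidates found through 'Chain size' could be checked locally once downloaded. The Python sketch below counts atoms and distinct chain IDs in PDB-format text using the fixed-column layout (record name in columns 1-6, chain ID in column 22). The function names and the sample coordinates-free lines are made up for illustration, and only the first model is counted, which is an assumption that suits NMR ensembles. Keep in mind that many deposited entries omit hydrogens, so the count reflects deposited atoms only.

```python
def summarize_pdb(pdb_text):
    """Count ATOM/HETATM records and distinct chain IDs in the first model."""
    chains = set()
    n_atoms = 0
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM  ", "HETATM")):
            n_atoms += 1
            if len(line) > 21:
                chains.add(line[21])  # column 22 holds the chain identifier
        elif line.startswith("ENDMDL"):
            break  # stop after the first model of a multi-model file
    return {"atoms": n_atoms, "chains": sorted(chains)}


def is_small_multichain(summary, max_atoms=50):
    """True if the entry has at least two chains and at most max_atoms atoms."""
    return len(summary["chains"]) >= 2 and 0 < summary["atoms"] <= max_atoms


def fake_atom_line(serial, name, res, chain, resseq):
    """Build a synthetic, coordinate-free ATOM line (illustration only)."""
    return f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}"


if __name__ == "__main__":
    sample = "\n".join([
        fake_atom_line(1, "N", "GLY", "A", 1),
        fake_atom_line(2, "CA", "GLY", "A", 1),
        fake_atom_line(3, "N", "ALA", "B", 1),
        fake_atom_line(4, "CA", "ALA", "B", 1),
    ])
    summary = summarize_pdb(sample)
    print(summary, is_small_multichain(summary))
```

For a real file, the text would be read from disk and passed to `summarize_pdb` in the same way.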