How to get the protein-domain relationship?
5.7 years ago

Dears,

I am looking for a dataset that contains all human proteins together with the domains of each protein (i.e. for each protein, I want to know the set of domains it contains). Can someone help me with that?

Thanks in advance

Protein-Domain • 1.7k views

Thank you for your answer. I am not sure I explained my need clearly, but I found another resource, Pfam, which has exactly what I need.

This is what I need. I will have to run it iteratively many times, as I will have a list of protein IDs (e.g. UniProt ACC), and for each protein I need the corresponding domains inside it (as in the SQL query in the image).

But the question is: do I need to download the whole database (or even a subset of it) to be able to run this query?

Image at: https://ibb.co/jJYCHgP Image source: https://pfam.xfam.org/help#tabview=tab12


Please put your current question to the Pfam help desk at https://pfam.xfam.org/help#tabview=tab17


I will do that. Thank you very much.


No, do not do that. Wait and see whether people can help here, as GenoMax just did. Help desks are intended for technical debugging, not for guiding users who need tutorials that can be found elsewhere on the web or in bioinformatics communities.

5.7 years ago
GenoMax 147k
  1. Click on this link (page at Uniprot).
  2. Click on the Columns button in the top row. Find the family/domains section and expand it. Select the features you need. You can even choose additional databases for family/domains (second section). Click Save to apply your selection.
  3. Back on the search page, select all proteins (Swiss-Prot, human reviewed, ~20,404 as of Mar 2019; add TrEMBL if you also want the unreviewed ones) and click on Download. Choose a format you like. Tab-separated would likely work best, or plain text.
  4. Wait for the download to complete.

Here are examples of what you should see depending on what columns you select.

Entry   Entry name  Status  Protein names   Gene names  Organism    Length  Domain [FT]
Q00604  NDP_HUMAN   reviewed    Norrin (Norrie disease protein) (X-linked exudative vitreoretinopathy 2 protein)    NDP EVR2    Homo sapiens (Human)    133 DOMAIN 39 132 CTCK. {ECO:0000255|PROSITE-ProRule:PRU00039}.

Second example

Entry   Entry name  Status  Protein names   Gene names  Organism    Length  Domain [FT] Domain [CC]
Q9HB19  PKHA2_HUMAN reviewed    Pleckstrin homology domain-containing family A member 2 (PH domain-containing family A member 2) (Tandem PH domain-containing protein 2) (TAPP-2)   PLEKHA2 TAPP2   Homo sapiens (Human)    425 DOMAIN 7 113 PH 1. {ECO:0000255|PROSITE-ProRule:PRU00145}.; DOMAIN 198 298 PH 2. {ECO:0000255|PROSITE-ProRule:PRU00145}.
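
To turn the "Domain [FT]" column into a per-protein list of domains, something along the following lines might work. This is only a minimal sketch: it assumes each feature looks like the rows above ("DOMAIN start end name." with an optional {ECO:...} evidence tag, features separated by "; "), and the export format may differ in other UniProt releases. The function name `parse_domains` is hypothetical.

```python
import re

# One feature as it appears in the example rows above:
#   DOMAIN <start> <end> <name>. {ECO:...}.
# The evidence tag in braces is treated as optional (assumption).
FEATURE_RE = re.compile(r"DOMAIN (\d+) (\d+) ([^{]+?)\.(?:\s*\{.*)?$")

def parse_domains(field):
    """Return (start, end, name) tuples from one 'Domain [FT]' cell."""
    domains = []
    # Assumes "; " only occurs between features, as in the examples.
    for feature in field.split("; "):
        match = FEATURE_RE.match(feature.strip())
        if match:  # skip empty cells or text that is not a DOMAIN feature
            start, end, name = match.groups()
            domains.append((int(start), int(end), name))
    return domains
```

For the second example row, this would give two entries, one for "PH 1" (7 to 113) and one for "PH 2" (198 to 298).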

Thank you very much. This is exactly what I need; it seems I didn't explain it clearly at the beginning. I tried it and extracted only "Entry", "Entry name" and "Domain [FT]", and I can finish the work now. Thank you again for making it simple :)
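
For anyone who wants to do the same extraction in a script rather than by hand, here is a rough sketch of keeping only those three columns from the tab-separated download. It assumes the header names match the ones mentioned above exactly; `select_columns` and `KEEP` are hypothetical names, and the text is passed in as a string so the example does not depend on any particular file path.

```python
import csv
import io

# Column headers to keep, as named in the downloaded file (assumption).
KEEP = ("Entry", "Entry name", "Domain [FT]")

def select_columns(tsv_text, keep=KEEP):
    """Return one dict per protein containing only the requested columns."""
    # QUOTE_NONE: treat quote characters in protein names as plain text.
    reader = csv.DictReader(
        io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return [{col: row[col] for col in keep} for row in reader]
```

With a real download, the file contents could be read with `open(path, encoding="utf-8").read()` and passed in as `tsv_text`.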


Thank you. I will try the second one, because I tried Pfam and I have a question about it: can I query the Pfam database (using SQL) online?

Thanks in advance
