How To Find All Human Genes Containing A Specific Domain And Locate The Start And Ending Sites Of The Domain In Those Genes?
13.4 years ago
User 5765 ▴ 10

I can find genes containing a given domain with BioMart. However, I also need the start and end positions of that domain in those genes. How can I do that?

Many thanks!

protein • 4.5k views

Which BioMart API are you using?

13.4 years ago

You can get this information using the BioMart interface of InterPro. Use the following attributes to retrieve domain-specific information for each protein:

Match Status
Match Start Position
Match Stop Position
Match Score

Try a sample query here and check the screenshot of a sample query here. If you are new to InterPro, more about InterPro is available here.
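Once such a query is exported as tab-separated text, pulling out the positions for one domain is a small filtering step. A minimal sketch, assuming an export with a header row: the match columns are the attributes listed above, while `Protein Accession` and `Entry ID` are hypothetical column names, and the sample rows are made-up placeholders rather than real results.

```python
import csv
import io

# Made-up placeholder rows. "Protein Accession" and "Entry ID" are assumed
# column names; the match columns are the attributes named in the answer.
SAMPLE_TSV = (
    "Protein Accession\tEntry ID\tMatch Status\t"
    "Match Start Position\tMatch Stop Position\tMatch Score\n"
    "PROT_A\tIPR_X\tT\t12\t98\t1.2e-20\n"
    "PROT_A\tIPR_Y\tT\t120\t180\t3.0e-05\n"
    "PROT_B\tIPR_X\tT\t40\t130\t4.5e-18\n"
)


def domain_positions(tsv_text, entry_id):
    """Return (protein, start, stop) tuples for every match of one entry."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        if row["Entry ID"] == entry_id:
            hits.append((
                row["Protein Accession"],
                int(row["Match Start Position"]),
                int(row["Match Stop Position"]),
            ))
    return hits
```

Calling `domain_positions(SAMPLE_TSV, "IPR_X")` on the placeholder table returns one tuple per protein that carries that entry, with start and stop as integers.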


Your answer is right, but I need further help: I want the human genes with their Ensembl IDs together with the start and stop positions of the domain in a single result. Is this feasible?


You have two options here:
1. Use an ID mapping service to map between UniProt and Ensembl identifiers.
2. Use the Ensembl API to retrieve the domain information. See the tutorial http://useast.ensembl.org/info/docs/api/core/core_tutorial.html and its Protein Features section http://useast.ensembl.org/info/docs/api/core/core_tutorial.html#translations
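For the second option, the tutorial linked above covers the Perl core API. As a rough Python alternative (a swapped-in technique, not what the tutorial describes), the Ensembl REST service can list protein features for a translation. This sketch assumes the `overlap/translation` endpoint with `type=protein_feature` and assumes each returned record carries `type`, `id`, `start` and `end` fields; please check both against the current REST documentation before relying on it.

```python
import json
import urllib.request

REST_SERVER = "https://rest.ensembl.org"


def fetch_protein_features(translation_id):
    """Fetch protein features for one Ensembl translation (ENSP...) ID.

    Assumes the REST endpoint /overlap/translation/:id with
    type=protein_feature; this performs a network request.
    """
    url = (f"{REST_SERVER}/overlap/translation/{translation_id}"
           "?type=protein_feature")
    request = urllib.request.Request(
        url, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def domain_hits(features, source, domain_id):
    """Keep (start, end) pairs for one domain ID from one source database.

    The field names "type", "id", "start" and "end" are assumptions about
    the shape of the returned records.
    """
    return [
        (feature["start"], feature["end"])
        for feature in features
        if feature.get("type") == source and feature.get("id") == domain_id
    ]

# Usage (network required; the IDs are placeholders you must fill in):
#   features = fetch_protein_features("ENSP...")
#   print(domain_hits(features, "Pfam", "PF..."))
```

The positions returned this way are amino-acid coordinates on the Ensembl translation, so they come paired with an Ensembl ID without a separate mapping step.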

13.4 years ago
Lyco ★ 2.3k

Depending on which domain database the hit came from, you could go to that database's web page and see whether it offers the data. Many of them do, e.g. Pfam. However, the domain databases may give coordinates relative to the UniProt entries, not to the Ensembl proteins. Depending on your plans, this might be a problem. BioMart does offer UniProt links, which can be used to make the connection.
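Making that connection amounts to a join on the UniProt accession. A minimal sketch, assuming you already have two tables: domain hits as (UniProt accession, start, stop) and ID pairs as (UniProt accession, Ensembl gene ID), for example exported from BioMart. The function name and all identifiers in the usage note are hypothetical placeholders.

```python
from collections import defaultdict


def attach_ensembl_ids(domain_hits, id_pairs):
    """Join UniProt-based domain hits to Ensembl gene IDs.

    domain_hits: iterable of (uniprot_acc, start, stop)
    id_pairs:    iterable of (uniprot_acc, ensembl_gene_id)
    Returns (ensembl_gene_id, uniprot_acc, start, stop) rows. Hits whose
    accession has no mapping are dropped.
    """
    genes_by_protein = defaultdict(list)
    for uniprot_acc, ensembl_gene_id in id_pairs:
        genes_by_protein[uniprot_acc].append(ensembl_gene_id)

    rows = []
    for uniprot_acc, start, stop in domain_hits:
        for gene_id in genes_by_protein.get(uniprot_acc, []):
            rows.append((gene_id, uniprot_acc, start, stop))
    return rows
```

For example, `attach_ensembl_ids([("UP_A", 10, 90)], [("UP_A", "GENE_1")])` gives `[("GENE_1", "UP_A", 10, 90)]`. Note that the start and stop values are carried over unchanged, so they still refer to the UniProt sequence; if the Ensembl protein differs from the UniProt entry, the coordinates may not line up.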


I disagree with one aspect of this answer. Ensembl does provide domain architecture information, including start and end positions for specific domains. For example, see this gene: http://useast.ensembl.org/Homo_sapiens/Transcript/ProteinSummary?db=core;g=ENSG00000203710;r=1:207669492-207813992;t=ENST00000400960 If you mouse over a Pfam / SMART / InterPro domain, you can see its start and stop positions.
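The same per-domain positions can probably be pulled in bulk from Ensembl's own BioMart rather than by mousing over each protein. The sketch below only builds the query; it assumes the `hsapiens_gene_ensembl` dataset, a `pfam` filter, and the attribute names `pfam`, `pfam_start` and `pfam_end`, none of which are confirmed in this thread, so check them in the BioMart web interface first.

```python
import urllib.parse
import xml.etree.ElementTree as ET

MART_URL = "https://www.ensembl.org/biomart/martservice"

# Assumed attribute names; verify them in the BioMart attribute list.
ATTRIBUTES = [
    "ensembl_gene_id",
    "ensembl_peptide_id",
    "pfam",
    "pfam_start",
    "pfam_end",
]


def build_query_xml(pfam_id):
    """Build a BioMart XML query for human genes with one Pfam domain."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",
        "header": "1",
        "uniqueRows": "1",
    })
    dataset = ET.SubElement(query, "Dataset", {
        "name": "hsapiens_gene_ensembl",  # assumed dataset name
        "interface": "default",
    })
    ET.SubElement(dataset, "Filter", {"name": "pfam", "value": pfam_id})
    for name in ATTRIBUTES:
        ET.SubElement(dataset, "Attribute", {"name": name})
    header = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return header + ET.tostring(query, encoding="unicode")


def build_query_url(pfam_id):
    """Return a martservice URL carrying the query; no request is sent."""
    params = urllib.parse.urlencode({"query": build_query_xml(pfam_id)})
    return f"{MART_URL}?{params}"
```

The URL from `build_query_url` can be opened in a browser or fetched with any HTTP client. As far as I understand, the filter selects proteins that contain the domain, so the output may also list their other Pfam domains; keeping only rows whose `pfam` column equals the requested ID would handle that.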
