Question

Get protein information from ensemblbacteria using interpro

0

18 months ago

Ishanisignup32 • 0

I am trying to retrieve protein information from Ensembl Bacteria using InterPro IDs. I have written some R code that connects to the public MySQL server, but I can't figure out how to get the protein information for a given ID programmatically. Here is a picture of approximately what I want:

enter image description here

And this is my code:

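library(tidyverse)
library(RMySQL)  # attaches DBI as well

# Connect to the public Ensembl Genomes MySQL server (anonymous, read-only)
con <- dbConnect(MySQL(), host = "mysql-eg-publicsql.ebi.ac.uk",
                 user = "anonymous", password = "",
                 port = 4157)

# List all databases and keep the names of the bacterial ones
ds <- dbGetQuery(con, "SHOW DATABASES")
dim(ds)
bacteria <- grep("bacteria", ds$Database, value = TRUE)

# Switch to one collection core database and list its tables
dbExecute(con, "USE bacteria_0_collection_core_47_100_1")
dbGetQuery(con, "SHOW TABLES")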

ensembl • 1.5k views

ADD COMMENT • link updated 17 months ago by sgiorgetti ▴ 10 • written 18 months ago by Ishanisignup32 • 0

1

You may need to ask Ensembl support whether the MySQL database has the information you are looking for.

This sort of query may be what you need: http://bacteria.ensembl.org/Multi/Search/Results?species=all;idx=;q=IPR000562;site=ensemblunit

ADD REPLY • link 18 months ago by GenoMax 147k

1

As per my comment to @GenoMax below, it is unclear what information you are trying to fetch, and from what starting point.

For instance, if you start from an Ensembl ID, say SAMN02982918_2340, the approach suggested by Aleena looks sensible. If you are instead starting from an InterPro ID, as @GenoMax suggested, that REST API endpoint would not be enough, and you would be better off with SQL or the API. The SQL statement below might be an (early) starting point:

select x.dbprimary_acc, x.external_db_id,
       t.stable_id `translation_stable_id`,
       tr.stable_id `transcript_stable_id`,
       m.meta_value `species`
from `bacteria_117_collection_core_56_109_1`.object_xref ox
inner join `bacteria_117_collection_core_56_109_1`.xref x using(xref_id)
inner join `bacteria_117_collection_core_56_109_1`.translation t on t.translation_id = ox.ensembl_id
inner join `bacteria_117_collection_core_56_109_1`.transcript tr using(transcript_id)
inner join `bacteria_117_collection_core_56_109_1`.seq_region sr using(seq_region_id)
inner join `bacteria_117_collection_core_56_109_1`.coord_system cs using(coord_system_id)
inner join `bacteria_117_collection_core_56_109_1`.meta m on cs.species_id = m.species_id
where x.dbprimary_acc = 'IPR000562'
  and x.external_db_id = 1200
  and ox.ensembl_object_type = 'Translation'
  and m.meta_key = 'species.production_name';
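A query of this shape could be run from a script roughly as follows. This is only a sketch: the host, port and anonymous user come from the R code in the question, while the third-party `pymysql` driver, the helper names and the shortened column list are assumptions made for illustration. It links `translation` to `object_xref` through `ensembl_id`, the column the core schema uses to point at the annotated object, and passes the InterPro accession as a bound parameter.

```python
import re

HOST = "mysql-eg-publicsql.ebi.ac.uk"  # from the R code in the question
PORT = 4157                            # from the R code in the question
INTERPRO_DB_ID = 1200                  # external_db_id used in the statement above


def build_interpro_query(db_name: str) -> str:
    """Return SQL with one %s placeholder for the InterPro accession."""
    # Database names cannot be bound as parameters, so validate before interpolating.
    if not re.fullmatch(r"[A-Za-z0-9_]+", db_name):
        raise ValueError(f"unexpected database name: {db_name!r}")
    return f"""
        SELECT x.dbprimary_acc,
               t.stable_id  AS translation_stable_id,
               tr.stable_id AS transcript_stable_id
        FROM `{db_name}`.object_xref ox
        JOIN `{db_name}`.xref x USING (xref_id)
        JOIN `{db_name}`.translation t ON t.translation_id = ox.ensembl_id
        JOIN `{db_name}`.transcript tr ON tr.transcript_id = t.transcript_id
        WHERE x.dbprimary_acc = %s
          AND x.external_db_id = {INTERPRO_DB_ID}
          AND ox.ensembl_object_type = 'Translation'
    """


def fetch_translations(db_name: str, interpro_id: str) -> list:
    """Run the query against the public server and return the rows as tuples."""
    import pymysql  # third-party driver (assumption); imported lazily

    conn = pymysql.connect(host=HOST, port=PORT, user="anonymous", password="")
    try:
        with conn.cursor() as cur:
            cur.execute(build_interpro_query(db_name), (interpro_id,))
            return list(cur.fetchall())
    finally:
        conn.close()
```

With network access, `fetch_translations("bacteria_117_collection_core_56_109_1", "IPR000562")` would return one row per matching translation in that collection.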

Finally, if you want to go down the DB route, as per the R code above, please consider that there are currently (Ensembl 109/56) 128 bacteria collection databases each hosting a number of species/strains to look into.
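Because the hits may be spread over many collections, one way to proceed is to filter the output of `SHOW DATABASES` down to the collection core databases of a single release and then query each in turn. The sketch below assumes every such database follows the naming pattern of `bacteria_117_collection_core_56_109_1`; the function name and its defaults are hypothetical.

```python
import re


def collection_core_dbs(db_names, eg_release=56, ensembl_release=109):
    """Keep only bacteria collection core databases for one release."""
    pattern = re.compile(
        rf"bacteria_(\d+)_collection_core_{eg_release}_{ensembl_release}_\d+"
    )
    matches = []
    for name in db_names:
        m = pattern.fullmatch(name)
        if m:
            matches.append((int(m.group(1)), name))
    # Sort numerically by collection index so bacteria_2 precedes bacteria_10.
    return [name for _, name in sorted(matches)]
```

The returned names can then be looped over, running the same per-database query against each one and concatenating the results.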

Happy to support and advise, but I suppose I'd need some clarification about the issue you are dealing with.

ADD REPLY • link 18 months ago by sgiorgetti ▴ 10

0

Thank you all for your help. I was able to figure it out on my own. If anyone is curious, do let me know. The REST API is not suited to my purpose.

ADD REPLY • link 18 months ago by Ishanisignup32 • 0

0

Please post your solution as an "answer" to provide closure to this thread. If the comment by @sgiorgetti helped solve the problem then I can move that comment to an answer. Accept one or more of these answers.

ADD REPLY • link 18 months ago by GenoMax 147k

score 1 · Answer 1 · 2023-05-18

1

18 months ago

A@Ensembl ▴ 30

Hi,

You can use the Ensembl REST API for this query.

For instance: gene: ENSG00000157764 url: (https://rest.ensembl.org/lookup/id/ENSG00000157764?content-type=application/json)

gene: SAMN02982918_2340 url: (https://rest.ensembl.org/lookup/id/SAMN02982918_2340?content-type=application/json)

You may also find the 'xref' endpoint (http://rest.ensembl.org/documentation/info/xref_id) useful. This endpoint will retrieve corresponding external references in other databases given an Ensembl stable ID.
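As a rough illustration, both the `lookup/id` and `xrefs/id` endpoints can be called with the standard library alone. The helper names below are hypothetical, and the sketch assumes the server returns JSON when `content-type=application/json` is passed, as in the URLs above.

```python
import json
import urllib.parse
import urllib.request

SERVER = "https://rest.ensembl.org"  # server used in the URLs above


def lookup_url(stable_id: str) -> str:
    """URL of the lookup/id endpoint for one Ensembl stable ID."""
    return f"{SERVER}/lookup/id/{urllib.parse.quote(stable_id, safe='')}?content-type=application/json"


def xrefs_url(stable_id: str) -> str:
    """URL of the xrefs/id endpoint for one Ensembl stable ID."""
    return f"{SERVER}/xrefs/id/{urllib.parse.quote(stable_id, safe='')}?content-type=application/json"


def get_json(url: str):
    """Fetch a URL and decode the JSON body (needs network access)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)
```

For example, `get_json(xrefs_url("SAMN02982918_2340"))` would request the external references recorded for that stable ID.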

I also wanted to point out that the IDs in your query are for human and not bacteria.

I hope this helps, Aleena

ADD COMMENT • link 18 months ago by A@Ensembl ▴ 30

0

Are you certain REST API works with Ensembl bacteria and for the specific query OP is looking for?

ADD REPLY • link 18 months ago by GenoMax 147k

0

The REST API certainly works with Ensembl Bacteria in general. The specific query is a different topic: the original question was about protein information, but the example provided is purely genomic. Your comment above makes sense to me, but I am unsure about the original intentions and needs of @Ishanisignup32.

ADD REPLY • link 18 months ago by sgiorgetti ▴ 10

0

protein information from ensemblbacteria using interpro

I was going by the title of this post which seems to be asking for Interpro.

InterPro IDs do not seem to work with the REST example above, so perhaps they are not supported for that specific lookup?

ADD REPLY • link 18 months ago by GenoMax 147k

0

@GenoMax Apologies for my belated reply.

Agreed, the REST endpoint above supports Ensembl IDs only. I mentioned it because both the initial question and Aleena's reply seemed to start from an Ensembl gene ID.

Should you want to start from an InterPro ID instead, a better choice might be the xref_external endpoint: http://rest.ensembl.org/documentation/info/xref_external

Example: http://rest.ensembl.org/xrefs/symbol/Saccharomonospora_viridis_gca_900115515/IPR029058?external_db=Interpro

For further details, please check out the related documentation at http://rest.ensembl.org/documentation/info/xref_external
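A small sketch of calling that endpoint from a script follows. The species and accession in the usage note are taken from the example URL; the helper names are hypothetical, and the assumption that each returned hit carries `type` and `id` fields should be checked against the endpoint documentation.

```python
import json
import urllib.parse
import urllib.request

SERVER = "https://rest.ensembl.org"


def xrefs_symbol_url(species: str, symbol: str, external_db: str = "Interpro") -> str:
    """Build the xrefs/symbol URL for one species and one external accession."""
    path = "/".join(urllib.parse.quote(part, safe="") for part in (species, symbol))
    query = urllib.parse.urlencode(
        {"external_db": external_db, "content-type": "application/json"}
    )
    return f"{SERVER}/xrefs/symbol/{path}?{query}"


def ids_by_type(hits) -> dict:
    """Group returned stable IDs by object type (assumes 'type' and 'id' keys)."""
    grouped = {}
    for hit in hits:
        grouped.setdefault(hit.get("type"), []).append(hit.get("id"))
    return grouped


def fetch_hits(species: str, interpro_id: str) -> dict:
    """Query the endpoint and group the hits (needs network access)."""
    with urllib.request.urlopen(xrefs_symbol_url(species, interpro_id), timeout=30) as resp:
        return ids_by_type(json.load(resp))
```

`fetch_hits("Saccharomonospora_viridis_gca_900115515", "IPR029058")` would issue the same request as the example URL. Note that the endpoint takes one species per call, so a search across many strains means one request per species.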

Hope it helps

ADD REPLY • link 17 months ago by sgiorgetti ▴ 10