Question

Refseq IDs conversion to gene symbol

16 months ago

rgrindle • 0

Hi All,

I am creating a pipeline for determining orthologs from given transcriptomes. One of the tools my pipeline uses outputs RefSeq predicted protein IDs for each sequence, which look like:

61622.XP_010357577.1, 
8479.XP_005311777.2, 
61622.XP_010357577.1, 
10036.XP_005068815.1,

I am now stumped on how to convert these IDs to gene names. I understand BioMart can filter on RefSeq IDs; however, that would require obtaining the correct Mart object for the given RefSeq species, which does not map directly to Ensembl datasets. Example: for 8479.XP_005311777.2, species = Emydinae and ID = XP_005311777.2, but there is no Emydinae dataset in Ensembl. Does anyone know a way to convert these IDs to something a little more helpful? (Ensembl IDs would even work as an intermediate.)
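For reference, a minimal sketch of how one of these IDs could be split into its parts in plain POSIX shell. It assumes the text before the first dot is a taxonomy-style prefix, as the question reads it, and that the rest is a RefSeq accession with a version suffix; `split_id` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Split "<prefix>.<accession>.<version>" using POSIX parameter expansion.
# Assumption: the part before the first dot is a numeric species/taxonomy prefix.
split_id() {
    id=$1
    prefix=${id%%.*}     # everything before the first dot
    acc_ver=${id#*.}     # accession with version, e.g. XP_005311777.2
    acc=${acc_ver%.*}    # accession without the version suffix
    printf '%s\t%s\t%s\n' "$prefix" "$acc_ver" "$acc"
}

split_id 8479.XP_005311777.2
```

The versionless accession in the third column is the form queried in the reply below.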

Ensembl Orthology Symbol Refseq Gene • 545 views

updated 16 months ago by GenoMax 152k • written 16 months ago by rgrindle • 0

Not sure where you got these IDs, but if we ignore the numerical prefix, you can use EntrezDirect to get the Entrez gene ID. Since these are predicted proteins, there may be no linked gene name/symbol, so this may not work in all cases (it seems to work for only one from your list).

$ esearch -db protein -query XP_005311777 | elink -target gene | esummary | xtract -pattern DocumentSummary -element Id,Name,ScientificName
101949809   RHBDD3  Chrysemys picta
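To run the same lookup over a whole list, something along these lines might work. This is only a sketch: `unique_accessions` and `lookup_gene` are hypothetical helper names, `ids.txt` is an assumed file with one prefixed ID per line, and `lookup_gene` simply wraps the EntrezDirect pipeline shown above (so it needs EntrezDirect installed and network access).

```shell
#!/bin/sh
# Read prefixed IDs on stdin (one per line) and print unique, versionless accessions.
unique_accessions() {
    # Strip commas/spaces/CRs, drop empty lines, remove the numeric prefix
    # and the trailing version number, then deduplicate.
    tr -d ', \r' | sed -e '/^$/d' -e 's/^[0-9]*\.//' -e 's/\.[0-9]*$//' | sort -u
}

# Look up one accession with the EntrezDirect pipeline from the reply above.
# Prints the accession, then Id/Name/ScientificName, or NA if nothing is linked.
lookup_gene() {
    acc=$1
    hit=$(esearch -db protein -query "$acc" | elink -target gene | esummary |
          xtract -pattern DocumentSummary -element Id,Name,ScientificName)
    printf '%s\t%s\n' "$acc" "${hit:-NA}"
}

# Example usage (ids.txt is an assumed input file; not executed here):
#   for acc in $(unique_accessions < ids.txt); do
#       lookup_gene "$acc"
#       sleep 1   # stay well under NCBI's request rate limit
#   done
```

Accessions that come back as NA are the cases where no gene record is linked to the predicted protein.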

16 months ago by GenoMax 152k