Hi All,
I am building a pipeline that determines orthologs from given transcriptomes. One of the tools in the pipeline outputs RefSeq predicted protein IDs for each sequence, which look like:
61622.XP_010357577.1,
8479.XP_005311777.2,
61622.XP_010357577.1,
10036.XP_005068815.1,
I am now stumped on how to convert these IDs to gene names. I understand that BioMart can filter on RefSeq IDs, but that would require obtaining the correct Mart object for each RefSeq species, and those species do not map directly onto Ensembl datasets. For example, for 8479.XP_005311777.2 the species is Emydinae and the ID is XP_005311777.2, but there is no Emydinae dataset in Ensembl. Does anyone know a way to convert these IDs to something a little more helpful? Ensembl IDs would even work as an intermediate.
Not sure where you got these IDs, but if we ignore the numerical part, you can use Entrez Direct to get the Entrez Gene ID. Since these are predicted proteins, there is likely no linked gene name/symbol, so this may not work in all cases (it seems to work for only one from your list).
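To "ignore the numerical part" before doing any lookup, something like the Python sketch below might help. It only splits each ID into the leading number and the RefSeq accession and removes duplicates; it does not query NCBI. The names `PrefixedId`, `split_prefixed_id` and `unique_accessions` are made up for illustration, and treating the leading number as a taxonomy ID is an assumption based on the example in the question.

```python
from typing import Iterable, List, NamedTuple


class PrefixedId(NamedTuple):
    prefix: str     # numeric part before the first dot (assumed to be a taxonomy ID)
    accession: str  # RefSeq protein accession including its version suffix


def split_prefixed_id(raw: str) -> PrefixedId:
    """Split an ID such as '8479.XP_005311777.2,' into prefix and accession."""
    # Drop surrounding whitespace and the trailing comma seen in the tool output.
    cleaned = raw.strip().rstrip(",")
    # Split on the FIRST dot only, so the accession keeps its '.version' part.
    prefix, sep, accession = cleaned.partition(".")
    if not sep or not prefix.isdigit() or not accession:
        raise ValueError(f"unexpected ID format: {raw!r}")
    return PrefixedId(prefix, accession)


def unique_accessions(raw_ids: Iterable[str]) -> List[str]:
    """Return accessions in first-seen order, skipping blanks and duplicates."""
    seen = {}
    for raw in raw_ids:
        if raw.strip():
            seen.setdefault(split_prefixed_id(raw).accession, None)
    return list(seen)


if __name__ == "__main__":
    ids = [
        "61622.XP_010357577.1,",
        "8479.XP_005311777.2,",
        "61622.XP_010357577.1,",
        "10036.XP_005068815.1,",
    ]
    for accession in unique_accessions(ids):
        print(accession)
```

The printed accessions can then be passed, one per query, to whichever lookup tool you end up using.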