Hi, I have a set of genes with RefSeq IDs (e.g. XM_020713141.1) and I want to convert them to Entrez IDs (e.g. 101165603) for further analysis. I found a similar question that said clusterProfiler is suitable for this purpose: [GeneBank accession 2 Entrez gene id ][1]. However, I work on medaka (Oryzias latipes), and when I looked for a medaka annotation database among the Bioconductor annotation packages, I found packages only for the major species. Is there a way to access the NCBI medaka annotation database to convert the IDs? Or could you suggest some other method to solve this problem? I would be grateful for any help.
I am not sure, but you can try converting with DAVID or use Ensembl: http://grch37.ensembl.org/index.html
https://www.biotools.fr/ Try this
Sorry I don't think it has Oryzias latipes
Maybe UniProt's Retrieve/ID mapping tool is user-friendly enough to help: https://www.uniprot.org/uploadlists/ You can submit a list of RefSeq IDs, add an Entrez column to the output table, and then download it.
Moving this to a comment. Once you select RefSeq id as input, the only output option is UniProtKB id. So this may require two passes, if it works at all.
Thank you for the many suggestions! They were very useful, and I successfully got almost all of the Entrez IDs by using biomaRt.
However, I still have some questions. Although I got almost all of the Entrez IDs, some are missing (the results show NA), for example XR_002293119.2, XM_004081009.3 and XM_023961859.1. But when I search for these on the NCBI website, I can find their Entrez IDs: 101158738, 101170377 and 101155047.
I also tried changing the attribute from entrezid to wikigene_id, but the results were the same (all NA). Do you think this is due to a difference in database versions, and is there a way to obtain these Entrez IDs?
Since you are interested in Entrez IDs and starting with RefSeq accessions, why not use an NCBI tool? EDirect works fine for this.
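For anyone who prefers to script this, here is a minimal sketch of the same kind of lookup made directly against the NCBI E-utilities ELink endpoint, which is the service EDirect wraps. It is written in Python purely for illustration. It assumes that ELink accepts an accession.version as the id for nuccore and that a nuccore-to-gene link exists for the record. The function names are hypothetical, and the network call in refseq_to_entrez has not been run against medaka records here.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Public E-utilities ELink endpoint.
EUTILS_ELINK = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi"


def build_elink_url(accession):
    """Build an ELink URL asking for Gene records linked to one nuccore accession."""
    params = {"dbfrom": "nuccore", "db": "gene", "id": accession}
    return EUTILS_ELINK + "?" + urllib.parse.urlencode(params)


def parse_gene_ids(xml_bytes):
    """Collect the Gene UIDs listed under LinkSet/LinkSetDb/Link/Id in an ELink reply."""
    root = ET.fromstring(xml_bytes)
    return [el.text for el in root.findall("./LinkSet/LinkSetDb/Link/Id")]


def refseq_to_entrez(accession):
    """Query NCBI over the network and return the linked Gene UIDs (may be empty)."""
    with urllib.request.urlopen(build_elink_url(accession)) as resp:
        return parse_gene_ids(resp.read())


if __name__ == "__main__":
    # Accession taken from the original question; the result is whatever NCBI returns.
    print(refseq_to_entrez("XM_020713141.1"))
```

An empty list would mean that NCBI reported no gene link for that accession, which is worth checking by hand on the website before concluding that the ID is missing.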
I think it is because those genes are not part of the current Ensembl release (so either you wait for an update or use vkkodali's method): http://www.ensembl.org/Multi/Search/Results?q=XM_023961859
Thank you very much, both of you, for your comments. I understand that this is because these genes are not included in the current Ensembl release, and that EDirect can solve this.
Thanks to vkkodali's comment, I noticed that if I want to use EDirect from R, I can use reutils or rentrez. I tried the command below, modelled on the command above,
But I can't get the Entrez IDs as above.
I was wondering if you could help me again. Thank you.
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized. SUBMIT ANSWER is for new answers to original question.
You have to specifically ask for esummary -format uid. I am not sure how you do that in R.
Sorry for posting twice. I noticed I need to reply like this.
I see. I need to specify the format, but I am still struggling with how to specify "uid" in reutils.
Have you checked to see what is in refseq2? Edit: I see
Oh, I already got the results. Thank you for pointing that out!
Sorry for asking so many times, but I have one more question. The order of the outputs is not the same as the order of the inputs. I would like either to keep the output order or to extract both IDs together (RefSeq ID and Entrez ID, in the same order) so that I can tell which RefSeq ID is linked to which Entrez ID. I thought the "correspondence" option would keep the order, but it doesn't work.
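One way to sidestep the ordering problem, sketched here in Python as an illustration rather than as a reutils or rentrez recipe, is to look up each accession on its own and record the pair as you go, so the input ID always stays attached to its result. The helper map_in_order and its lookup argument are hypothetical: lookup stands for any function that takes one RefSeq accession and returns a list of Entrez IDs. The accessions and IDs in the example are made-up placeholders, not real records.

```python
import time


def map_in_order(accessions, lookup, delay=0.0):
    """Return (accession, entrez_id) pairs in exactly the input order.

    lookup is any callable that maps one accession to a list of Entrez IDs.
    Accessions with no hit get None, so no row is ever dropped or shifted.
    """
    rows = []
    for acc in accessions:
        ids = lookup(acc)
        rows.append((acc, ids[0] if ids else None))
        if delay:
            time.sleep(delay)  # pause between requests when lookup hits a server
    return rows


if __name__ == "__main__":
    # Stand-in lookup table with placeholder values (not real NCBI data).
    fake_db = {"ACC_B": ["222"], "ACC_A": ["111"]}
    result = map_in_order(["ACC_A", "ACC_MISSING", "ACC_B"],
                          lambda acc: fake_db.get(acc, []))
    for acc, gene_id in result:
        print(acc, gene_id)
```

With a real network lookup, a delay of about 0.34 seconds keeps the loop under the three requests per second that NCBI asks of clients without an API key. The trade-off is speed: one request per accession is slower than a batch query, but the pairing cannot get scrambled.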