Hi, I have a set of genes with RefSeq IDs (e.g. XM_020713141.1) and I want to convert them to Entrez IDs (e.g. 101165603) for further analysis.
I found a similar question that said clusterProfiler is suitable for this purpose.
[GeneBank accession 2 Entrez gene id ][1]
However, I work with medaka (Oryzias latipes), and when I looked for a medaka annotation database among the Bioconductor annotation packages, I found packages only for the major species.
Is there a way to access the NCBI medaka annotation database to convert the IDs?
Or could you suggest another method to solve this problem?
I would be grateful if you could help me.
Maybe UniProt Retrieve/ID mapping is user-friendly and could help: https://www.uniprot.org/uploadlists/
You can submit a list of RefSeq IDs, add an Entrez column to the output table, and then download it.
Moving this to a comment. Once you select RefSeq id as input, the only output option is UniProtKB id. So this may require two passes if it works at all.
Thank you for the many suggestions!
They are very useful for me, and I successfully got almost all of the Entrez IDs by using biomaRt.
However, I still have some questions.
Although I got almost all of the Entrez IDs, some are missing (the results show NA).
For example, XR_002293119.2, XM_004081009.3, and XM_023961859.1.
But when I search on the NCBI website, I can find that their Entrez IDs are 101158738, 101170377, and 101155047.
I also tried changing the attribute from entrezid to wikigene_id, but the results were the same (all NA).
Do you think this is due to a difference in database versions, and is there a way to obtain these Entrez IDs?
Thank you very much, both of you, for your comments.
I understand this is because these genes are not included in the current Ensembl release, and that EDirect can solve this.
Thanks to vkkodali's comment, I noticed that if I want to use EDirect from R, I can use reutils or rentrez.
I tried the command below, modeled on the command above,
Please use ADD COMMENT/ADD REPLY when responding to existing posts to keep threads logically organized.
SUBMIT ANSWER is for new answers to original question.
esummary(refseq2)
You have to specifically ask for esummary -format uid. I am not sure how you do that in R.
Oh, I already got the results. Thank you for pointing that out!
Sorry to ask so many times, but I have one more question.
The order of the outputs is not the same as the order of the inputs.
So I'd like to either keep the output order or extract both IDs (RefSeq ID and Entrez ID, in the same order) to find out which RefSeq ID is linked to which Entrez ID.
I thought the option "correspondence" would keep the order, but it doesn't work.
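One way to recover the input order, sketched in base R: if you can get the results as a two-column table (accession next to its Entrez ID), `match()` will realign it with your input vector. This is only a sketch under that assumption. The helper name `align_to_input` and the column names `refseq` and `entrez` are made up for illustration, and the toy table stands in for whatever reutils or rentrez actually returns. The example pairs are the ones from the biomaRt answer in this thread.

```r
# Hypothetical helper: align an unordered accession -> Entrez ID table
# with the original input vector.
align_to_input <- function(input_ids, result) {
  # For each input ID, match() gives its row position in `result`
  # (NA if that ID was not returned at all).
  idx <- match(input_ids, result$refseq)
  data.frame(
    refseq = input_ids,
    entrez = result$entrez[idx],
    stringsAsFactors = FALSE
  )
}

# Input order we want to keep
input_ids <- c("XM_020704464", "XM_011491436", "XM_020702270")

# Toy result table that came back in a different order (assumed shape)
result <- data.frame(
  refseq = c("XM_020702270", "XM_020704464", "XM_011491436"),
  entrez = c(101155179, 101165143, 101173426),
  stringsAsFactors = FALSE
)

aligned <- align_to_input(input_ids, result)
print(aligned)
```

Accessions that were not returned end up with NA in the `entrez` column, so nothing is silently dropped. Both columns must use the same accession form (with or without the version suffix) for the match to work.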
This assumes the IDs you have are all derived from RefSeq predicted mRNAs (e.g. XM_####).
R solution:
library(biomaRt)
mart <- useMart("ENSEMBL_MART_ENSEMBL",dataset="olatipes_gene_ensembl",host="www.ensembl.org")
BM.info <- getBM(attributes=c('entrezgene','refseq_mrna_predicted'),mart = mart)
## make a function to strip the version suffix (e.g. ".1" or ".12") from your accession names
trim.numbers <- function(name){ sub("\\.[0-9]+$","",name) }
## match your trimmed refseq IDs to the dataframe and pull out the corresponding entrez id - example below
BM.info$entrezgene[match(trim.numbers('XM_020713141.1'),BM.info$refseq_mrna_predicted)]
[1] 101165603
## how it can be used with multiple ids...
## select ids
multiple.ids <- c("XM_020704464","XM_011491436","XM_020702270","XM_023957409","XM_011476326")
## find entrez ids
BM.info$entrezgene[match(trim.numbers(multiple.ids),BM.info$refseq_mrna_predicted)]
[1] 101165143 101173426 101155179 101167210 101162526
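To see which accessions failed to map, it may help to keep the input and the looked-up Entrez ID side by side and then pull out the NA rows. The sketch below is base R only. `bm_toy` is a small hand-made stand-in for the `BM.info` data frame (an assumption, so that the example runs without querying Ensembl), and `strip_version` is a hypothetical helper name. XM_023961859.1 is one of the accessions reported in this thread as missing from Ensembl.

```r
# Hypothetical helper: drop a trailing ".<version>" of any length
strip_version <- function(acc) sub("\\.[0-9]+$", "", acc)

# Toy stand-in for the data frame returned by getBM() (assumed columns)
bm_toy <- data.frame(
  entrezgene = c(101165603, 101165143),
  refseq_mrna_predicted = c("XM_020713141", "XM_020704464"),
  stringsAsFactors = FALSE
)

# Inputs with and without a version suffix
my_ids <- c("XM_020704464", "XM_023961859.1", "XM_020713141.1")

# One row per input, in input order; NA where no Entrez ID was found
lookup <- data.frame(
  input = my_ids,
  entrez = bm_toy$entrezgene[match(strip_version(my_ids),
                                   bm_toy$refseq_mrna_predicted)],
  stringsAsFactors = FALSE
)

# Accessions that still need another source (e.g. an NCBI tool)
unmapped <- lookup$input[is.na(lookup$entrez)]
print(lookup)
print(unmapped)
```

With the real `BM.info` in place of `bm_toy`, `unmapped` would be the list to hand to Batch Entrez or EDirect.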
Go to Batch Entrez and upload your list of RefSeq accessions. Choose 'Nucleotide' as the database. Click the 'Retrieve' button.
Once you are in the results page, you will find 'Find related data' widget on the right hand side. From the drop-down list, choose 'Gene'. Click 'Find Items' button.
If you just want the list of the unique identifiers, use the 'Send To' menu on the top right corner and choose 'UI List' as the format.
EDirect
Check out bit.ly/entrez-direct for more information. The command to use here would be this:
I am not sure, but you can try converting with DAVID or using Ensembl: http://grch37.ensembl.org/index.html
https://www.biotools.fr/ Try this
Sorry I don't think it has Oryzias latipes
Since you are interested in Entrez IDs and starting with RefSeq accessions, why not use an NCBI tool? EDirect works fine for this.
I think it is because those genes are not part of the current Ensembl release (so either you wait for an update or use vkkodali's method): http://www.ensembl.org/Multi/Search/Results?q=XM_023961859
But I can't obtain the Entrez IDs the way the command above does.
Could you help me again? Thank you.
Sorry for posting twice. I see now that I need to reply like this.
I see. I need to specify the format, but I am still struggling with how to specify "uid" in reutils.
Have you checked to see what is in refseq2?
Edit: I see