Hi,
I am using biomaRt to convert Arabidopsis gene IDs into Entrez IDs. For some genes the query does not return an Entrez ID, yet when I look the gene up on the Ensembl website I can find the correspondence. Maybe I am using the wrong mart or dataset?
Here is the code I am using, for reference, and the results I obtain:
library(biomaRt)
ensembl = useMart("plants_mart",host="plants.ensembl.org")
ensembl = useDataset("athaliana_eg_gene",mart=ensembl)
genes = c("AT2G14610","AT4G23700","AT3G26830","AT3G15950","AT3G54830","AT5G24105")
query = getBM(attributes=c("ensembl_gene_id",
                           "entrezgene_id",
                           "refseq_dna",
                           "entrezgene_accession"),
              filters="ensembl_gene_id",
              values=genes, mart=ensembl)
> query
   ensembl_gene_id entrezgene_id     refseq_dna entrezgene_accession
1        AT2G14610        815949    NM_127025.3               815949
2        AT3G15950        820839    NM_112465.4               820839
3        AT3G15950        820839 NM_001035631.2               820839
4        AT3G15950        820839 NM_001338192.1               820839
5        AT3G15950        820839 NM_001338191.1               820839
6        AT3G15950        820839 NM_001338193.1               820839
7        AT3G26830        822298    NM_113595.4               822298
8        AT3G54830            NA                                  NA
9        AT4G23700        828470 NM_001341626.1               828470
10       AT4G23700        828470    NM_118501.5               828470
11       AT5G24105       2745995    NM_203099.2              2745995
In this case, AT3G54830 does not show any entrezgene_id or refseq_dna. However, when I search for it manually on plants.ensembl.org I can find it, and the same is true when I search for it on the NCBI website:
https://www.ncbi.nlm.nih.gov/gene/824648
Any help would be appreciated!
Thanks!
biomaRt just interacts with the internal servers at Ensembl, so it only acts as an interface to whatever is held at Ensembl. This gene just appears not to have an NM ID yet, but it is listed under the refseq_peptide attribute.
But when I look it up on the NCBI website, it returns Entrez ID 824648, and once there, I can find the NM IDs NM_001339700.1 and NM_115340.3, which correspond to those two refseq_peptide entries.
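For reference, the refseq_peptide lookup mentioned in the reply above could be reproduced with something like the following. It is only a sketch: it reuses the mart, dataset and attribute names from the question, and the `https://` prefix on the host is an assumption for recent biomaRt versions. What it returns depends on the Ensembl Plants release being queried.

```r
library(biomaRt)

# Same mart and dataset as in the question
# (the "https://" prefix is an assumption for recent biomaRt versions)
ensembl <- useMart("plants_mart", host = "https://plants.ensembl.org")
ensembl <- useDataset("athaliana_eg_gene", mart = ensembl)

# Request the protein-level RefSeq IDs next to the transcript-level ones
pep <- getBM(attributes = c("ensembl_gene_id",
                            "entrezgene_id",
                            "refseq_dna",
                            "refseq_peptide"),
             filters = "ensembl_gene_id",
             values = "AT3G54830",
             mart = ensembl)
pep
```

`listAttributes(ensembl)` shows which other cross-reference attributes the dataset exposes, in case the ID is stored under a different one.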
I guess it could be because the database at Ensembl has not been updated? I find that odd, as it looks like the original sequence of the transcript was uploaded in 2016.
Is there any way of querying the NCBI database with biomaRt?
I see what you mean. It may be that Ensembl's databases have not been updated yet; I am not sure how that works internally. In terms of automated annotation, though, there are basically 2 main ways:
Each has pros and cons.
In your case, it seems better to use org.db (see answer below)
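As a rough illustration of the org.db route, here is a minimal sketch. It assumes that "org.db" refers to the Bioconductor annotation package for Arabidopsis, org.At.tair.db, queried through AnnotationDbi; the package choice and the column names used ("TAIR", "ENTREZID", "REFSEQ") are my assumptions and not taken from the posts above. Whether AT3G54830 resolves depends on the package version installed.

```r
library(AnnotationDbi)
library(org.At.tair.db)  # assumed to be the relevant "org.db" package

genes <- c("AT2G14610", "AT4G23700", "AT3G26830",
           "AT3G15950", "AT3G54830", "AT5G24105")

# One row per TAIR locus / Entrez ID / RefSeq ID combination
res <- AnnotationDbi::select(org.At.tair.db,
                             keys = genes,
                             columns = c("ENTREZID", "REFSEQ"),
                             keytype = "TAIR")

# Named vector with a single Entrez ID per gene (first match if several)
entrez <- mapIds(org.At.tair.db,
                 keys = genes,
                 column = "ENTREZID",
                 keytype = "TAIR",
                 multiVals = "first")
entrez
```

Because the package is a local snapshot of the NCBI mappings, the result does not depend on what the Ensembl BioMart server currently holds, but it is only as recent as the installed Bioconductor release.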