John,
You could consider rentrez, which is a pretty low-level R library for the entrez API (which I wrote!).
In this case you would want to find links to other databases from your gene IDs, and then get the summary file for the refseq record:
library(rentrez)
links <- entrez_link(db = "all", ids = 353174, dbfrom = "gene")
names(links)
## [1] "gene_bioproject" "gene_biosystems_all"
## [3] "gene_cdd" "gene_dbvar"
## [5] "gene_genome" "gene_geoprofiles"
## [7] "gene_gtr" "gene_homologene"
## [9] "gene_nuccore" "gene_nuccore_mgc"
## [11] "gene_nuccore_pos" "gene_nuccore_refseqrna"
## [13] "gene_nucest" "gene_nucest_clust"
## [15] "gene_nucleotide" "gene_nucleotide_clust"
## [17] "gene_nucleotide_mgc" "gene_nucleotide_mgc_url"
## [19] "gene_nucleotide_pos" "gene_omim"
## [21] "gene_pcassay" "gene_pcassay_proteintarget"
## [23] "gene_pccompound" "gene_pcsubstance"
## [25] "gene_pmc" "gene_probe"
## [27] "gene_protein" "gene_protein_refseq"
## [29] "gene_pubmed" "gene_pubmed_citedinomim"
## [31] "gene_pubmed_rif" "gene_snp"
## [33] "gene_snp_genegenotype" "gene_taxonomy"
## [35] "gene_unigene" "gene_unists"
## [37] "file"
Once you have ID for the refseq record, you can fetch an (unparsed) XML file with
entrez_summary:
e_sum <- entrez_summary(db = "nuccore", id = links$gene_nuccore_refseqrna)
print(e_sum)
##
## http://www.ncbi.nlm.nih.gov/entrez/query/DTD/eSummary_041029.dtd">
## <eSummaryResult>
## <DocSum>
## <Id>206725455</Id>
## <Item Name="Caption" Type="String">NM_180990</Item>
## <Item Name="Title" Type="String">Homo sapiens zinc activated ligand-gated ion channel (ZACN), mRNA</Item>
## <Item Name="Extra" Type="String">gi|206725455|ref|NM_180990.3|[206725455]</Item>
## <Item Name="Gi" Type="Integer">206725455</Item>
## <Item Name="CreateDate" Type="String">2003/05/15</Item>
## <Item Name="UpdateDate" Type="String">2012/11/11</Item>
## <Item Name="Flags" Type="Integer">512</Item>
## <Item Name="TaxId" Type="Integer">9606</Item>
## <Item Name="Length" Type="Integer">1473</Item>
## <Item Name="Status" Type="String">live</Item>
## <Item Name="ReplacedBy" Type="String"/>
## <Item Name="Comment" Type="String"></Item>
## </DocSum>
## </eSummaryResult>
##
Presmusing this is the process you would go through, then wrapping it up in a function (and using Xpath to extract information from the XML) is straightforward:
get_entrez_length <- function(e_id) {
links <- entrez_link(db = "nuccore", ids = e_id, dbfrom = "gene")
summary_xml <- entrez_summary(db = "nuccore", id = links$gene_nuccore_refseqrna)
len <- xpathSApply(summary_xml, "//Item[@Name='Length']", xmlValue)
return(as.integer(len))
}
get_entrez_length(353174)
## [1] 1473
If you do use rentrez
and have any questions/suggestions/bugs let me know - still developing it so happy to make it more useful.