Easiest Way to Get mRNA RefSeq Accessions Related to an Entrez Gene ID Using NCBI EUtility Programs
I would like to know the best way to get all the mRNA RefSeq accession numbers related to a given Entrez Gene ID using the NCBI EUtility programs.
For instance, I would like to get all the mRNA RefSeq accession numbers (NM_001014431, NM_005163, NM_001014432) related to the gene AKT1 (Entrez Gene ID = 207).
I know that the URL below returns an XML file in which the RefSeq accession numbers are embedded, but I think they are very difficult to extract using XSLT.
So if someone has a better URL to suggest, it would be very helpful.
http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=207&retmode=xml
eutils
refseq
entrez
ncbi
Are you committed to using XSLT? I would think about an XPath query - most languages provide this functionality in their XML libraries.
For example, using R:
library(RCurl)
library(XML)
ef <- xmlTreeParse(getURL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=207&retmode=xml"), useInternalNodes = TRUE)
ns <- getNodeSet(ef, "//Gene-commentary_accession")
accn <- sapply(ns, function(x) { xmlValue(x) } )
# keep the accessions that start with NM_, then drop duplicates
unique(accn[grep("^NM_", accn)])
[1] "NM_001014431" "NM_001014432" "NM_005163"
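The same idea carries over to other languages. As a rough sketch in Python (standard library only), the function below collects every Gene-commentary_accession element and keeps the unique values with a given prefix. The SAMPLE_XML string is a hand-written, heavily abbreviated stand-in for the efetch response, and the function name mrna_accessions is made up for illustration; only the element names are taken from this thread.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily abbreviated stand-in for the efetch XML.
# This is an assumption for illustration, NOT real NCBI output; only the
# element names are borrowed from the answers in this thread.
SAMPLE_XML = """
<Entrezgene-Set>
  <Entrezgene>
    <Gene-commentary>
      <Gene-commentary_accession>NC_000014</Gene-commentary_accession>
      <Gene-commentary_products>
        <Gene-commentary>
          <Gene-commentary_accession>NM_005163</Gene-commentary_accession>
          <Gene-commentary_products>
            <Gene-commentary>
              <Gene-commentary_accession>NP_005154</Gene-commentary_accession>
            </Gene-commentary>
          </Gene-commentary_products>
        </Gene-commentary>
      </Gene-commentary_products>
    </Gene-commentary>
    <Gene-commentary>
      <Gene-commentary_accession>NM_005163</Gene-commentary_accession>
    </Gene-commentary>
    <Gene-commentary>
      <Gene-commentary_accession>NM_001014431</Gene-commentary_accession>
    </Gene-commentary>
  </Entrezgene>
</Entrezgene-Set>
"""


def mrna_accessions(xml_text, prefix="NM_"):
    """Return unique accessions starting with `prefix`, in document order."""
    root = ET.fromstring(xml_text)
    found = []
    # iter() walks the whole tree, like the "//" in the XPath query above
    for node in root.iter("Gene-commentary_accession"):
        accession = (node.text or "").strip()
        if accession.startswith(prefix) and accession not in found:
            found.append(accession)
    return found


print(mrna_accessions(SAMPLE_XML))
```

To run it against the real record, the XML text returned by the efetch URL in the question would be passed in place of SAMPLE_XML.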
written 14.2 years ago by Neilfws, updated 5.3 years ago by Ram
The following XSLT stylesheet does it, as far as I have tested it:
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
version='1.0'
>
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:apply-templates select="Entrezgene-Set"/>
</xsl:template>
<xsl:template match="Entrezgene-Set">
<xsl:apply-templates select="Entrezgene"/>
</xsl:template>
<xsl:template match="Entrezgene">
id: <xsl:value-of select="Entrezgene_track-info/Gene-track/Gene-track_geneid"/>
locus: <xsl:value-of select="Entrezgene_gene/Gene-ref/Gene-ref_locus"/>
[
<xsl:apply-templates select=".//Gene-commentary[Gene-commentary_heading='NCBI Reference Sequences (RefSeq)']//Gene-commentary_products/Gene-commentary" mode="product"/>
]
</xsl:template>
<xsl:template match="Gene-commentary" mode="product">
type: <xsl:value-of select="Gene-commentary_type/@value"/>
acn:<xsl:value-of select="Gene-commentary_accession"/>
</xsl:template>
</xsl:stylesheet>
Usage:
xsltproc --novalid jeter.xsl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=207&retmode=xml"
id: 207
locus: AKT1
[
type: mRNA
acn:NM_001014431
type: peptide
acn:NP_001014431
type: mRNA
acn:NM_001014432
type: peptide
acn:NP_001014432
type: mRNA
acn:NM_005163
type: peptide
acn:NP_005154
type: genomic
acn:NC_000014
type: genomic
acn:NT_026437
type: genomic
acn:AC_000057
type: genomic
acn:NW_925561
type: genomic
acn:AC_000146
type: genomic
acn:NW_001838116
]
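The output above mixes mRNA, peptide and genomic products. One way to keep only the mRNA entries is to test the value attribute of Gene-commentary_type instead of the accession prefix. Here is a minimal Python sketch of that idea, following the same path as the stylesheet. The sample XML is hand-written and abbreviated, the second section heading and the XX_000000 accession are hypothetical, and refseq_products is an illustrative name rather than part of any library.

```python
import xml.etree.ElementTree as ET

REFSEQ_HEADING = "NCBI Reference Sequences (RefSeq)"

# Hand-written, abbreviated sample (an assumption, NOT real NCBI output).
# The second section and its XX_000000 accession are hypothetical and only
# show that products outside the RefSeq section are ignored.
SAMPLE_XML = """
<Entrezgene-Set>
  <Entrezgene>
    <Gene-commentary>
      <Gene-commentary_heading>NCBI Reference Sequences (RefSeq)</Gene-commentary_heading>
      <Gene-commentary_products>
        <Gene-commentary>
          <Gene-commentary_type value="mRNA"/>
          <Gene-commentary_accession>NM_005163</Gene-commentary_accession>
          <Gene-commentary_products>
            <Gene-commentary>
              <Gene-commentary_type value="peptide"/>
              <Gene-commentary_accession>NP_005154</Gene-commentary_accession>
            </Gene-commentary>
          </Gene-commentary_products>
        </Gene-commentary>
        <Gene-commentary>
          <Gene-commentary_type value="genomic"/>
          <Gene-commentary_accession>NC_000014</Gene-commentary_accession>
        </Gene-commentary>
      </Gene-commentary_products>
    </Gene-commentary>
    <Gene-commentary>
      <Gene-commentary_heading>Hypothetical other section</Gene-commentary_heading>
      <Gene-commentary_products>
        <Gene-commentary>
          <Gene-commentary_type value="mRNA"/>
          <Gene-commentary_accession>XX_000000</Gene-commentary_accession>
        </Gene-commentary>
      </Gene-commentary_products>
    </Gene-commentary>
  </Entrezgene>
</Entrezgene-Set>
"""


def refseq_products(xml_text, wanted_type="mRNA"):
    """Return accessions of RefSeq products whose type value is `wanted_type`."""
    root = ET.fromstring(xml_text)
    found = []
    for section in root.iter("Gene-commentary"):
        # Only look inside the commentary headed as the RefSeq section
        if section.findtext("Gene-commentary_heading") != REFSEQ_HEADING:
            continue
        # Same relative path as the stylesheet: products at any depth
        for product in section.findall(".//Gene-commentary_products/Gene-commentary"):
            type_node = product.find("Gene-commentary_type")
            accession = product.findtext("Gene-commentary_accession")
            if type_node is None or not accession:
                continue
            if type_node.get("value") == wanted_type and accession not in found:
                found.append(accession)
    return found


print(refseq_products(SAMPLE_XML))
```

Filtering on the type does not depend on the NM_ or XM_ prefix, although whether every transcript of interest is labelled mRNA in the real record is something to check against the actual efetch output.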
Actually, yes, I am using XSLT, so Pierre's solution fits my needs better. But since I want only the NM_ or XM_ accession numbers, maybe I should introduce a regular expression into Pierre's solution.
You don't need a regex. Just use the XPath function starts-with().
@Pierre: thanks for the tip, but when I try it I get an error message. Does it work if you include it in the code of your response?
@Pierre: I fixed it by replacing the double quotes with single quotes: 'NM_' instead of "NM_".