Easiest way to get mRNA RefSeq accessions related to an Entrez Gene ID using NCBI EUtility programs
14.2 years ago

I would like to know the best way to get all the mRNA RefSeq accession numbers related to a given Entrez GeneID using the NCBI EUtility programs.

For instance, I would like to get all the mRNA RefSeq accession numbers (NM_001014431, NM_005163, NM_001014432) related to the gene AKT1 (Entrez GeneID = 207).

I know that the URL below returns an XML file in which the RefSeq accession numbers are embedded, but I think they are very difficult to extract using XSLT.

So if someone has a better URL to suggest, it would be very helpful.

http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=207&retmode=xml

14.2 years ago
Neilfws 49k

Are you committed to using XSLT? I would think about an XPath query - most languages provide this functionality in their XML libraries.

For example, using R:

library(RCurl)
library(XML)

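# Fetch the Entrez Gene record and parse it into an internal XML document
ef   <- xmlTreeParse(getURL("http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=207&retmode=xml"), useInternalNodes = TRUE)
# Collect every accession element, wherever it occurs in the record
ns   <- getNodeSet(ef, "//Gene-commentary_accession")
accn <- sapply(ns, xmlValue)
# Keep the unique mRNA RefSeq accessions (NM_ prefix)
grep("^NM_", unique(accn), value = TRUE)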

[1] "NM_001014431" "NM_001014432" "NM_005163"
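
For comparison, here is a rough Python sketch of the same idea using only the standard library. It is not part of the original answer: the function name `refseq_accessions`, the default prefixes, and the hand-written `SAMPLE` document (which imitates only the accession elements of the efetch output) are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hand-written, heavily trimmed stand-in for the efetch XML (assumption:
# only the accession elements discussed in this thread are reproduced).
SAMPLE = """<Entrezgene-Set>
  <Entrezgene>
    <Gene-commentary>
      <Gene-commentary_accession>NM_005163</Gene-commentary_accession>
    </Gene-commentary>
    <Gene-commentary>
      <Gene-commentary_accession>NP_005154</Gene-commentary_accession>
    </Gene-commentary>
    <Gene-commentary>
      <Gene-commentary_accession>NM_001014431</Gene-commentary_accession>
    </Gene-commentary>
    <Gene-commentary>
      <Gene-commentary_accession>NM_005163</Gene-commentary_accession>
    </Gene-commentary>
    <Gene-commentary>
      <Gene-commentary_accession>NC_000014</Gene-commentary_accession>
    </Gene-commentary>
  </Entrezgene>
</Entrezgene-Set>"""


def refseq_accessions(xml_text, prefixes=("NM_", "XM_")):
    """Return the sorted, unique accessions that start with one of `prefixes`."""
    wanted = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    root = ET.fromstring(xml_text)
    found = {
        node.text.strip()
        for node in root.iter("Gene-commentary_accession")
        if node.text and node.text.strip().startswith(wanted)
    }
    return sorted(found)


print(refseq_accessions(SAMPLE))
```

For a real record, the text passed in would be the body returned by the efetch URL from the question (for example read with `urllib.request.urlopen`). The prefix tuple covers the NM_/XM_ case without any regular expression.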

You don't need a regex. Just use the XPath function starts-with() in the select expression:

select=".//Gene-commentary[Gene-commentary_heading='NCBI Reference Sequences (RefSeq)']//Gene-commentary_products/Gene-commentary[starts-with(Gene-commentary_accession,"NM_")]

Actually, yes, I am using XSLT, so Pierre's solution fits my needs better. But since I want only the NM_ or XM_ accession numbers, maybe I should introduce a regular expression into Pierre's solution.

@Pierre: thanks for the tip, but when I try it I get an error message. Does it work if you include it in the code of your answer?

@Pierre: I fixed it by replacing the double quotes with single quotes: 'NM_' instead of "NM_".
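
The error most likely comes from XML itself rather than from XPath: the select attribute is delimited by double quotes, so a double-quoted "NM_" inside it ends the attribute early. The small Python sketch below illustrates this with the standard-library parser. The helper `is_well_formed` and the shortened select expressions are made up for the illustration. The third fragment shows one way to accept both NM_ and XM_ without a regex (XPath 1.0 has no regular-expression functions) by joining two starts-with() tests with `or`.

```python
import xml.etree.ElementTree as ET

# Minimal xsl:apply-templates element; {} is filled with the select value.
TEMPLATE = (
    '<xsl:apply-templates '
    'xmlns:xsl="http://www.w3.org/1999/XSL/Transform" select="{}"/>'
)


def is_well_formed(fragment):
    """Return True if the fragment parses as XML, False otherwise."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Double quotes inside a double-quoted attribute: not well-formed XML.
broken = TEMPLATE.format(
    'Gene-commentary[starts-with(Gene-commentary_accession,"NM_")]'
)
# Single quotes around the XPath string literal: fine.
fixed = TEMPLATE.format(
    "Gene-commentary[starts-with(Gene-commentary_accession,'NM_')]"
)
# NM_ or XM_, still without any regular expression.
both = TEMPLATE.format(
    "Gene-commentary[starts-with(Gene-commentary_accession,'NM_')"
    " or starts-with(Gene-commentary_accession,'XM_')]"
)

for name, fragment in (("broken", broken), ("fixed", fixed), ("both", both)):
    print(name, is_well_formed(fragment))
```

This only checks XML well-formedness. It does not evaluate the XPath, so the expressions still need to be tried in the actual stylesheet.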

14.2 years ago

The following XSLT stylesheet does it, as far as I have tested:


<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform'
    version='1.0'
    >

<xsl:output method="text"/>

<xsl:template match="/">
<xsl:apply-templates select="Entrezgene-Set"/>
</xsl:template>

<xsl:template match="Entrezgene-Set">
<xsl:apply-templates select="Entrezgene"/>
</xsl:template>

<xsl:template match="Entrezgene">
id: <xsl:value-of select="Entrezgene_track-info/Gene-track/Gene-track_geneid"/>
locus: <xsl:value-of select="Entrezgene_gene/Gene-ref/Gene-ref_locus"/>
[
<xsl:apply-templates select=".//Gene-commentary[Gene-commentary_heading='NCBI Reference Sequences (RefSeq)']//Gene-commentary_products/Gene-commentary" mode="product"/>
]
</xsl:template>

<xsl:template match="Gene-commentary" mode="product">
type: <xsl:value-of select="Gene-commentary_type/@value"/>
acn:<xsl:value-of select="Gene-commentary_accession"/>

</xsl:template>

</xsl:stylesheet>

Usage:

xsltproc --novalid jeter.xsl "http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=gene&id=207&retmode=xml"

id: 207
locus: AKT1
[

type: mRNA
acn:NM_001014431
type: peptide
acn:NP_001014431
type: mRNA
acn:NM_001014432
type: peptide
acn:NP_001014432
type: mRNA
acn:NM_005163
type: peptide
acn:NP_005154
type: genomic
acn:NC_000014
type: genomic
acn:NT_026437
type: genomic
acn:AC_000057
type: genomic
acn:NW_925561
type: genomic
acn:AC_000146
type: genomic
acn:NW_001838116

]
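
Since the output above mixes mRNA, peptide and genomic products, the mRNA accessions can also be picked out by product type instead of by prefix. Below is a hedged Python sketch of that selection, mirroring the stylesheet's path (RefSeq heading, then Gene-commentary_products/Gene-commentary) and its use of the `value` attribute on Gene-commentary_type. The function name `mrna_products` and the trimmed `SAMPLE` document are assumptions for illustration, not real efetch output.

```python
import xml.etree.ElementTree as ET

REFSEQ_HEADING = "NCBI Reference Sequences (RefSeq)"

# Hand-written, trimmed stand-in for the efetch XML (assumption: it keeps
# only the elements that the stylesheet above navigates).
SAMPLE = """<Entrezgene-Set><Entrezgene><Entrezgene_comments>
<Gene-commentary>
<Gene-commentary_heading>NCBI Reference Sequences (RefSeq)</Gene-commentary_heading>
<Gene-commentary_comment><Gene-commentary><Gene-commentary_products>
<Gene-commentary>
<Gene-commentary_type value="mRNA">3</Gene-commentary_type>
<Gene-commentary_accession>NM_001014431</Gene-commentary_accession>
<Gene-commentary_products><Gene-commentary>
<Gene-commentary_type value="peptide">8</Gene-commentary_type>
<Gene-commentary_accession>NP_001014431</Gene-commentary_accession>
</Gene-commentary></Gene-commentary_products>
</Gene-commentary>
<Gene-commentary>
<Gene-commentary_type value="mRNA">3</Gene-commentary_type>
<Gene-commentary_accession>NM_005163</Gene-commentary_accession>
</Gene-commentary>
<Gene-commentary>
<Gene-commentary_type value="genomic">1</Gene-commentary_type>
<Gene-commentary_accession>NC_000014</Gene-commentary_accession>
</Gene-commentary>
</Gene-commentary_products></Gene-commentary></Gene-commentary_comment>
</Gene-commentary>
</Entrezgene_comments></Entrezgene></Entrezgene-Set>"""


def mrna_products(xml_text, wanted_type="mRNA"):
    """List accessions of RefSeq products whose type attribute matches."""
    root = ET.fromstring(xml_text)
    accessions = []
    for section in root.iter("Gene-commentary"):
        # Only look inside the commentary block with the RefSeq heading.
        if section.findtext("Gene-commentary_heading") != REFSEQ_HEADING:
            continue
        for product in section.iterfind(".//Gene-commentary_products/Gene-commentary"):
            type_node = product.find("Gene-commentary_type")
            acc = product.findtext("Gene-commentary_accession")
            if type_node is None or acc is None:
                continue
            if type_node.get("value") == wanted_type and acc not in accessions:
                accessions.append(acc)
    return accessions


print(mrna_products(SAMPLE))
```

Filtering on the type attribute also keeps XM_ (predicted mRNA) records if a gene has them, which a plain NM_ prefix test would drop.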