Getting gene IDs from taxonomy
6.7 years ago
theLegend • 0

Hello,

Is there an easy way to get all gene IDs for a species using Biopython?

Thanks for your help

taxonomy gene entrez • 1.4k views

You can use the Biopython module Bio.Entrez for this. Alternatively, NCBI's Entrez Direct (EDirect) command-line utilities can do the same from a UNIX shell.

6.7 years ago

Biopython code:

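from Bio import Entrez

# NCBI asks for a contact address with every E-utilities request.
Entrez.email = "A.N.Other@example.com"

# Search the Gene database for all records assigned to the organism.
handle = Entrez.esearch(
    db="gene",
    term="Homo sapiens[Organism]",
    retmax=100000)
record = Entrez.read(handle)
handle.close()

# Write one gene ID per line.
with open('results.txt', 'w') as oh:
    for gene_id in record["IdList"]:
        oh.write(gene_id + '\n')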

Output (first 5 lines of results.txt):

7157
1956
7124
348
7422
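
One caveat with a single esearch call: the result reports the total number of matches in record["Count"], and NCBI may return fewer IDs than that in one response (as far as I know, the E-utilities documentation describes a per-request cap on retmax, and suggests stepping retstart for databases other than PubMed). Below is a minimal sketch of that paging, not a definitive implementation. The helper names iter_gene_ids and entrez_gene_page are my own, not part of Biopython, and the page size of 10000 is an assumption you may need to adjust.

```python
def iter_gene_ids(search_page, page_size=10000):
    """Yield every ID by requesting successive pages until the reported count is reached.

    search_page(retstart, retmax) must return a mapping with "Count" and "IdList",
    which is the shape Entrez.read() gives for an esearch result.
    """
    retstart = 0
    total = None
    while total is None or retstart < total:
        record = search_page(retstart, page_size)
        total = int(record["Count"])
        ids = record["IdList"]
        if not ids:
            # Stop if the server returns an empty page, to avoid looping forever.
            break
        yield from ids
        retstart += len(ids)


def entrez_gene_page(term, email):
    """Return a page-fetching function backed by Bio.Entrez.esearch (needs network access)."""
    # Imported here so the paging logic above can be tested without Biopython.
    from Bio import Entrez

    Entrez.email = email

    def search_page(retstart, retmax):
        handle = Entrez.esearch(db="gene", term=term,
                                retstart=retstart, retmax=retmax)
        try:
            return Entrez.read(handle)
        finally:
            handle.close()

    return search_page


# Example usage (performs real requests against NCBI):
# search = entrez_gene_page("Homo sapiens[Organism]", "A.N.Other@example.com")
# with open("results.txt", "w") as oh:
#     for gene_id in iter_gene_ids(search):
#         oh.write(gene_id + "\n")
```

Keeping the paging loop separate from the Entrez call makes it easy to check the loop with a fake page function before sending any requests.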

You can also use BioMart's RESTful access. You just need an XML file with the search parameters:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query  virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >

    <Dataset name = "hsapiens_gene_ensembl" interface = "default" >
        <Attribute name = "ensembl_gene_id" />
        <Attribute name = "entrezgene" />
        <Attribute name = "hgnc_symbol" />
    </Dataset>
</Query>

Collapse it onto a single line and paste it after "query=" in the command below, then run the command in a shell:

wget -O results.txt 'http://www.ensembl.org/biomart/martservice?query=<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0" count="" datasetConfigVersion="0.6"><Dataset name="hsapiens_gene_ensembl" interface="default"><Attribute name="ensembl_gene_id"/><Attribute name="entrezgene"/><Attribute name="hgnc_symbol"/></Dataset></Query>'

Output (first 5 lines of results.txt):

ENSG00000210049     MT-TF
ENSG00000211459 4549    MT-RNR1
ENSG00000210077     MT-TV
ENSG00000210082 4550    MT-RNR2
ENSG00000209082     MT-TL1
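
If you would rather stay in Python than collapse the XML by hand, the same steps can be sketched roughly as follows. This is only an illustration under a few assumptions: the function names build_biomart_url and parse_biomart_tsv are hypothetical, the results file is tab-separated (as requested by formatter = "TSV"), and the three columns come back in the order of the Attribute elements in the query. Rows with no Entrez ID, like the first line of the output above, have an empty middle column.

```python
import csv
import io
import re
from urllib.parse import quote

# Same endpoint as in the wget command above.
BIOMART_URL = "http://www.ensembl.org/biomart/martservice"


def build_biomart_url(query_xml, base_url=BIOMART_URL):
    """Collapse a multi-line BioMart query onto one line and URL-encode it."""
    one_line = re.sub(r">\s+<", "><", query_xml.strip())  # drop whitespace between tags
    one_line = re.sub(r"\s+", " ", one_line)               # squeeze any remaining runs
    return base_url + "?query=" + quote(one_line, safe="")


def parse_biomart_tsv(text):
    """Yield (ensembl_gene_id, entrez_id_or_None, symbol_or_None) for each row."""
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row:
            continue  # skip blank lines
        # Pad short rows so that missing trailing columns become empty strings.
        ensembl_id, entrez_id, symbol = (row + ["", ""])[:3]
        yield ensembl_id, entrez_id or None, symbol or None


# Example usage (the first line performs no request; fetching the URL does):
# url = build_biomart_url(open("query.xml").read())
# with open("results.txt") as fh:
#     for ensembl_id, entrez_id, symbol in parse_biomart_tsv(fh.read()):
#         if entrez_id is not None:
#             print(entrez_id)
```

Returning None for empty columns makes it straightforward to keep only the genes that actually have an Entrez ID.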