6.7 years ago
theLegend
Hello,
Is there an easy way to get all gene IDs for a species using Biopython?
Thanks for your help
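Biopython code:
from Bio import Entrez

# NCBI asks for a contact address with every E-utilities request
Entrez.email = "A.N.Other@example.com"

# Search the Gene database for all records of the organism
handle = Entrez.esearch(
    db="gene",
    term="Homo sapiens[Organism]",
    retmax=100000)
record = Entrez.read(handle)
handle.close()

# Write one gene ID per line
with open('results.txt', 'w') as oh:
    for gene_id in record["IdList"]:
        oh.write(gene_id + '\n')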
Output (first 5 lines of results.txt):
7157
1956
7124
348
7422
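NCBI's E-utilities documentation describes a cap on how many IDs a single esearch call returns (10,000 at the time of writing), so one large retmax may not bring back every gene. A minimal sketch of paging with retstart follows. The helper names fetch_all_ids and entrez_gene_page are made up for illustration and are not part of Biopython; the page size is an assumption.

```python
def fetch_all_ids(search_page, page_size=10000):
    """Collect IDs by calling search_page(start, size) until all are read.

    search_page is any callable returning a mapping with "Count" and
    "IdList", which is the shape Entrez.read() gives for an esearch result.
    """
    ids = []
    start = 0
    while True:
        record = search_page(start, page_size)
        batch = list(record["IdList"])
        ids.extend(batch)
        start += len(batch)
        # Stop on an empty page or once the reported total is reached
        if not batch or start >= int(record["Count"]):
            break
    return ids


def entrez_gene_page(start, size, term="Homo sapiens[Organism]"):
    """Hypothetical helper: fetch one page of Gene IDs from NCBI."""
    from Bio import Entrez  # imported here so the paging logic runs without Biopython

    Entrez.email = "A.N.Other@example.com"
    handle = Entrez.esearch(db="gene", term=term, retstart=start, retmax=size)
    try:
        return Entrez.read(handle)
    finally:
        handle.close()
```

Calling fetch_all_ids(entrez_gene_page) should then return the full ID list, which can be written to results.txt as above. Passing the page fetcher in as an argument keeps the loop easy to check without network access.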
You can also use BioMart RESTful access. You just need an XML file with the search parameters:
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Query>
<Query virtualSchemaName = "default" formatter = "TSV" header = "0" uniqueRows = "0" count = "" datasetConfigVersion = "0.6" >
<Dataset name = "hsapiens_gene_ensembl" interface = "default" >
<Attribute name = "ensembl_gene_id" />
<Attribute name = "entrezgene" />
<Attribute name = "hgnc_symbol" />
</Dataset>
</Query>
Convert it into a single line and put it after "query=" in the URL below, then run the following command in a shell:
wget -O results.txt 'http://www.ensembl.org/biomart/martservice?query=<Query virtualSchemaName="default" formatter="TSV" header="0" uniqueRows="0" count="" datasetConfigVersion="0.6"><Dataset name="hsapiens_gene_ensembl" interface="default"><Attribute name="ensembl_gene_id"/><Attribute name="entrezgene"/><Attribute name="hgnc_symbol"/></Dataset></Query>'
Output (first 5 lines of results.txt):
ENSG00000210049 MT-TF
ENSG00000211459 4549 MT-RNR1
ENSG00000210077 MT-TV
ENSG00000210082 4550 MT-RNR2
ENSG00000209082 MT-TL1
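If you would rather stay in Python than call wget, the same request URL can be assembled with the standard library. This is only a sketch: build_query_xml and build_url are illustrative names, and the dataset and attribute names are copied from the XML above, so they may differ between Ensembl releases.

```python
from urllib.parse import urlencode

MART_URL = "http://www.ensembl.org/biomart/martservice"


def build_query_xml(dataset, attributes):
    """Return the BioMart query as a single-line XML string."""
    attrs = "".join(f'<Attribute name="{name}"/>' for name in attributes)
    return (
        '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
        '<Query virtualSchemaName="default" formatter="TSV" header="0" '
        'uniqueRows="0" count="" datasetConfigVersion="0.6">'
        f'<Dataset name="{dataset}" interface="default">{attrs}</Dataset>'
        '</Query>'
    )


def build_url(dataset, attributes):
    """URL-encode the XML and append it as the 'query' parameter."""
    return MART_URL + "?" + urlencode({"query": build_query_xml(dataset, attributes)})


url = build_url(
    "hsapiens_gene_ensembl",
    ["ensembl_gene_id", "entrezgene", "hgnc_symbol"],
)
```

The resulting url can be passed to urllib.request.urlretrieve(url, "results.txt") to download the table. Encoding the query with urlencode avoids having to quote the XML by hand.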
You can use Biopython's Bio.Entrez module for this. Alternatively, NCBI's Entrez Direct (EDirect) UNIX command-line utilities can do the same.