Question

Using Biopython Entrez With Gene Name To Get Fastq

0


12.0 years ago

StudentOfScience • 0

This is what I want to do. I have a list of gene names, for example [ITGB1, RELA, NFKBIA]. After reading the Biopython help and the tutorial for the Entrez API, I came up with this:

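from Bio import Entrez

Entrez.email = 'you@example.com'  # placeholder; NCBI asks for a contact address

x = ['ITGB1', 'RELA', 'NFKBIA']
for item in x:
    handle = Entrez.efetch(db="nucleotide", id=item, rettype="gb")
    record = handle.read()
    out_handle = open('genes/' + item + '.xml', 'w')  # to create a file with gene name
    out_handle.write(record)
    out_handle.close()  # close() must be called, not just referenced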

But this keeps erroring out. I have discovered that it works if the id is a numerical id (although you have to pass it as a string, e.g. '186972394'), like so:

handle = Entrez.efetch(db="nucleotide", id='186972394' ,rettype="gb")

This gets me the info I want which includes the sequence.

So now to the Question:

How can I search by gene name (because I do not have ID numbers), or easily convert my gene names to IDs, to get the sequences for the gene list I have?

I have also tried:

x = 'RELA'

handle = Entrez.efetch(db="nucleotide", id=x ,rettype="gb")

errors out with HTTP Error 400: Bad Request, because it is expecting a string of a number for id

handle = Entrez.esearch(db="nucleotide",term=x)

returns nothing; debugging shows it did not find anything

handle =  Entrez.esearch(db="nucleotide",term="Homo[Orgn] AND RELA[Gene]")

returns a list of IDs, and the first one is what I want. But I am sure you will not like this approach, because it does not guarantee that the first ID in the list is actually the one I want every time I query the search engine.

Thx

entrez biopython python fastq • 5.9k views

ADD COMMENT • link updated 12.0 years ago by Istvan Albert 102k • written 12.0 years ago by StudentOfScience • 0

2


This is really an Entrez question rather than a Biopython one: you're trying to find an Entrez term that limits you to a particular record for each id. Check out these tips for getting only sequences in RefSeq, and use biomol_genomic[PROP] to get rid of mRNAs.
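A rough sketch of how such a restricted term might be built and searched from Biopython. The helper names (build_term, search_ids) and the email argument are made up for illustration, and the exact combination of filters (refseq[Filter] plus biomol_genomic[PROP]) is an assumption about what narrows the hits usefully, not a guaranteed recipe:

```python
def build_term(gene, organism="Homo sapiens"):
    """Compose an Entrez query limited to genomic RefSeq records for one gene.

    build_term is a hypothetical helper; the filter combination is an assumption.
    """
    return (
        f"{organism}[Orgn] AND {gene}[Gene] "
        "AND refseq[Filter] AND biomol_genomic[PROP]"
    )


def search_ids(gene, email, organism="Homo sapiens", retmax=5):
    """Return the nucleotide IDs matching the restricted term (needs network access)."""
    # Imported here so build_term can be used without Biopython installed.
    from Bio import Entrez

    Entrez.email = email  # NCBI asks for a contact address
    handle = Entrez.esearch(
        db="nucleotide", term=build_term(gene, organism), retmax=retmax
    )
    result = Entrez.read(handle)
    handle.close()
    return result["IdList"]
```

It is probably worth looking at the returned IDs before passing one to Entrez.efetch, since genomic RefSeq hits can include very large records such as whole chromosomes.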

ADD REPLY • link 12.0 years ago by David W 4.9k

1


FASTQ? Are you sure you want to get a FASTQ?

ADD REPLY • link 12.0 years ago by Pierre Lindenbaum 164k

score 0 · Answer 1 · 2012-11-26

Since you seem to be looking for human gene names, I think your best bet will be to query HGNC and extract the gene ids from their output.

A tutorial can be found here
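As a minimal sketch of the HGNC route, assuming the HGNC REST service at rest.genenames.org, its fetch/symbol endpoint, and an entrez_id field in the JSON it returns (all of these are assumptions to check against the HGNC documentation; the function names are made up for illustration):

```python
import json
import urllib.request
from urllib.parse import quote

# Assumed base URL of the HGNC REST "fetch by symbol" endpoint.
HGNC_FETCH = "https://rest.genenames.org/fetch/symbol/"


def hgnc_url(symbol):
    """Build the lookup URL for one approved gene symbol."""
    return HGNC_FETCH + quote(symbol)


def entrez_ids_from_payload(payload):
    """Pull the entrez_id values out of a decoded HGNC JSON response."""
    docs = payload.get("response", {}).get("docs", [])
    return [doc["entrez_id"] for doc in docs if "entrez_id" in doc]


def lookup_entrez_ids(symbol):
    """Query HGNC for a symbol and return its Entrez Gene IDs (needs network access)."""
    request = urllib.request.Request(
        hgnc_url(symbol), headers={"Accept": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return entrez_ids_from_payload(json.load(response))
```

Note that these would be Entrez Gene IDs (for db="gene"), not nucleotide IDs, so they cannot be passed straight to an efetch call against the nucleotide database.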

I think BioMart will also let you connect a human gene name to an id. (I don't have experience with it, but other BioMart-related posts on this site might be helpful.)

Almost forgot to mention: Using the biomart perl api for simple queries