This is what I want to do. I have a list of gene names, for example [ITGB1, RELA, NFKBIA]. Looking through the Biopython help and the tutorial for the Entrez API, I came up with this:
from Bio import Entrez
Entrez.email = "you@example.com"  # placeholder address; NCBI asks for a contact email
x = ['ITGB1', 'RELA', 'NFKBIA']
for item in x:
    handle = Entrez.efetch(db="nucleotide", id=item, rettype="gb")
    record = handle.read()
    out_handle = open('genes/' + item + '.xml', 'w')  # create a file named after the gene
    out_handle.write(record)
    out_handle.close()
But this keeps erroring out. I have discovered that it works if the id is a numeric ID (although you have to pass it as a string, e.g. '186972394'), like so:
handle = Entrez.efetch(db="nucleotide", id='186972394' ,rettype="gb")
This gets me the info I want which includes the sequence.
So now to the Question:
How can I search by gene name (because I do not have ID numbers), or easily convert my gene names to IDs, so that I can get the sequences for the gene list I have?
I have also tried:
x = 'RELA'
handle = Entrez.efetch(db="nucleotide", id=x ,rettype="gb")
This errors out with "HTTP Error 400: Bad Request", because it is expecting a string of a number for id.
handle = Entrez.esearch(db="nucleotide",term=x)
This returns nothing; debugging shows it did not find anything.
handle = Entrez.esearch(db="nucleotide",term="Homo[Orgn] AND RELA[Gene]")
This returns a list of IDs, and the first one is what I want. But I am sure you will not like this approach, because it does not guarantee that the first ID is actually the one I want rather than just whatever comes first in the list each time I query the search engine.
Thx
This is really an Entrez question rather than a Biopython one - you're trying to find an Entrez term that limits you to a particular record for each gene. Check out these tips for getting only sequences in RefSeq, and use
biomol_genomic[PROP]
to get rid of mRNAs.
FASTQ? Are you sure you want to get a FASTQ?
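To make the answer's suggestion concrete, here is a minimal sketch of how the narrowed search term could be built and used from Biopython. The helper names (build_term, search_ids), the placeholder email, and the srcdb_refseq[PROP] RefSeq filter are assumptions for illustration; only biomol_genomic[PROP] and the [Gene]/[Orgn] fields come from the thread. It does not guarantee a single hit per gene, so it returns the whole ID list for inspection rather than blindly taking the first entry.

```python
def build_term(gene, organism="Homo sapiens"):
    """Build an Entrez query limited to genomic RefSeq records for one gene.

    srcdb_refseq[PROP] is an assumed way to restrict to RefSeq;
    biomol_genomic[PROP] drops mRNAs, as suggested in the answer.
    """
    return (
        f"{gene}[Gene] AND {organism}[Orgn] "
        "AND srcdb_refseq[PROP] AND biomol_genomic[PROP]"
    )


def search_ids(gene, retmax=20):
    """Return the list of nucleotide IDs matching the narrowed term.

    Needs Biopython and network access. The import is done here so that
    build_term() stays usable without Biopython installed.
    """
    from Bio import Entrez

    Entrez.email = "you@example.com"  # placeholder; replace with your own address
    handle = Entrez.esearch(db="nucleotide", term=build_term(gene), retmax=retmax)
    result = Entrez.read(handle)
    handle.close()
    return list(result["IdList"])


if __name__ == "__main__":
    for gene in ["ITGB1", "RELA", "NFKBIA"]:
        ids = search_ids(gene)
        # Review the hits before passing any ID to Entrez.efetch.
        print(gene, len(ids), ids[:5])
```

If a gene still returns several IDs, the term probably needs further narrowing before any of them is passed to Entrez.efetch.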