I have an assignment to create a hash class in Python and use it on a text file of restriction enzymes, with the recognition sequence in the first column and the enzyme (RE) name in the second. The goal is to generate a random 6 bp sequence and, if it exists in the text file, print the corresponding enzyme name. I can already do this by reading the file into lists, turning them into a dictionary, and using get() to retrieve the enzyme name, but the assignment requires that I write the class myself.
Here is the hash class, and random sequence generator (there may be indentation errors when pasting):
> import random  # needed by random_DNA below
>
> class KeyValue:
>     def __init__(self,key,value):
>         self.key=key
>         self.value=value
>     def __str__(self):
>         return str(self.key)+":"+str(self.value)
>
> class HashTable:
>
>     def __init__(self, SIZE):
>         i=0
>         self.list=[]
>         for i in range(SIZE):
>             self.list.append([])
>             i=i+1
>         self.SIZE = len(self.list)
>
>     def getValue(self,key):
>         h = self.hash(key)
>         bucket = self.list[h]
>         for kv in bucket:
>             if kv.key==key:
>                 return kv.value
>     def setValue(self,key,value):
>         h = self.hash(key)
>         # should search first so we don't put key in twice, but for now ignore
>         self.list[h].append(KeyValue(key,value))
>
>     def hash(self, key):
>         i=0
>         total=0
>         while i<len(key):
>             total = total+ord(key[i])
>             i=i+1
>         return total % self.SIZE
>
> def random_DNA(length):
>     return ''.join(random.choice('ATCG') for _ in xrange(length))
Here is the code, used to import the module:
> from HashTable import *
>
> fh = open("restriction_enzymes.txt", "r")
> num_lines = int(sum(1 for line in fh))
> print num_lines
> hashtable = HashTable(int(num_lines))
> for line in fh:
>     (key, value) = line.strip().split('\t')
>     hashtable.setValue(key, value)
> print hashtable
> DNA = random_DNA(6)
> print DNA
> print hashtable.getValue(DNA)
> print hashtable.getValue('AACGTT')
First, I want to be able to print out the hash table so that I can visualize its contents. When I try that with the line "print hashtable" after the table is created, I get the output "HashTable.HashTable instance at 0x7ff53555db90", which I guess is the object's location in memory? Do I have to implement a __str__ or __repr__ method in the HashTable class? Can someone assist with this?
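One possible way to get a readable printout is to give HashTable a __str__ method and point __repr__ at it. The following is only a minimal sketch, assuming the bucket layout of the class above; it is trimmed down and written so that it runs under Python 3, and the two enzyme entries are sample data rather than the contents of the real file:

```python
class KeyValue:
    def __init__(self, key, value):
        self.key = key
        self.value = value

    def __str__(self):
        return str(self.key) + ":" + str(self.value)


class HashTable:
    def __init__(self, size):
        # one empty bucket (a list) per slot
        self.list = [[] for _ in range(size)]
        self.SIZE = size

    def hash(self, key):
        # same idea as the original: sum of character codes modulo table size
        return sum(ord(c) for c in key) % self.SIZE

    def setValue(self, key, value):
        self.list[self.hash(key)].append(KeyValue(key, value))

    def __str__(self):
        # one line per non-empty bucket: "index: key:value, key:value"
        lines = []
        for index, bucket in enumerate(self.list):
            if bucket:
                lines.append("%d: %s" % (index, ", ".join(str(kv) for kv in bucket)))
        return "\n".join(lines)

    # reuse the same text wherever repr() is used (interactive prompt, containers)
    __repr__ = __str__


table = HashTable(4)
table.setValue("AACGTT", "AclI")
table.setValue("GAATTC", "EcoRI")
print(table)   # prints "0: AACGTT:AclI, GAATTC:EcoRI"
```

Both sample keys land in bucket 0 because they contain the same letters, so a sum-of-characters hash cannot tell them apart; getValue still works because it compares the full key inside the bucket.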
Regarding the main output of the program: when I pass the variable DNA (the random sequence) to "print hashtable.getValue(DNA)", I get "None". OK, so that random sequence isn't in there. So I tried copying and pasting the first sequence from the text file, 'AACGTT', which corresponds to the RE AclI. However, I still get "None". Does anyone have any idea what I'm doing wrong here?
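A possible explanation for the "None", offered as a guess from the code as posted: counting the lines with sum(1 for line in fh) reads the file handle to its end, so the later "for line in fh" loop has nothing left to iterate over and the table stays empty. The sketch below reproduces this with an in-memory file (io.StringIO stands in for restriction_enzymes.txt, and its two lines are sample data) and shows that seek(0) rewinds the handle. It is written for Python 3:

```python
import io

# stand-in for open("restriction_enzymes.txt"); the contents are sample data
fh = io.StringIO("AACGTT\tAclI\nGAATTC\tEcoRI\n")

num_lines = sum(1 for line in fh)      # this pass consumes the whole file
second_pass = [line for line in fh]    # nothing left to read, so this is empty

fh.seek(0)                             # rewind to the start of the file
after_rewind = [line.strip().split("\t") for line in fh]

print(num_lines)       # 2
print(second_pass)     # []
print(after_rewind)    # [['AACGTT', 'AclI'], ['GAATTC', 'EcoRI']]
```

Reading the lines into a list once and using len() on that list would avoid the second pass altogether.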
All the sequence identifiers in the text file are 4, 6, or 8 bp long. Since the random sequence is 6 bp, will the hash lookup still find an entry that is 4 or 8 bp long when the 6 bp sequence overlaps it somewhere? Or should I implement a separate function that looks for partial nucleotide matches?
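On the length question: a hash lookup is an exact match on the whole key, so a 6 bp query will not find a 4 bp or 8 bp entry by itself. One way to catch shorter sites inside the random sequence is to look up every substring of each site length. A rough sketch for Python 3, using a plain dict in place of the HashTable class; sites_within is a made-up helper name, the three entries are sample data, and ambiguity codes are not handled:

```python
def sites_within(seq, table, lengths=(4, 6, 8)):
    """Return (offset, site, enzyme) for every table key found inside seq."""
    hits = []
    for n in lengths:
        # slide a window of width n across seq; there are no windows if n > len(seq)
        for start in range(len(seq) - n + 1):
            window = seq[start:start + n]
            if window in table:
                hits.append((start, window, table[window]))
    return hits


# sample data only, not the contents of the real file
enzymes = {"GGATCC": "BamHI", "GATC": "MboI", "GCGGCCGC": "NotI"}
print(sites_within("GGATCC", enzymes))
# [(1, 'GATC', 'MboI'), (0, 'GGATCC', 'BamHI')]
```

An 8 bp site can never fit inside a 6 bp sequence, so finding 8 bp entries that contain the random 6-mer would need the reverse check, scanning each 8 bp key for the query.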
All help is appreciated. Best.
Is reinventing the Python dictionary part of the homework assignment? It seems like you could use a dictionary comprehension, something like
{k:v for k,v in [L.strip().split("\t") for L in fh]}
You would still need to build a class around the Python dict, including a __str__ method and another method for extracting the "partial" matches you want to report.
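A rough sketch of what such a wrapper might look like, written for Python 3. The class name EnzymeTable, its method names, and the sample lines are all assumptions for illustration; a real run would pass the open file handle instead of the list:

```python
class EnzymeTable:
    """Thin wrapper around a dict mapping recognition site -> enzyme name."""

    def __init__(self, lines):
        # each line is "<site>\t<enzyme>"; blank lines are skipped
        self.sites = {k: v for k, v in
                      (L.strip().split("\t") for L in lines if L.strip())}

    def __str__(self):
        # one "site:enzyme" pair per line, sorted by site
        return "\n".join("%s:%s" % (k, v) for k, v in sorted(self.sites.items()))

    def get(self, seq):
        # exact match only; returns None when seq is not a key
        return self.sites.get(seq)

    def partial(self, seq):
        # linear scan: enzymes whose site lies inside seq, or whose site contains seq
        return sorted(v for k, v in self.sites.items()
                      if k != seq and (k in seq or seq in k))


# sample data only; replace the list with the open file handle
table = EnzymeTable(["GGATCC\tBamHI\n", "GATC\tMboI\n", "GCGGCCGC\tNotI\n"])
print(table)
print(table.get("GGATCC"), table.partial("GGATCC"))   # BamHI ['MboI']
```

The partial method here is a plain scan over all keys, which is fine for a file of this size but gives up the constant-time lookup that the dict provides for exact matches.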