Question

Printing Out Hash (Dictionary) Using Hash Class Python

10.6 years ago

st.ph.n ★ 2.7k

I have an assignment to create a hash class in Python and use it on a text file of restriction enzymes, with sequences in the first column and the RE name in the second. The goal is to create a random 6 bp sequence and see if it exists in the text file by printing out the corresponding enzyme name. I can do this with plain lists, by turning them into a dictionary and using get() to retrieve the corresponding enzyme name; however, I must create a class.

Here is the hash class, and random sequence generator (there may be indentation errors when pasting):

> import random
> 
> class KeyValue:
>     def __init__(self,key,value):
>         self.key=key
>         self.value=value
>     def __str__(self):
>         return str(self.key)+":"+str(self.value)
> 
> class HashTable:
> 
>     def __init__(self, SIZE):
>         i=0
>         self.list=[]
>         for i in range(SIZE):
>             self.list.append([])
>             i=i+1
>         self.SIZE = len(self.list)
> 
>     def getValue(self,key):
>         h = self.hash(key)
>         bucket = self.list[h]
>         for kv in bucket:
>             if kv.key==key:
>                 return kv.value
>     def setValue(self,key,value):
>         h = self.hash(key)
>         # should search first so we don't put key in twice, but for now ignore
>         self.list[h].append(KeyValue(key,value))
> 
>     def hash(self, key):
>         i=0
>         total=0
>         while i<len(key):
>             total = total+ord(key[i])
>             i=i+1
>         return total % self.SIZE
> 
> def random_DNA(length):
>     return ''.join(random.choice('ATCG') for _ in xrange(length))

Here is the code, used to import the module:

from HashTable import *

fh = open("restriction_enzymes.txt", "r")

num_lines = int(sum(1 for line in fh))

print num_lines

hashtable = HashTable(int(num_lines))

for line in fh:
    (key, value) = line.strip().split('\t')
    hashtable.setValue(key, value)

print hashtable

DNA = random_DNA(6)
print DNA
print hashtable.getValue(DNA)
print hashtable.getValue('AACGTT')

First, I want to be able to print out the hash table, so that I can visualize the dictionary. When I attempt that with the line "print hashtable" after it is created, I get the output "HashTable.HashTable instance at 0x7ff53555db90", which I guess is the location in memory? Do I have to implement a __str__ or __repr__ function in the HashTable class? Can someone assist with this?

Regarding the main output of the program, if I put the variable DNA, the random seq, in for "print hashtable.getValue(DNA)", I get the output of "None". Ok, so that random seq isn't in there. So, I tried copying and pasting the first seq from the text file 'AACGTT' which corresponds to the RE AclI. However, I still get the output "None". Anyone have any ideas what I'm doing wrong here?

All the seq identifiers in the text file are either 4, 6, or 8 bp long. Since the random sequence is 6 bp, will the hash lookup still find an entry whose key is 4 or 8 bp long when the 6 bp seq matches somewhere within it? Or should I implement a function to pull entries based on nucleotide matches?

All help is appreciated. Best.

python homework • 5.3k views

updated 10.6 years ago by Matt Shirley 10k • written 10.6 years ago by st.ph.n ★ 2.7k

Is reinventing the Python dictionary part of the homework assignment? It seems like you could use a dictionary comprehension, something like {k:v for k,v in [L.strip().split("\t") for L in fh]}

You would still need to build a class around the Python dict, including a __str__ method, and another method for extracting the "partial" matches you want to report.
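A minimal sketch of that idea, not a definitive implementation. The names `EnzymeTable` and `partial_matches` are hypothetical, the input is assumed to be tab-separated, and "partial" is taken to mean that one sequence occurs inside the other, which is only a guess at what the assignment wants. The three sample lines stand in for the real file.

```python
class EnzymeTable(object):
    """Thin wrapper around a dict mapping recognition site -> enzyme name.

    The class and method names are hypothetical, chosen for illustration.
    """

    def __init__(self, lines):
        # Each non-blank line is assumed to look like "<site>\t<enzyme name>"
        self.sites = {}
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            seq, name = line.split("\t")
            self.sites[seq] = name

    def __str__(self):
        # One "site: name" pair per line, sorted so the output is stable
        return "\n".join("%s: %s" % (s, n) for s, n in sorted(self.sites.items()))

    def get(self, seq):
        # Exact lookup; returns None when the site is absent
        return self.sites.get(seq)

    def partial_matches(self, seq):
        # Enzymes whose site occurs inside seq, or whose site contains seq
        return sorted(n for s, n in self.sites.items() if s in seq or seq in s)


# Sample lines standing in for restriction_enzymes.txt
table = EnzymeTable(["AACGTT\tAclI", "GATC\tMboI", "GCGGCCGC\tNotI"])
print(table)
print(table.get("AACGTT"))
print(table.partial_matches("GGATCC"))
```

With a real file, the open file handle could be passed in place of the sample list, since the constructor only iterates over it once.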

10.6 years ago by David W 4.9k

score 1 · Answer 1 · 2014-04-10

I don't think you need to re-invent Python's dictionary class. Why not:

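import random
from json import dumps

def random_DNA(length):
    # same generator as in the question
    return ''.join(random.choice('ATCG') for _ in range(length))

enzyme_sites = dict()

# sequence in the first column, enzyme name in the second
with open("restriction_enzymes.txt", "r") as fh:
    for line in fh:
        seq, name = line.rstrip().split()
        enzyme_sites[seq] = name

# here is a nice way to print our dictionary
print(dumps(enzyme_sites, indent=4))

DNA = random_DNA(6)
print(DNA)
# get() returns None instead of raising KeyError when the site is absent
print(enzyme_sites.get(DNA))
print(enzyme_sites.get('AACGTT'))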

The json module has a dumps function that will format your dictionary nicely for printing.
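If the assignment does require the hand-written class, two things in the question's script are worth noting. First, `sum(1 for line in fh)` reads the file to the end, so the later `for line in fh` loop has nothing left to iterate over and the table stays empty; that alone would make every `getValue` call return None. Second, a hash lookup is an exact match on the whole key, so a 6 bp sequence will never retrieve a 4 or 8 bp entry. Below is a rough sketch, assuming a tab-separated file, that keeps the question's method names, adds a `__str__` method, and reads the lines only once. The helper `load_table` is a hypothetical name, and the sample lines stand in for the real file.

```python
class HashTable(object):
    def __init__(self, size):
        self.SIZE = max(1, size)  # at least one bucket, avoids modulo by zero
        self.list = [[] for _ in range(self.SIZE)]

    def hash(self, key):
        # Same scheme as the question: sum of character codes modulo table size
        return sum(ord(c) for c in key) % self.SIZE

    def setValue(self, key, value):
        bucket = self.list[self.hash(key)]
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value  # key already present: overwrite
                return
        bucket.append([key, value])

    def getValue(self, key):
        for k, v in self.list[self.hash(key)]:
            if k == key:
                return v
        return None  # exact match only; no partial matching

    def __str__(self):
        # One "key:value" pair per line, in bucket order
        return "\n".join("%s:%s" % (k, v) for bucket in self.list for k, v in bucket)


def load_table(lines):
    # Hypothetical helper: read all rows once, then size the table from them
    rows = [line.strip().split("\t") for line in lines if line.strip()]
    table = HashTable(len(rows))
    for seq, name in rows:
        table.setValue(seq, name)
    return table


# Sample lines standing in for restriction_enzymes.txt
table = load_table(["AACGTT\tAclI", "GATC\tMboI"])
print(table)
print(table.getValue("AACGTT"))
print(table.getValue("ACGT"))
```

With the real file, `load_table` could be given the open file handle directly, since it iterates over its argument a single time.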