Question

Rearranging Data Using a Dictionary


14.7 years ago

Samihk ▴ 20

I have a dataset from which I extracted three columns: PDB name, chain 1, and chain 2.

I want to write a file listing each PDB name with its related chains, but with redundant combinations removed.

I tried using both a list and a dictionary for this.

I wanted something like:

ABC A B
ABC C D

and I am getting:

ABC A B
ABC C D
ABC B A

Since A, B and B, A are the same combination, I want to remove the redundancy.

Here's part of my code:

    pdb_name = cols[4].strip(string.punctuation)
    chain_1 = cols[5].strip(string.punctuation)
    chain_2 = cols[10].strip(string.punctuation)
    if a.has_key(pdb_name):
        a[pdb_name][chain_1] = a[pdb_name].get(chain_1, {})
        a[pdb_name][chain_1][chain_2] =   a[pdb_name][chain_1].get(chain_2, +1)
    else:
        a[pdb_name]= a.get(pdb_name, {})
        a[pdb_name][chain_1] = a[pdb_name].get(chain_1, {})
        a[pdb_name][chain_1][chain_2] =   a[pdb_name][chain_1].get(chain_2, +1)
   #     a[pdb_name][chain_1].append(chain_2)
     #   a[pdb_name] = [chain_1]
      #  a[pdb_name].append(chain_2)

###    if a.key() in list:
#        chain_1

        # if there is a second combination why its not coming ??


outfile_f = open('pdbextract.txt', 'w')    
aa = a.keys()
aa.sort()

for l in aa:

    te = a[l].keys()
    te.sort()
    for m in te:
       tf = a[l][m].keys()
       tf.sort()
       for n in tf:
           l = l.strip(string.punctuation)
           m = m.strip(string.punctuation)
           n = n.strip(string.punctuation)
           outfile_f.write("%-4s%-1s%-1s%-1s%-1s\n" % ( l, '\t ', m, '\t ', n))

python programming • 2.9k views


Ram · Answer 1 · 2010-12-24

If I understand correctly, sorting the two chain identifiers before using them as keys gives you the same entry for A, B and B, A. And with a bit of initialization using Python's collections.defaultdict, you can make the code a lot more readable. Something like this should work:

import collections
import string

# initialize a so that a['pdb']['A']['B'] += 1 works without key checks
a = collections.defaultdict(
    lambda: collections.defaultdict(lambda: collections.defaultdict(int)))

# later
pdb_name = cols[4].strip(string.punctuation)
chain_1 = cols[5].strip(string.punctuation)
chain_2 = cols[10].strip(string.punctuation)

# so B, A is sorted to A, B
chain_a, chain_b = sorted((chain_1, chain_2))

a[pdb_name][chain_a][chain_b] += 1

Once that's done, the logic to do the printing is much simpler.
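To make the "much simpler" printing concrete, here is a minimal end-to-end sketch. It assumes Python 3 and uses a hypothetical `rows` list in place of the parsed `cols` values from the question (the real input file is not shown in the thread), so treat it as an illustration rather than a drop-in replacement.

```python
import collections

# Hypothetical stand-in for the parsed columns: (pdb_name, chain_1, chain_2)
rows = [("ABC", "A", "B"), ("ABC", "C", "D"), ("ABC", "B", "A")]

# Nested defaultdict: counts[pdb][chain_a][chain_b] starts at 0
counts = collections.defaultdict(
    lambda: collections.defaultdict(lambda: collections.defaultdict(int)))

for pdb_name, chain_1, chain_2 in rows:
    # Sorting the pair means (B, A) is stored under the same keys as (A, B)
    chain_a, chain_b = sorted((chain_1, chain_2))
    counts[pdb_name][chain_a][chain_b] += 1

# Printing: iterate the keys in sorted order at each level
lines = []
for pdb_name in sorted(counts):
    for chain_a in sorted(counts[pdb_name]):
        for chain_b in sorted(counts[pdb_name][chain_a]):
            lines.append("%-4s\t%s\t%s" % (pdb_name, chain_a, chain_b))

print("\n".join(lines))
```

With the sample rows, the B, A pair is folded into A, B, so only two lines are produced for ABC.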

Ram · Answer 2 · 2010-12-24


14.7 years ago

Jashapiro ▴ 230

Why not just check that chain_1 <= chain_2 and, if not, swap them before adding to your dict?

You can eliminate a bit of redundancy by using dict.setdefault() like so:

a.setdefault(pdb_name, {})
a[pdb_name].setdefault(chain_1, {})
a[pdb_name][chain_1].setdefault(chain_2, 1)
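Putting the swap check and the setdefault() calls together might look like the sketch below. The helper name `add_pair` and the sample rows are hypothetical, introduced only for illustration; the three setdefault() lines are the ones shown above.

```python
def add_pair(a, pdb_name, chain_1, chain_2):
    """Record a chain pair under pdb_name, treating (A, B) and (B, A) as the same."""
    # Swap so that the smaller chain identifier always comes first
    if chain_1 > chain_2:
        chain_1, chain_2 = chain_2, chain_1
    # setdefault() only creates the entry when the key is missing
    a.setdefault(pdb_name, {})
    a[pdb_name].setdefault(chain_1, {})
    a[pdb_name][chain_1].setdefault(chain_2, 1)


a = {}
# Hypothetical sample rows: (pdb_name, chain_1, chain_2)
for row in [("ABC", "A", "B"), ("ABC", "C", "D"), ("ABC", "B", "A")]:
    add_pair(a, *row)

print(a)
```

Note that setdefault(chain_2, 1) only marks the pair as seen; it does not count how many times the pair occurred.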

Also, you are stripping punctuation twice (when you read it in and before writing)...

And in your last line, is there any reason not to leave the tabs in the write string, rather than substituting them in?

outfile_f.write("%-4s\t%-1s\t%-1s\n" % ( l, m, n))
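As a quick check of the two format strings, the sketch below builds both lines for one sample record (the values "ABC", "A", "B" are taken from the example in the question). One thing it makes visible: the original substitutes '\t ' (a tab plus a space), so its output carries an extra space after each tab that the inline version does not.

```python
l, m, n = "ABC", "A", "B"

# Original style: the tab (plus a trailing space) is passed in as an argument
substituted = "%-4s%-1s%-1s%-1s%-1s\n" % (l, '\t ', m, '\t ', n)

# Suggested style: the tabs live in the format string itself
inline = "%-4s\t%-1s\t%-1s\n" % (l, m, n)

print(repr(substituted))
print(repr(inline))
```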



Thanks a lot! I am a newbie, so I don't know many of the rules or easy solutions.

14.7 years ago by Samihk ▴ 20