Rearranging Data Using a Dictionary
14.0 years ago
Samihk ▴ 20

I have a dataset from which I extracted three columns: PDB name, chain 1 and chain 2.

I want to write a file listing each PDB name with the chains related to it, but without redundant combinations.

I tried using both a list and a dictionary for this.

I wanted something like:

ABC A B
ABC C D

but I am getting:

ABC A B
ABC C D
ABC B A

Since A, B and B, A are the same combination, I want to remove the redundancy.

Here is part of my code:

pdb_name = cols[4].strip(string.punctuation)
chain_1 = cols[5].strip(string.punctuation)
chain_2 = cols[10].strip(string.punctuation)
if pdb_name in a:
    a[pdb_name][chain_1] = a[pdb_name].get(chain_1, {})
    a[pdb_name][chain_1][chain_2] = a[pdb_name][chain_1].get(chain_2, +1)
else:
    a[pdb_name] = a.get(pdb_name, {})
    a[pdb_name][chain_1] = a[pdb_name].get(chain_1, {})
    a[pdb_name][chain_1][chain_2] = a[pdb_name][chain_1].get(chain_2, +1)
    # a[pdb_name][chain_1].append(chain_2)
    # a[pdb_name] = [chain_1]
    # a[pdb_name].append(chain_2)

# if a.key() in list:
#     chain_1

# If there is a second combination, why is it not showing up?

outfile_f = open('pdbextract.txt', 'w')
aa = sorted(a.keys())

for l in aa:
    te = sorted(a[l].keys())
    for m in te:
        tf = sorted(a[l][m].keys())
        for n in tf:
            l = l.strip(string.punctuation)
            m = m.strip(string.punctuation)
            n = n.strip(string.punctuation)
            outfile_f.write("%-4s%-1s%-1s%-1s%-1s\n" % (l, '\t ', m, '\t ', n))
python programming • 2.4k views
14.0 years ago
brentp 24k

If I understand correctly, sorting the two chain IDs before using them as keys gives you the same entry for A, B and B, A. And if you do a bit of initialization with Python's collections.defaultdict, you can make the code a lot more readable. Something like this should work:

import collections
import string

# initialize a so that a['pdb']['A']['B'] += 1 works without any setup
a = collections.defaultdict(
    lambda: collections.defaultdict(lambda: collections.defaultdict(int)))

# later, for each row
pdb_name = cols[4].strip(string.punctuation)
chain_1 = cols[5].strip(string.punctuation)
chain_2 = cols[10].strip(string.punctuation)

# sort the pair so that B, A becomes A, B
chain_a, chain_b = sorted((chain_1, chain_2))

a[pdb_name][chain_a][chain_b] += 1

Once that's done, the logic to do the printing is much simpler.
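
As a rough sketch of what that simpler printing could look like (Python 3 syntax; the sample rows and the `format_pairs` helper are made up for illustration and are not part of the original script):

```python
import collections

# Nested counts: a[pdb][chain_a][chain_b] -> number of times the pair was seen
a = collections.defaultdict(
    lambda: collections.defaultdict(lambda: collections.defaultdict(int)))

# Hypothetical sample rows standing in for (pdb_name, chain_1, chain_2)
rows = [("ABC", "A", "B"), ("ABC", "C", "D"), ("ABC", "B", "A")]

for pdb_name, chain_1, chain_2 in rows:
    # Sorting the pair makes (B, A) land on the same key as (A, B)
    chain_a, chain_b = sorted((chain_1, chain_2))
    a[pdb_name][chain_a][chain_b] += 1


def format_pairs(counts):
    """Return one tab-separated line per unique (pdb, chain, chain) combination."""
    lines = []
    for pdb in sorted(counts):
        for first in sorted(counts[pdb]):
            for second in sorted(counts[pdb][first]):
                lines.append("%s\t%s\t%s" % (pdb, first, second))
    return lines


for line in format_pairs(a):
    print(line)
```

With these sample rows, the A, B pair is stored once with a count of 2, so only two lines are printed for ABC.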


I just love defaultdict... I don't think I get through any project without using it at least once.

14.0 years ago
Jashapiro ▴ 230

Why not just check that chain_1 <= chain_2 and, if not, switch them before adding to your dict?
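
A minimal sketch of that check, assuming the chain IDs are plain strings (the `ordered_pair` helper name is hypothetical):

```python
def ordered_pair(chain_1, chain_2):
    """Return the two chain IDs in a fixed order, so (B, A) becomes (A, B)."""
    if chain_1 > chain_2:
        chain_1, chain_2 = chain_2, chain_1
    return chain_1, chain_2


print(ordered_pair("B", "A"))  # swapped
print(ordered_pair("C", "D"))  # already in order, left alone
```

Note that string comparison is case-sensitive, so "c" and "C" would be treated as different chains.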

You can eliminate a bit of redundancy by using dict.setdefault() like so:

a.setdefault(pdb_name, {})
a[pdb_name].setdefault(chain_1, {})
a[pdb_name][chain_1].setdefault(chain_2, 1)
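
To see how the swap and setdefault() behave together, here is a small sketch on made-up rows (the `rows` list is an assumption standing in for the parsed columns). Keep in mind that setdefault() only records that a pair was seen; it does not count repeats.

```python
# Hypothetical sample rows standing in for (pdb_name, chain_1, chain_2)
rows = [("ABC", "A", "B"), ("ABC", "C", "D"), ("ABC", "B", "A")]

a = {}
for pdb_name, chain_1, chain_2 in rows:
    # Put the pair in a fixed order so (B, A) is stored as (A, B)
    if chain_1 > chain_2:
        chain_1, chain_2 = chain_2, chain_1
    # setdefault returns the existing value, or stores and returns the default
    a.setdefault(pdb_name, {}).setdefault(chain_1, {}).setdefault(chain_2, 1)

print(a)
```

The third row collapses into the first, leaving only the A, B and C, D entries under ABC.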

Also, you are stripping punctuation twice (when you read it in and before writing)...

And in your last line, is there any reason not to leave the tabs in the write string, rather than substituting them in?

outfile_f.write("%-4s\t%-1s\t%-1s\n" % ( l, m, n))

Thanks a lot. I am a newbie, so I don't know many of the rules or easy solutions!
