I have a dataset from which I extracted three columns: PDB name, chain 1 and chain 2.
I want to write a file with each PDB name and the chains related to it, but without redundant combinations.
I tried using both a list and a dictionary for this.
I wanted something like:
ABC A B
ABC C D
but I am getting:
ABC A B
ABC C D
ABC B A
Since A, B and B, A are the same combination, I want to remove the redundancy.
Here's part of my code:
pdb_name = cols[4].strip(string.punctuation)
chain_1 = cols[5].strip(string.punctuation)
chain_2 = cols[10].strip(string.punctuation)
if a.has_key(pdb_name):
    a[pdb_name][chain_1] = a[pdb_name].get(chain_1, {})
    a[pdb_name][chain_1][chain_2] = a[pdb_name][chain_1].get(chain_2, +1)
else:
    a[pdb_name] = a.get(pdb_name, {})
    a[pdb_name][chain_1] = a[pdb_name].get(chain_1, {})
    a[pdb_name][chain_1][chain_2] = a[pdb_name][chain_1].get(chain_2, +1)
# a[pdb_name][chain_1].append(chain_2)
# a[pdb_name] = [chain_1]
# a[pdb_name].append(chain_2)
### if a.key() in list:
# chain_1
# if there is a second combination, why is it not showing up?
outfile_f = open('pdbextract.txt', 'w')
aa = a.keys()
aa.sort()
for l in aa:
    te = a[l].keys()
    te.sort()
    for m in te:
        tf = a[l][m].keys()
        tf.sort()
        for n in tf:
            l = l.strip(string.punctuation)
            m = m.strip(string.punctuation)
            n = n.strip(string.punctuation)
            outfile_f.write("%-4s%-1s%-1s%-1s%-1s\n" % ( l, '\t ', m, '\t ', n))
I just love defaultdict... I don't think I go through any project without using it at least once.
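
For what it's worth, here is a minimal sketch of how defaultdict could handle the redundancy in the question. It assumes Python 3 (the code in the question is Python 2), and `rows`, `unique_chain_pairs` and `format_pairs` are made-up names for illustration; `rows` stands in for the three columns already parsed and stripped. The idea is to sort each pair of chains before storing it, so that A, B and B, A end up as the same tuple, and to keep the tuples in a set per PDB name.

```python
from collections import defaultdict

# Hypothetical sample data standing in for (pdb_name, chain_1, chain_2)
rows = [
    ("ABC", "A", "B"),
    ("ABC", "C", "D"),
    ("ABC", "B", "A"),  # same combination as A, B
]


def unique_chain_pairs(rows):
    """Group chain pairs by PDB name, treating (A, B) and (B, A) as equal."""
    pairs = defaultdict(set)
    for pdb_name, chain_1, chain_2 in rows:
        # Sorting the two chains gives every combination one canonical form,
        # and the set silently drops the repeats.
        pairs[pdb_name].add(tuple(sorted((chain_1, chain_2))))
    return pairs


def format_pairs(pairs):
    """Return one tab-separated line per unique combination, in sorted order."""
    return [
        "%s\t%s\t%s" % (pdb_name, chain_1, chain_2)
        for pdb_name in sorted(pairs)
        for chain_1, chain_2 in sorted(pairs[pdb_name])
    ]


lines = format_pairs(unique_chain_pairs(rows))
print("\n".join(lines))
```

One thing to be aware of: because the chains are sorted, a pair that only ever appears as B, A in the input would be written out as A, B. If the original orientation matters, a different key (for example a frozenset alongside the first-seen pair) would be needed.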