How to compare sets using Python (dealing with PDB files)
10.3 years ago
Jason Lin • 0

Hi all,

Sorry to bother you all again. I have a text file that contains PDB IDs and the corresponding missing coordinates from each PDB file, such as:

1FZ2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ4 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ8 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ9 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZH 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

I also have another text file that contains PDB IDs and the SEG signal (the signal that marks low-complexity regions in a protein sequence), such as:

1FZ2 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 
1FZ4 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 
1FZ5 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 
1FZ8 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 
1FZ9 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 
1FZH 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21  339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354

The numbers in each file are coordinates. I want to compare the two files and generate a file that contains each PDB ID and the coordinates where the SEG signal overlaps the missing coordinates.

In this case I want to generate a file like:

1FZ2 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ4 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ5 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ8 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ9 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZH 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

Here is my Python code so far:

    total = []

    fin = open('file1.txt')      # I want to make the missing coordinates file a set called 'a'
    for lines in fin:
        l = lines.split()
        a = set(l[2:])
        print a

    with open('file2.txt') as seg_num:     #  I want to make the SEG signal another set called 'b'
        for seg_signal in seg_num:
            signal = seg_signal.split()
            b = set(signal[1:])
            print("lol" * 10)
            print b
            c = a & b                       # and pick the intersection between a and b called c
            space = ' '
            newlines = '\n'

            total.append([signal[0], space, str(c), newlines])

    with open('file3.txt', 'w') as f:
        for t in total:
            f.write(" ".join(t))

    f.close()

But for some reason it does not give the desired output, and I don't know how to fix it.

PDB python set SEG • 3.5k views
10.3 years ago

This is how I would do it. The IN_PDB file is read into memory as a dictionary keyed on the first column, which is assumed to be a unique identifier. The common coordinates are then found with the list comprehension [x for x in pdb[k] if x in coords]:

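    #!/usr/bin/env python

    IN_PDB = 'pdb.txt'      # PDB IDs with their missing coordinates
    IN_SEG = 'seg.txt'      # PDB IDs with their SEG signal coordinates
    OUT_PDB = 'outpdb.txt'  # PDB IDs with the coordinates common to both

    # Read the missing coordinates into a dictionary keyed on PDB ID.
    pdb = {}
    with open(IN_PDB) as inpdb:
        for line in inpdb:
            line = line.strip().split()
            if not line:    # skip blank lines
                continue
            pdb[line[0]] = line[1:]

    # For each SEG entry, keep the missing coordinates that also occur in it.
    with open(IN_SEG) as inseg, open(OUT_PDB, 'w') as outsig:
        for line in inseg:
            line = line.strip().split()
            if not line:
                continue
            k = line[0]
            coords = line[1:]
            if k in pdb:
                common = [x for x in pdb[k] if x in coords]
                outsig.write(k + '\t' + '\t'.join(common) + '\n')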
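
As for why the code in the question misbehaves: `a` is reassigned on every pass of the first loop, so by the time the second file is read only the last line's coordinates are left to compare against; `l[2:]` also skips the first coordinate, and `str(c)` writes the set's printed form rather than the numbers. If you would rather stay with sets, something like the following might work. It is only a sketch: the function names `read_coords` and `overlap_lines` are made up for illustration, and it assumes every coordinate is a plain integer (no insertion codes).

```python
def read_coords(lines):
    """Map each PDB ID (first column) to the set of its integer coordinates."""
    table = {}
    for line in lines:
        fields = line.split()
        if not fields:              # skip blank lines
            continue
        table[fields[0]] = {int(x) for x in fields[1:]}
    return table


def overlap_lines(missing_lines, seg_lines):
    """Return one 'ID c1 c2 ...' string per SEG entry whose ID also has missing coordinates."""
    missing = read_coords(missing_lines)
    out = []
    for line in seg_lines:
        fields = line.split()
        if not fields or fields[0] not in missing:
            continue
        # Set intersection, then sort numerically for stable output.
        common = missing[fields[0]] & {int(x) for x in fields[1:]}
        out.append(" ".join([fields[0]] + [str(c) for c in sorted(common)]))
    return out


# Small in-memory demo; the data is shortened from the question's example.
demo_missing = ["1FZ2 1 2 3 4 5", "1FZ4 1 2 3"]
demo_seg = ["1FZ2 2 3 4 5 6 7  339 340", "1FZ4 2 3 4", "1XYZ 1 2"]
for row in overlap_lines(demo_missing, demo_seg):
    print(row)
```

Both functions accept any iterable of lines, so you can pass open file objects (for example the question's file1.txt and file2.txt) instead of lists and write the returned strings to file3.txt joined with newlines. Converting to int before sorting keeps 10 after 9, which sorting the raw strings would not.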