Hi all,
Sorry to bother you all again. I have a text file that contains each PDBID and the corresponding missing coordinates from the PDB file, for example:
1FZ2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ4 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ5 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ8 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ9 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZH 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
I have another text file that contains each PDBID and its SEG signal (the signal that marks low-complexity regions in the protein sequence), for example:
1FZ2 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354
1FZ4 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354
1FZ5 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354
1FZ8 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354
1FZ9 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354
1FZH 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354
The numbers in each file are coordinates. I want to compare the two files and generate a file that contains each PDBID and the coordinates that overlap between the SEG signal and the missing coordinates.
In this case I want to generate a file like:
1FZ2 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ4 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ5 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ8 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZ9 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
1FZH 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
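To make "overlap" concrete: for one PDBID it is the set intersection of the two coordinate lists, written out in numeric order. Below is a minimal sketch of that single step, assuming every coordinate is a plain integer. The function name `overlap` is made up for illustration, and the second input line is a shortened version of the SEG example above.

```python
def overlap(missing_line, seg_line):
    """Return one output line: the PDBID followed by the shared coordinates."""
    missing = missing_line.split()
    seg = seg_line.split()
    # Field 0 is the PDBID; everything after it is a coordinate.
    common = set(missing[1:]) & set(seg[1:])
    # Sets are unordered, and as strings "10" sorts before "2",
    # so sort by integer value before writing the line out.
    coords = sorted(common, key=int)
    return " ".join([seg[0]] + coords)


print(overlap("1FZ2 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16",
              "1FZ2 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 339 340"))
```

For the 1FZ2 lines this prints the PDBID followed by 2 through 16, which matches the first line of the file I want.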
Here is my Python code so far:
total = []
fin = open('file1.txt')  # I want to make the missing coordinates file a set called 'a'
for lines in fin:
    l = lines.split()
    a = set(l[2:])
    print(a)
with open('file2.txt') as seg_num:  # I want to make the SEG signal another set called 'b'
    for seg_signal in seg_num:
        signal = seg_signal.split()
        b = set(signal[1:])
        print("lol" * 10)
        print(b)
        c = a & b  # and pick the intersection between a and b called c
        space = ' '
        newlines = '\n'
        total.append([signal[0], space, str(c), newlines])
with open('file3.txt', 'w') as f:
    for t in total:
        f.write(" ".join(t))
f.close()
But for some reason it does not give the desired output, and I don't know how to fix it.
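One direction that might work, though it has not been checked against the real files: keep a separate set per PDBID in a dictionary instead of a single set, then look up the matching set for each SEG line. The sketch below assumes each PDBID appears at most once per file and that coordinates are integers. The name `pair_overlaps` and the shortened sample lines are hypothetical, made up for illustration only.

```python
def pair_overlaps(missing_lines, seg_lines):
    """Yield one 'PDBID coords...' line for each ID found in both inputs."""
    missing_by_id = {}
    for line in missing_lines:
        fields = line.split()
        if fields:  # skip blank lines
            # Field 0 is the PDBID, the rest are its missing coordinates.
            missing_by_id[fields[0]] = set(fields[1:])
    for line in seg_lines:
        fields = line.split()
        if not fields or fields[0] not in missing_by_id:
            continue  # blank line, or an ID absent from the first input
        common = missing_by_id[fields[0]] & set(fields[1:])
        yield " ".join([fields[0]] + sorted(common, key=int))


# Shortened, made-up sample lines in the same layout as the two files.
missing = ["1FZ2 1 2 3 4", "1FZ4 1 2 3 4"]
seg = ["1FZ2 2 3 4 5", "1FZ4 3 4 5 6"]
for out in pair_overlaps(missing, seg):
    print(out)
```

Since an open file iterates line by line, the same function should also accept the file objects for file1.txt and file2.txt directly, and each yielded line can be written to file3.txt followed by a newline.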