Question

How to align disordered regions and sequences on a larger scale (Python)

11.0 years ago

Jason Lin • 0

This is a follow-up to my previous question, "How to align and compare two elements (sequence) in a list using python", which has been solved thanks to @mdml. Here is the code that I'm using (code credit to mdml):

# Parse the file which was already split into split_list
lines = open("seq.txt")
for list in lines:
    split_list = list.split()
    header = "".join(split_list[0:2])
    seq = split_list[2]
    disorder = split_list[4]

    # Create the new disorder string
    new_disorder = ["Disorder: Posi R"]
    for i, x in enumerate(disorder):
        if x == "X":
            # Appends of the form: "Position AminoAcid"
            new_disorder.append("{} {}".format(i, seq[i]))

    new_disorder = " ".join(new_disorder)

    # Output the modified file
    open("seq2.txt", "w").write( "\n".join([header, seq, new_disorder]))

This code works perfectly with my example, which is:

103L Sequence: MNIFEMLRIDEGLRLKIYKDTEGYYTIGIGHLLTKSPSLNSLDAAKSELDKAIGRNTNGVITKDEAEKLFNQDVDAAVRGILRNAKLKPVYDSLDAVRRAALINMVFQMGETGVAGFTNSLRMLQQKRWDEAAVNLAKSRWYNQTPNRAKRVITTFRTGTWDAYKNL Disorder: ----------------------------------XXXXXX-----------------------------------------------------------------------------------------------------------------------------XX

However, when I use this code on a file with multiple protein sequences, it still runs, but only the last protein sequence and its disordered region show up in the new file. What should I do to fix it?

protein-sequence python • 2.3k views

updated 2.4 years ago by Ram 45k • written 11.0 years ago by Jason Lin • 0

Ram · Accepted Answer · 2014-07-10

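The problem is that you open "seq2.txt" for writing on every pass through the loop. Opening a file in "w" mode truncates it, so each record overwrites the one before it and only the last one survives. Open the output file once, before the loop, and indent the rest of the code so that it runs inside that block.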
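To see the effect in isolation, here is a minimal sketch that writes three made-up records in two ways. The file names and record strings are placeholders for illustration, not part of the original script.

```python
import os
import tempfile

records = ["first", "second", "third"]  # placeholder records

with tempfile.TemporaryDirectory() as tmp:
    overwritten = os.path.join(tmp, "overwritten.txt")
    kept = os.path.join(tmp, "kept.txt")

    # Reopening in "w" mode inside the loop truncates the file each time.
    for rec in records:
        with open(overwritten, "w") as out:
            out.write(rec + "\n")

    # Opening once, before the loop, keeps every record.
    with open(kept, "w") as out:
        for rec in records:
            out.write(rec + "\n")

    with open(overwritten) as f:
        overwritten_text = f.read()
    with open(kept) as f:
        kept_text = f.read()

print(repr(overwritten_text))  # only the last record is left
print(repr(kept_text))         # all three records are present
```

Opening the file in "a" (append) mode inside the loop would also keep every record, but it would keep adding to whatever is already in the file from earlier runs.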

Try:

# Open the input and the output once, then handle one record per line
with open("seq.txt", "r") as lines:
    with open("seq2.txt", "w") as output:
        for line in lines:
            split_list = line.split()
            header = "".join(split_list[0:2])
            seq = split_list[2]
            disorder = split_list[4]
            # Create the new disorder table: a header row, then one row per residue
            new_disorder = ["Disorder:\nPosi\tR"]
            for i, x in enumerate(disorder):
                if x == "X":
                    # Appends of the form: "Position<TAB>AminoAcid" (i is 0-based)
                    new_disorder.append("{}\t{}".format(i, seq[i]))

            # Join with newlines so each disordered residue gets its own row
            new_disorder = "\n".join(new_disorder)

            # Write this record, followed by a blank line as a separator
            output.write("\n".join([header, seq, new_disorder])+"\n\n")

It's good practice to close files within the script. If you open them with with open(...) as ...: at the start, each file is closed automatically when its block ends, so you don't have to call close() yourself.
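As a quick check of that behaviour, the sketch below inspects the file object's closed attribute inside and after a with block, including the case where the block raises. The scratch file path is hypothetical and exists only for this demonstration.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w") as output:
    output.write("hello\n")
    inside = output.closed   # still open inside the block

after = output.closed        # closed as soon as the block ends

# The file is closed even when the block exits through an exception.
try:
    with open(path, "a") as output2:
        raise ValueError("simulated failure")
except ValueError:
    pass
closed_after_error = output2.closed

print(inside, after, closed_after_error)
```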

Hope that helps.