convert list to fasta format
4 months ago

I'm writing the Python script below. I had some issues that are now resolved; my main remaining difficulty is saving the output (a list) in FASTA format. Do I need to convert the list to a pandas DataFrame first, or is there a more straightforward way to do it? Can you help me find a Pythonic answer?

sequence = input("Digit the peptide sequence: ")

peptides_80 = []
for i in range(0, len(sequence)-80):
    peptides_80 += [''.join(sequence[i:i+80])]
ofile = open("subsequences.txt", "w")
for i in range(len(peptides_80)):
    ofile.write(">" + " " +  "subsequence" + [i] + "\n" + peptides_80[i] + "\n")
ofile.close()
python • 830 views

First off, what on earth is peptide(sequence):? Are you missing a def there? Also, why reinvent the wheel instead of simply using BioPython?


Many thanks, Ram.


I formatted your code a bit and I have a quick question: Is the print statement supposed to be part of the loop? If so, please indent it 4 more spaces.


No, print() is outside the loop.

Ram, I've completed the code. Can you check it now?


Check your code, please.


It's not working yet; the txt file comes out blank.


Try printing everything to screen first:

print(len(peptides_80)) #before the second loop
print(">" + "subsequence" + [i] + "\n" + peptides_80[i] + "\n") #inside the loop
# Also exit when i > 4 so you don't print a TON of stuff

This is a duplicate of your previous question, "convert list to fasta format". (Why did you delete it? Just update your original post.)


Many thanks. Yes, the post was getting confusing, so I decided to rewrite it. I'm almost there, but it's not working yet: the txt file is blank. I believe the second for loop is not working.


Done, it's working now! Many thanks, Ram and Pierre Lindenbaum. Note that it only works for long sequences (length > 80).


What did you do to get it working? Please add the final working code as an answer.


I just changed [i] to str(i) in the code above. It is meant for long sequences (length > 80). The script takes a peptide or nucleic acid sequence longer than 80 residues and extracts overlapping sub-sequences of length 80, using a window that slides one position at a time. It then writes the output in FASTA format and saves it. This is useful, for example, if you want to investigate the biological properties of protein fragments, such as protein binding sites.
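For anyone wondering why a short input gives a blank file, here is a minimal sketch of the windowing step on its own. The function name `sliding_windows` and the `window` parameter are mine, not part of the original script. One assumption to flag: `range(0, len(sequence)-80)` as posted stops one window early, so the sketch uses `+ 1` on the guess that the last window was meant to be included.

```python
def sliding_windows(sequence, window=80):
    """Return every overlapping sub-sequence of length `window`."""
    # range() is empty when the sequence is shorter than the window,
    # so short inputs give an empty list (and therefore an empty file).
    # The "+ 1" keeps the final window (an assumption about the intent);
    # range(len(sequence) - window) would drop it.
    return [sequence[i:i + window] for i in range(len(sequence) - window + 1)]


# Small window so the result is easy to read.
print(sliding_windows("ABCDE", window=3))  # ['ABC', 'BCD', 'CDE']
print(sliding_windows("AB", window=3))     # []
```

With the default window of 80, a sequence of exactly 80 residues yields one sub-sequence here, whereas the posted loop yields none.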

4 months ago

This works:

sequence = input("Digit the peptide sequence: ")

peptides_80 = []
for i in range(0, len(sequence)-80):
    peptides_80 += [''.join(sequence[i:i+80])]
ofile = open("subsequences.txt", "w")
for i in range(len(peptides_80)):
    ofile.write(">" + " " +  "subsequence" + str(i) + "\n" + peptides_80[i] + "\n")
ofile.close()
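A slightly more idiomatic take on the writing step might look like the sketch below, using only the standard library. The helper names `format_fasta` and `save_fasta` and the `prefix` parameter are hypothetical, introduced for illustration. It also drops the space after ">", on the assumption that some FASTA tools read the identifier as the text immediately following ">"; keep the space if your downstream tool expects it.

```python
def format_fasta(subsequences, prefix="subsequence"):
    """Build FASTA text: one '>' header line and one sequence line per entry."""
    lines = []
    for i, subsequence in enumerate(subsequences):
        lines.append(f">{prefix}{i}")  # header, no space after '>' (assumption)
        lines.append(subsequence)
    # Add a trailing newline only if there is something to write.
    return "\n".join(lines) + "\n" if lines else ""


def save_fasta(path, subsequences):
    """Write the records to `path`; the file is closed even if an error occurs."""
    with open(path, "w") as handle:
        handle.write(format_fasta(subsequences))
```

With the list from the script above, the call would be `save_fasta("subsequences.txt", peptides_80)`. Using `enumerate` and an f-string avoids the str/list concatenation error that the original `[i]` caused, and the `with` block replaces the explicit `close()`.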

Wonderful. Please accept this answer to mark the post as solved.
