Question

convert list to fasta format

8 months ago

pinheirofabiano ▴ 100

I'm writing the Python script below. I had some issues, but it's working now; my main difficulty is saving the output (a list) in FASTA format. Do I need to convert the list to a pandas DataFrame first, or is there a more straightforward way to do it? Can you help me find a Pythonic answer?

sequence = input("Digit the peptide sequence: ")

peptides_80 = []
for i in range(0, len(sequence)-80):
    peptides_80 += [''.join(sequence[i:i+80])]
ofile = open("subsequences.txt", "w")
for i in range(len(peptides_80)):
    ofile.write(">" + " " +  "subsequence" + [i] + "\n" + peptides_80[i] + "\n")
ofile.close()

python • 1.5k views

ADD COMMENT • link updated 8 months ago by Ram 45k • written 8 months ago by pinheirofabiano ▴ 100

First off, what on earth is peptide(sequence):? Are you missing a def there? Also, why reinvent the wheel instead of simply using BioPython?

ADD REPLY • link 8 months ago by Ram 45k

many thanks, Ram

ADD REPLY • link 8 months ago by pinheirofabiano ▴ 100

I formatted your code a bit and I have a quick question: Is the print statement supposed to be part of the loop? If so, please indent it 4 more spaces.

ADD REPLY • link 8 months ago by Ram 45k

No, print() is outside the loop.

Ram, I've completed the code. Can you check it now?

ADD REPLY • link 8 months ago by pinheirofabiano ▴ 100

Check your code, please.

ADD REPLY • link 8 months ago by Ram 45k

Not working yet. The txt file comes out blank.

ADD REPLY • link 8 months ago by pinheirofabiano ▴ 100

Try printing everything to screen first:

print(len(peptides_80)) #before the second loop
print(">" + "subsequence" + [i] + "\n" + peptides_80[i] + "\n") #inside the loop
# Also exit when i > 4 so you don't print a TON of stuff
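A minimal sketch of that check, assuming a hypothetical helper `preview_records` (not part of the original script) and made-up stand-in data. It prints how many subsequences were generated and shows only the first few records, so a long input does not flood the screen. The header is built with an f-string here, which is an assumption about the intended output and not the expression quoted above.

```python
from itertools import islice

def preview_records(peptides, limit=5):
    """Return the first `limit` FASTA records as strings, for inspection."""
    records = []
    # islice stops after `limit` items, so long lists are never fully walked.
    for i, peptide in islice(enumerate(peptides), limit):
        records.append(f">subsequence{i}\n{peptide}\n")
    return records

peptides_80 = ["A" * 80, "C" * 80, "G" * 80]  # made-up stand-in data
print(len(peptides_80))  # how many subsequences the first loop produced
for record in preview_records(peptides_80, limit=2):
    print(record, end="")
```

If the printed count is 0, the problem is in the first loop; if the count is positive but building a record fails, the problem is in the write expression.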

ADD REPLY • link 8 months ago by Ram 45k

Duplicate of your previous question, "convert list to fasta format". (Why did you delete it? Just update your original post.)

ADD REPLY • link 8 months ago by Pierre Lindenbaum 165k

Many thanks. Yes, the post was getting confusing, so I decided to rewrite it. Almost there, but it's not working yet: the txt file is blank. I believe the second for loop is not working.

ADD REPLY • link 8 months ago by pinheirofabiano ▴ 100

Done, working now! Many thanks, Ram and Pierre Lindenbaum. It only works for long sequences, though (length > 80).

ADD REPLY • link 8 months ago by pinheirofabiano ▴ 100

What did you do to get it working? Please add the final working code as an answer.

ADD REPLY • link 8 months ago by Ram 45k

I just changed [i] to str(i) in the code above. It's made to work with long sequences (length > 80). The script takes a peptide or nucleic acid sequence longer than 80 and extracts overlapping subsequences of length 80, each shifted by one position from the previous one. It then writes the output in FASTA format and saves it. This is useful, for example, if you want to investigate the biological properties of protein fragments, such as protein binding sites.
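For context on why that small change matters: in Python, `+` cannot join a string and a list, so `"subsequence" + [i]` raises a `TypeError` on the first pass through the loop, before `write` is ever called. That is consistent with the blank file reported earlier in the thread. The sketch below reproduces the error and shows two equivalent fixes; `header_for` is a hypothetical helper name used only for illustration.

```python
def header_for(i):
    """Build the FASTA header line for the i-th subsequence."""
    return "> subsequence" + str(i)  # str(i) turns the integer into text

# The original expression fails because [0] is a list, not a string.
try:
    "subsequence" + [0]
except TypeError as err:
    print("TypeError:", err)

# An f-string converts the integer implicitly and gives the same header.
assert header_for(3) == f"> subsequence{3}"
print(header_for(3))
```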

ADD REPLY • link updated 8 months ago by Ram 45k • written 8 months ago by pinheirofabiano ▴ 100

Ram · Accepted Answer · 2024-07-10

8 months ago

pinheirofabiano ▴ 100

This works:

sequence = input("Digit the peptide sequence: ")

peptides_80 = []
for i in range(0, len(sequence)-80):
    peptides_80 += [''.join(sequence[i:i+80])]
ofile = open("subsequences.txt", "w")
for i in range(len(peptides_80)):
    ofile.write(">" + " " +  "subsequence" + str(i) + "\n" + peptides_80[i] + "\n")
ofile.close()
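One possible tidier variant, offered as a sketch and not as the author's method. The hypothetical helpers `sliding_windows` and `write_fasta` separate the windowing from the file writing, and a `with` block closes the file even if an error occurs. Two deliberate differences from the code above: the range bound is `len(sequence) - size + 1`, so the final window is included and a sequence of exactly 80 residues yields one record instead of none; and the header has no space after `>`, since many FASTA parsers read the identifier directly after that character. The demo sequence and file name are made up.

```python
import os
import tempfile

def sliding_windows(sequence, size=80):
    """Return every contiguous window of `size` characters, stepping by 1."""
    # range() of a non-positive number is empty, so short sequences yield [].
    return [sequence[i:i + size] for i in range(len(sequence) - size + 1)]

def write_fasta(peptides, path):
    """Write each peptide as a FASTA record with a numbered header."""
    with open(path, "w") as handle:
        for i, peptide in enumerate(peptides):
            handle.write(f">subsequence{i}\n{peptide}\n")

demo = "ACDEFGHIKLMNPQRSTVWY" * 5  # made-up 100-residue sequence
windows = sliding_windows(demo)

# Write into a temporary directory so the demo leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    out_path = os.path.join(tmp, "subsequences_demo.fasta")
    write_fasta(windows, out_path)
    with open(out_path) as handle:
        print(handle.readline(), end="")  # first header line
print(len(windows))
```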

ADD COMMENT • link updated 8 months ago by Ram 45k • written 8 months ago by pinheirofabiano ▴ 100

Wonderful. Please accept this answer to mark the post as solved.

ADD REPLY • link 8 months ago by Ram 45k