I have a Swiss-Prot database file that contains several Swiss-Prot entries.
They were copied and pasted underneath each other, so the entries are listed one after another in the same file.
For each entry, I want to write the ID to another file as a header line and the amino acid sequence immediately underneath it.
So far I can only read a single Swiss-Prot entry and get one ID and one amino acid sequence as output. In other words, I have managed to print the ID header first and the amino acid sequence second.
How can I change this code so that it reads multiple Swiss-Prot entries from a single file?
How do I do this sequentially for the ID and amino acid sequence of every Swiss-Prot entry listed in the file?
bright_cyan = "\033[0;96m"
bright_yellow = "\033[0;33m"
bright_green = "\033[0;32m"
reset = "\033[0m"
#--------------------------------------------------------------------
import sys
import re
#--------------------------------------------------------------------
def read_data(SPROT_FILE):
    ''' Read a Swiss-Prot file, print the ID header and write the sequence to first.fsa. '''
    flag = False
    try:
        DNAfile = open(SPROT_FILE, 'r')
    except IOError as error:
        print(bright_cyan + "double check and see if you entered the correct filename :> ", str(error))
        sys.exit(1)
    # create a FASTA file to copy the information to and write.
    new_outfile = open("first.fsa", 'w')
    amino_acid_sequence = ''
    for line in DNAfile:
        #print(line, end = '')
        if re.match(r'ID', line):
            ID = line[5:20]
        # Stateful Parsing of the amino acid sequence.
        if re.match(r'//', line):
            flag = False
        if flag:
            amino_acid_sequence += line
        if re.match(r'SQ', line):
            flag = True
        # Find the modified amino acid residue.
        # \s+ tolerates any amount of whitespace between FT and MOD_RES.
        if re.match(r'FT\s+MOD_RES', line):
            FT = line
            position_switch = ','.join(re.findall(r'\d+',FT))
            header_line = '>'+ID.strip()+" phospho:"+position_switch
            print(header_line)
            #print('>'+ID.strip()+" phospho:"+position_switch, file = new_outfile)
    # Print each amino acid sequence outside of the loop.
    amino_acid_sequence = amino_acid_sequence.replace(' ', '')
    print(amino_acid_sequence)
    # Write the amino acid sequence to the file.
    print(amino_acid_sequence, file = new_outfile)
    DNAfile.close()
    new_outfile.close()
# Not sure about this part...
files = input(bright_yellow + 'Type one or more filenames :> ').split()
for filename in files:
    read_data(filename)
I hope the question is clear.
It would be great if you could offer some help.
Thanks in advance.
Can you give an example of what you mean by a Swiss-Prot file? I think you're describing a FASTA file with amino acids. In that case, use Biopython to parse the FASTA file.
Yes. Here is an example of the file. It is a few thousand lines long, so I won't post the whole thing.
Hopefully that is clearer now.
Best
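For what it's worth, here is a minimal sketch of one way to handle several entries in one file, using only the standard library. The idea is to treat the `//` line as the end of an entry: emit what has been collected so far, then reset the ID, the positions and the sequence before the next `ID` line arrives. The names `iter_entries` and `write_fasta` and the two-entry `sample` text are made up for illustration, and the sketch assumes each `FT MOD_RES` line carries its position as the first number after the key. Like the code in the question, it labels every `MOD_RES` position as `phospho:`.

```python
import io
import re


def iter_entries(lines):
    """Yield (entry_id, mod_positions, sequence) for each Swiss-Prot entry.

    State is reset whenever the '//' terminator is seen, so entries that
    are concatenated in one file are handled one after another.
    """
    entry_id, positions, seq_parts, in_seq = None, [], [], False
    for line in lines:
        if line.startswith("//"):
            # End of entry: emit it and start over with fresh state.
            if entry_id is not None:
                yield entry_id, positions, "".join(seq_parts)
            entry_id, positions, seq_parts, in_seq = None, [], [], False
        elif line.startswith("ID"):
            # The entry name is the second whitespace-separated field.
            entry_id = line.split()[1]
        elif line.startswith("SQ"):
            in_seq = True
        elif in_seq:
            # Sequence lines contain blanks between blocks of residues.
            seq_parts.append("".join(line.split()))
        else:
            # Assumption: the position is the first number after MOD_RES.
            match = re.match(r"FT\s+MOD_RES\s+(\d+)", line)
            if match:
                positions.append(match.group(1))


def write_fasta(entries, out):
    """Write one '>ID phospho:...' header plus one sequence line per entry."""
    for entry_id, positions, sequence in entries:
        print(f">{entry_id} phospho:{','.join(positions)}", file=out)
        print(sequence, file=out)


# Made-up two-entry sample standing in for the real file.
sample = """\
ID   AAAA_HUMAN   Reviewed;   10 AA.
FT   MOD_RES      3      3       Phosphoserine.
FT   MOD_RES      7      7       Phosphothreonine.
SQ   SEQUENCE   10 AA;  1000 MW;  0000000000000000 CRC64;
     MKSAYITKQR
//
ID   BBBB_HUMAN   Reviewed;   15 AA.
FT   MOD_RES      5      5       Phosphoserine.
SQ   SEQUENCE   15 AA;  1500 MW;  1111111111111111 CRC64;
     MAGTSLLPQR VNDEK
//
"""

result = io.StringIO()
write_fasta(iter_entries(io.StringIO(sample)), result)
print(result.getvalue(), end="")
```

For a real file, the same two functions should work with file objects in place of the `io.StringIO` buffers, for example by opening the input and `first.fsa` in one `with` statement and passing them in. Because the header is only written once the `//` line is reached, all modification positions of an entry end up on a single header line.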