I need to find the length of each protein sequence, but I couldn't do it even with the code below. This first piece of code was meant to count how many proteins there are in total and to find the length of each sequence.
The attached image only shows what I want the code to search for. I don't know what is missing in the code.
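# Print the whole file
with open("genoma9.faa") as arq:
    conteudo = arq.read()
print(conteudo)

# Count the header lines: each protein record starts with ">"
n = 0
with open("genoma9.faa") as fh:
    for line in fh:
        if line.startswith(">"):
            n += 1
            print(line)
# n holds the count; line.count(">") only inspected the last line read
proteins = n
print("Total of Proteins: " + str(proteins))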
I am trying to get the sequence characters that belong to each >WP header:
Example:
>WP_013277001.1 DNA polymerase III subunit beta [Acetohalobium arabaticum]
MQIKIDRKNFYDGIQTVRKAISSKSTLPILSGILIETQEKKLKLVGTDLELGIECRVDANIIKDGAIVLPANHLANIVRE
LPNKELELELKKDNKIEISCGLSQFKIHGSPADEYPLLPEVGSGIEYTLSQEKFQAMINRIKFATSDDESRPFLTGGLLS
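For a record like the one in the example, one possible approach is to treat every line starting with ">" as the start of a new protein and add up the lengths of the sequence lines that follow it. The following is only a minimal sketch under that assumption; the function name `protein_lengths` is made up for illustration and is not part of the original code.

```python
def protein_lengths(lines):
    """Return a list of (header, sequence_length) pairs from FASTA-style lines.

    `lines` can be any iterable of strings, e.g. an open file handle.
    """
    results = []
    header = None   # header text of the record currently being read
    length = 0      # residues counted so far for that record
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record begins: store the previous one, if any
            if header is not None:
                results.append((header, length))
            header = line[1:]
            length = 0
        elif header is not None:
            # Sequence line: add its characters to the current record
            length += len(line)
    # Store the last record in the input
    if header is not None:
        results.append((header, length))
    return results
```

With the file from the question, this could be used as `with open("genoma9.faa") as fh: results = protein_lengths(fh)`. Then `len(results)` would be the total number of proteins, and each pair in `results` holds a header together with the length of its sequence.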
you said. FAA File Sequence
please, do so now.
Answer from the other post:
Have you tried running this piece of code? It looks like it has an indentation error?
This post is the same as your previous one, Print the size of a protein. Stop asking new questions and update your original post.
I reposted because I had deleted the other one, since I hadn't included the code in the old post.
The edit button is for edits, no need to delete.