3.0 years ago
shivam-gupta
I am working with protein FASTA files. Using Python, I want to locate the desired amino acid (AA) sequences in every clone (record) of the protein FASTA file.
from Bio import SeqIO
records = SeqIO.parse("protein.fasta", "fasta")  # extract the protein sequences from the FASTA file
for record in records:
    output = record.seq
    print(output)  # just to show what the output looks like
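The loop above reassigns `output` on every pass, so once it finishes only the last record's sequence is left to search. A minimal sketch of one fix is to keep every sequence in a dictionary keyed by record ID. So that the example runs without Biopython, it uses a small stand-in FASTA reader, which is not SeqIO's implementation and assumes standard FASTA where headers start with `>`. The record IDs `clone1` to `clone3` are hypothetical, and the sequences are the example records with the `**` markers removed.

```python
import io


def read_fasta(handle):
    """Return a dict mapping each record ID to its full sequence.

    Minimal stand-in for Bio.SeqIO.parse(handle, "fasta"); assumes
    standard FASTA where every header line starts with '>'.
    """
    sequences = {}
    current_id = None
    chunks = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if current_id is not None:
                # store the previous record before starting a new one
                sequences[current_id] = "".join(chunks)
            # the ID is the first whitespace-separated word of the header
            current_id = line[1:].split()[0]
            chunks = []
        else:
            chunks.append(line)  # sequences may be wrapped over several lines
    if current_id is not None:
        sequences[current_id] = "".join(chunks)  # store the final record
    return sequences


# Hypothetical in-memory file holding the three example clones
demo = io.StringIO(
    ">clone1\nVVSRELQALEAIRQKDEEDABCKARFRGIFSH\n"
    ">clone2\nVVSRPQREEARJKLMIRQKDEEDKARFRGIFSH\n"
    ">clone3\nVVSRELQALEARIRDKARFRGIFSH\n"
)
sequences = read_fasta(demo)
for record_id, seq in sequences.items():
    print(record_id, seq)
```

With Biopython installed, `{record.id: str(record.seq) for record in SeqIO.parse("protein.fasta", "fasta")}` builds the same kind of dictionary directly from the real file.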
# I used ** to highlight the desired regions
-->VVSREL**QALEA**IRQKDEEDABCKARFRGIFSH
-->VVSRPQREEARJKLMIRQKDEED**KARFRG**IFSH
-->VVSREL**QALEA**RIRDKARFRGIFSH
f = open('amino_acids.txt', 'r')  # read the AA sequences (motifs) from the text file
for i in f:  # to show what this file looks like
    print(i)
--> 'QALEA', 'KARFRG', 'QALEAR','KAKAKA', 'PAKAR'
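The printed result suggests the motif file holds one line of quoted, comma-separated motifs. If so, each `i` in `for i in f` is that entire line (quotes, commas and trailing newline included), not a single motif. Also, the first `for i in f` loop consumes the file, so a second loop over the same `f` has nothing left to read. A minimal sketch that reads the file once and splits it into clean motif strings follows; the helper name `parse_motifs` is hypothetical, and it assumes motifs are separated by commas and optionally wrapped in single or double quotes.

```python
import io


def parse_motifs(handle):
    """Collect motif strings from lines such as: 'QALEA', 'KARFRG', 'QALEAR'

    Assumes motifs are comma-separated and optionally wrapped in single
    or double quotes; surrounding whitespace and newlines are ignored.
    """
    motifs = []
    for line in handle:
        for token in line.split(","):
            # drop whitespace, then quotes, then any whitespace inside the quotes
            motif = token.strip().strip("'\"").strip()
            if motif:
                motifs.append(motif)
    return motifs


# In-memory copy of the line shown above
demo = io.StringIO("'QALEA', 'KARFRG', 'QALEAR','KAKAKA', 'PAKAR'\n")
motifs = parse_motifs(demo)
print(motifs)  # ['QALEA', 'KARFRG', 'QALEAR', 'KAKAKA', 'PAKAR']
```

For the real file, `with open("amino_acids.txt") as fh: motifs = parse_motifs(fh)` reads it once and closes it; the resulting list can then be reused as many times as needed.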
# to match my AA sequences against the protein sequences
for i in f:
    for j in output:
        if i in j:
            print('found')
        else:
            print('not found')
#output
--> error
--> error
-->error
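Besides the file and `output` issues, `for j in output` walks over the sequence one residue at a time, so `i in j` compares a motif against a single letter. A minimal sketch of the intended check tests every motif against every whole sequence and reports where it occurs. It uses plain `str.find`; the sequences are the example records with `**` removed, the record IDs are hypothetical, and positions are reported 1-based, with overlapping hits included.

```python
def find_all(sequence, motif):
    """Return 1-based start positions of every (possibly overlapping) match."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start + 1)  # 0-based index -> 1-based residue number
        start = sequence.find(motif, start + 1)  # step by 1 so overlaps are found
    return positions


def locate_motifs(sequences, motifs):
    """Map each record ID to {motif: [positions]} for the motifs it contains."""
    hits = {}
    for record_id, seq in sequences.items():
        found = {motif: find_all(seq, motif) for motif in motifs}
        hits[record_id] = {motif: pos for motif, pos in found.items() if pos}
    return hits


# Example data (hypothetical IDs; sequences from the question without **)
sequences = {
    "clone1": "VVSRELQALEAIRQKDEEDABCKARFRGIFSH",
    "clone2": "VVSRPQREEARJKLMIRQKDEEDKARFRGIFSH",
    "clone3": "VVSRELQALEARIRDKARFRGIFSH",
}
motifs = ["QALEA", "KARFRG", "QALEAR", "KAKAKA", "PAKAR"]

hits = locate_motifs(sequences, motifs)
for record_id, motif_hits in hits.items():
    if not motif_hits:
        print(f"{record_id}\tno motifs found")
    for motif, positions in motif_hits.items():
        print(f"{record_id}\t{motif}\tfound at {positions}")
```

With Biopython records, converting with `str(record.seq)` before searching keeps this working on plain strings. Note that `QALEAR` is reported in `clone3` in addition to `QALEA`, because one motif contains the other.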
How can I locate the desired AA sequences in the protein FASTA file?
Any help would be appreciated.
I have a related script that may help you adapt yours. It's called
find_sequence_element_occurrences_in_sequence.py
and the information about it is here, including a link to a demo Jupyter notebook and instructions for running it in sessions served by MyBinder.org. It's not as refined as my more recent scripts; however, it may give you some ideas. Also note that the description explains it was originally written for nucleic acids, so it also searches what would be the other strand, which is moot for protein; the description suggests how to fix that.
Down the road, you may want fuzzier search abilities, using pattern matching or regular expressions to examine sequences. The README page of that sequencework/FindSequence subrepo has some resources and information about that, and it links to a whole demo I made on using PatMatch, a program for finding patterns in peptide and nucleotide sequences. You may also want to incorporate approaches that leave less for you to maintain, and I've listed related resources there as well.
Since the original question is about Python code, I will move this to a comment. While useful, this does not directly answer the original question.