7.4 years ago
arunprasanna83
Hello,
I am trying to find sequences that contain a tripeptide (RGD). The tripeptide may be followed by any amino acid except 'P'. So far I have extracted the RGD-containing sequences in the following way:
from Bio import SeqIO

RGD = []
for record in SeqIO.parse("input.fasta", "fasta"):
    rgd_count = record.seq.count('RGD')
    if rgd_count >= 1:
        RGD.append(record)
SeqIO.write(RGD, "RGD_Proteins.fasta", "fasta")
How can I use a regular expression here so that RGD followed by any residue (RGDN, for example) is accepted, but RGDP is not?
Thanks in advance.
AP
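
One possible approach, offered as a minimal sketch rather than a definitive answer: Python's `re` module supports a negative lookahead, so the pattern `RGD(?!P)` matches an RGD that is not immediately followed by P. The names `RGD_NOT_P`, `has_rgd_without_p`, and `filter_fasta` are hypothetical helpers introduced for illustration. The sketch assumes that an RGD at the very end of a sequence should count as a hit, and that a sequence is kept if at least one of its RGD motifs qualifies, even if another RGDP occurs elsewhere in it.

```python
import re

# Negative lookahead: "RGD" not immediately followed by "P".
# Assumption: an RGD at the very end of a sequence counts as a hit;
# use r"RGD[^P]" instead if a non-P residue must actually follow.
RGD_NOT_P = re.compile(r"RGD(?!P)")


def has_rgd_without_p(sequence):
    """Return True if the sequence has at least one RGD not followed by P."""
    return RGD_NOT_P.search(str(sequence)) is not None


def filter_fasta(in_path, out_path):
    """Write records with a qualifying RGD motif to out_path; return the count written."""
    # Imported here so the helper above works without Biopython installed.
    from Bio import SeqIO

    hits = (
        record
        for record in SeqIO.parse(in_path, "fasta")
        if has_rgd_without_p(record.seq)
    )
    return SeqIO.write(hits, out_path, "fasta")
```

With the file names from the question, this would be called as `filter_fasta("input.fasta", "RGD_Proteins.fasta")`. If sequences containing any RGDP at all should be excluded, an extra check such as `"RGDP" not in str(record.seq)` would be needed.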