I would like to know of a bioinformatics tool suitable for identifying cysteine-rich proteins in a set of oomycete effector proteins that I have. I want to identify the cysteine-rich proteins in this set and group them according to their cysteine content. Your help would be greatly appreciated!
Biopython includes a module called SeqIO (Bio.SeqIO), which comes with many examples of how to read and manipulate biological sequences. You would simply need to add a cysteine counter. Here is a basic script:
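import sys
from Bio import SeqIO

# Pass the path to the FASTA file as the first command-line argument.
with open(sys.argv[1]) as fasta_file:
    for rec in SeqIO.parse(fasta_file, 'fasta'):
        name = rec.id
        seq = str(rec.seq)
        if not seq:
            continue  # skip empty records to avoid dividing by zero
        # Count cysteines regardless of letter case.
        cys = seq.lower().count('c')
        print(name, len(seq), '%.4f' % (cys / len(seq)))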
This will print each sequence name, its total length, and its fraction of cysteines. It is easy to customize so that only sequences with a higher cysteine fraction are listed.
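To group the proteins by cysteine content, as the question asks, one option is to bin each sequence by its cysteine fraction. The following is only a rough sketch: the cut-offs in BIN_EDGES (5% and 10%), the bin names, and the helper functions are my own assumptions for illustration, not an established definition of "cysteine-rich", so adjust them to your data.

```python
from collections import defaultdict

# Hypothetical cut-offs (fraction of residues that are cysteine); adjust for your data.
BIN_EDGES = (0.05, 0.10)


def cysteine_fraction(seq):
    """Return the fraction of residues in seq that are cysteine (C or c)."""
    return seq.upper().count('C') / len(seq) if seq else 0.0


def bin_label(fraction, edges=BIN_EDGES):
    """Name the bin that a cysteine fraction falls into."""
    low, high = edges
    if fraction < low:
        return 'low'
    if fraction < high:
        return 'medium'
    return 'high'


def group_by_cysteine(records, edges=BIN_EDGES):
    """Group (name, sequence) pairs by cysteine-fraction bin."""
    groups = defaultdict(list)
    for name, seq in records:
        groups[bin_label(cysteine_fraction(seq), edges)].append(name)
    return dict(groups)


if __name__ == '__main__':
    # Toy sequences for illustration only, not real effector proteins.
    toy = [
        ('prot_a', 'MKCCACGCLL'),
        ('prot_b', 'MKTLLAVSAA'),
        ('prot_c', 'MCKTLLAVSAAGLLS'),
    ]
    for label, names in group_by_cysteine(toy).items():
        print(label, names)
```

To run this on a real FASTA file, you could feed it pairs built with SeqIO, for example ((rec.id, str(rec.seq)) for rec in SeqIO.parse(path, 'fasta')). Depending on what you need, binning by absolute cysteine count instead of fraction may be more appropriate, since short effectors with a few cysteines can have a high fraction.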