Question

Web-based bioinformatics tool(s) for identification of cysteine-rich proteins

8 months ago

lukhanyomakhabane ▴ 30

I am looking for a bioinformatics tool suitable for identifying cysteine-rich proteins in a set of oomycete effector proteins that I have. I want to identify the cysteine-rich proteins in this set and group them according to their cysteine content. Your help would be greatly appreciated!

cysteine proteins • 620 views

ADD COMMENT • link updated 11 weeks ago by Mensur Dlakic ★ 29k • written 8 months ago by lukhanyomakhabane ▴ 30

score 1 · Answer 1 · 2025-01-21

11 weeks ago

colindaven 7.4k

Why not just get the protein sequences and write, or ask an LLM to provide, an awk or Python script that counts the number of cysteines in each sequence?

You could optionally normalize the number of Cs by the protein sequence length to get %Cs.
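As a rough illustration of that suggestion, here is a minimal sketch that uses only the Python standard library. The function names `read_fasta` and `percent_cys` and the example records are made up for this sketch, and it assumes plain FASTA input with one `>` header line per protein.

```python
def read_fasta(lines):
    """Yield (id, sequence) pairs from an iterable of FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name = parts[0] if parts else ""  # ID = first word of the header
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def percent_cys(seq):
    """Return cysteines as a percentage of sequence length (0.0 if empty)."""
    seq = seq.upper()
    return 100.0 * seq.count("C") / len(seq) if seq else 0.0


# Made-up example records, not real effector sequences.
example = """>protA
MCCKC
ACCA
>protB
MKLV
""".splitlines()

for name, seq in read_fasta(example):
    # ID, number of cysteines, length, %C
    print(name, seq.upper().count("C"), len(seq), "%.2f" % percent_cys(seq))
```

To run it on a real file, the `example` list could be replaced with an open file handle, since `read_fasta` accepts any iterable of lines.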

score 0 · Answer 2 · 2025-01-21

Biopython includes a module called SeqIO (Bio.SeqIO), which comes with many examples of how to read and manipulate biological sequences. You would simply need to add a cysteine counter. Here is a basic script:

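import sys
from Bio import SeqIO

# Pass the path to the FASTA file as the first command-line argument.
with open(sys.argv[1], 'r') as fasta_file:
    for rec in SeqIO.parse(fasta_file, 'fasta'):
        name = rec.id
        seq = str(rec.seq)
        if not seq:
            continue  # skip empty records to avoid division by zero
        # Case-insensitive count of cysteine residues.
        cys = seq.lower().count('c')
        # Print ID, length, and cysteine fraction (0 to 1).
        print(name, len(seq), '%.4f' % (cys / len(seq)))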

This prints each sequence name, its total length, and its fraction of cysteines. It is easy to customize so that only sequences with a higher cysteine fraction are listed.
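One possible way to do that customization, and to group proteins by cysteine content as the question asks, is sketched below. The cutoffs in `BIN_EDGES`, the labels, the helper names, and the example sequences are all assumptions made for illustration; there is no standard threshold implied here, so adjust them to your data.

```python
from collections import defaultdict

# Hypothetical cutoffs (fraction of residues that are cysteine); tune as needed.
BIN_EDGES = [0.02, 0.05, 0.10]
BIN_LABELS = ["<2%", "2-5%", "5-10%", ">=10%"]


def cys_fraction(seq):
    """Fraction of residues that are cysteine (0.0 for an empty sequence)."""
    seq = seq.upper()
    return seq.count("C") / len(seq) if seq else 0.0


def bin_label(fraction):
    """Return the label of the first bin whose upper edge exceeds the fraction."""
    for edge, label in zip(BIN_EDGES, BIN_LABELS):
        if fraction < edge:
            return label
    return BIN_LABELS[-1]


def group_by_cys(records):
    """Group (name, sequence) pairs into {label: [(name, fraction), ...]}."""
    groups = defaultdict(list)
    for name, seq in records:
        frac = cys_fraction(seq)
        groups[bin_label(frac)].append((name, frac))
    return dict(groups)


# Made-up sequences for illustration only.
records = [
    ("eff1", "MCCKCACCAC"),
    ("eff2", "MKLVAAGTSCLLVAAGTSLLAAGTS"),
    ("eff3", "MKLVAAGTSL"),
]

for label, members in group_by_cys(records).items():
    print(label, [(name, "%.4f" % frac) for name, frac in members])
```

With Biopython, the `records` list could instead be built from the parser, for example `((rec.id, str(rec.seq)) for rec in SeqIO.parse(handle, 'fasta'))`. To list only cysteine-rich proteins, keep just the groups at or above the cutoff you choose.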