5.8 years ago
kamel
Hello Biostar,
I have a multi-FASTA file that contains protein sequences of different lengths. I want to extract only those sequences that contain at least 5% cysteine (C).
Could you give me a short script to extract these sequences?
Thank you in advance.
See this updated post:
https://web.archive.org/web/20071027112709/http://python.genedrift.org/2007/10/10/alternative-methods-to-split-a-fasta-file/
Let's use Python:
Turn each record into two strings, the header and the sequence.
This is just an idea.
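The code that originally went with this answer is not shown here, so the following is only a minimal sketch of the "two strings per record" idea, using nothing but the standard library. The function name `read_fasta` and the toy sequences are made up for illustration; it assumes a plain FASTA layout where header lines start with `>` and a sequence may wrap over several lines.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines.

    Hypothetical helper: the header is returned without the leading '>',
    and wrapped sequence lines are joined into a single string.
    """
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    # Emit the last record in the file.
    if header is not None:
        yield header, "".join(chunks)


# Toy input (invented data) standing in for the lines of a multi-FASTA file.
example = """>seq1 first protein
MKCC
AAC
>seq2
MKTAYIAKQR
""".splitlines()

records = list(read_fasta(example))
```

For a real file, the same function could be fed an open file handle, e.g. `with open("proteins.fasta") as fh: records = list(read_fasta(fh))`, where `proteins.fasta` is a placeholder name.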
Sorry, we cannot "give you a little script". Please tell us what you've tried so far and where you're facing difficulties, and we can help you with specifics.
I can count the number of C (cysteine) residues with grep, but not for each sequence (ID) separately. I have several protein sequences in a single multi-FASTA file, and I am looking for a way to count the C residues in each sequence in turn, so that I can extract the sequences that contain at least 5% cysteine.
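One way to do this per-sequence count, as a rough sketch rather than a definitive solution: once each record is available as a (header, sequence) pair, divide the number of C characters by the sequence length and keep the records at or above the threshold. The names `cysteine_fraction` and `filter_by_cysteine` and the toy records are hypothetical, and "at least 5%" is assumed to mean the count of C divided by the total sequence length.

```python
def cysteine_fraction(sequence):
    """Return the fraction of residues in the sequence that are cysteine (C)."""
    seq = sequence.upper()
    if not seq:
        return 0.0  # avoid dividing by zero on an empty sequence
    return seq.count("C") / len(seq)


def filter_by_cysteine(records, min_fraction=0.05):
    """Yield the (header, sequence) pairs with at least min_fraction cysteine."""
    for header, sequence in records:
        if cysteine_fraction(sequence) >= min_fraction:
            yield header, sequence


# Toy records (invented data) in place of a parsed multi-FASTA file.
toy = [
    ("rich", "MCCKACDEFG"),            # 3 C out of 10 residues
    ("poor", "MKTAYIAKQRQISFVKSHFS"),  # no C at all
    ("edge", "C" + "A" * 19),          # 1 C out of 20 residues, exactly 5%
]

kept = list(filter_by_cysteine(toy))

# Write the surviving records back out in FASTA format.
for header, sequence in kept:
    print(f">{header}\n{sequence}")
```

The threshold is a parameter, so a different cutoff only needs a different `min_fraction`. Characters such as a trailing `*` or gap symbols are counted in the length here; strip them first if that is not wanted.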