2.6 years ago
Pradeep
Dear all, I have a FASTA file containing multiple protein sequences. I want to sort those sequences by how many of certain chosen amino acids each one contains. How can I do that?
For example, I want the following sequences sorted in descending order of their combined D and G content:
>FastaA
ASDFGHILMNV
>FastaB
SKSYGLKQAPPDTITLIAAKSNS
>FastaC
FQRRYVVWILAVSRHIVFLEN
>FastaD
LAPKDYKLELDDGSDVMK
Output file:
>FastaD
LAPKDYKLELDDGSDVMK
>FastaB
SKSYGLKQAPPDTITLIAAKSNS
>FastaA
ASDFGHILMNV
>FastaC
FQRRYVVWILAVSRHIVFLEN
What exactly do you mean by "amount of desired amino acids": the raw count of those residues, or their fraction of the sequence length? In any case, if you can code in Python, you can easily sort strings (sequences) with a custom key function using the
sort(key=your_function)
syntax.
This assumes that each sequence is on a single line.
If there is a tie, the longer sequence is printed first.
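A minimal Python sketch of that approach, assuming that "amount" means the raw count of D plus G, that each record is one header line followed by one sequence line, and that ties are broken by length. The names read_fasta, count_residues and sort_by_residue_count are made up for illustration and are not from any library:

```python
def read_fasta(lines):
    """Parse single-line FASTA records into (header, sequence) tuples.

    Assumes each header line is followed by exactly one sequence line.
    """
    records = []
    header = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line
        elif header is not None:
            records.append((header, line))
            header = None
    return records


def count_residues(seq, residues="DG"):
    """Count how many times any of the chosen residues occurs in seq."""
    seq = seq.upper()
    return sum(seq.count(r) for r in residues)


def sort_by_residue_count(records, residues="DG"):
    """Sort descending by residue count; on a tie, the longer sequence comes first."""
    return sorted(
        records,
        key=lambda rec: (count_residues(rec[1], residues), len(rec[1])),
        reverse=True,
    )


# Example data copied from the question.
example = [
    ">FastaA", "ASDFGHILMNV",
    ">FastaB", "SKSYGLKQAPPDTITLIAAKSNS",
    ">FastaC", "FQRRYVVWILAVSRHIVFLEN",
    ">FastaD", "LAPKDYKLELDDGSDVMK",
]

for header, seq in sort_by_residue_count(read_fasta(example)):
    print(header)
    print(seq)
```

To run it on a real file, pass an open file handle to read_fasta instead of the list. If your sequences are wrapped over several lines, join them per record first, or use a FASTA parser such as Biopython's SeqIO.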