I would like to use regular expressions to identify a motif in an amino acid sequence. Part of the motif is described as '2 or more out of XXXX are D or E'. I wonder if there is a way to specify this part directly with a regular expression instead of writing out all the alternatives or using a more iterative approach.
I'm actually using this in the find box of my editor (Sublime Text), as it accepts regex (I'm not sure which regex flavor or extensions it supports). Otherwise, Perl is where I would implement this.
Thanks!
edit: changed title slightly
edit: changed question to include "or more".
What makes you think a regular expression can capture such a soft rule? There's not much regular about it. Regexes are for things like phone numbers and email addresses. This could be solved quickly with a sweep procedure that looks at all 4-mers along the sequence.
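For what it's worth, here is a minimal sketch of the sweep over 4-mers in Python. The function name `find_windows` and its parameters are made up for illustration, and it assumes the sequence is a plain string of one-letter amino acid codes.

```python
def find_windows(seq, width=4, min_hits=2, residues="DE"):
    """Return (start, window) for every window with at least min_hits residues.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    hits = []
    # Slide a fixed-width window along the sequence, one position at a time.
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        # Count how many positions in this window are D or E.
        count = sum(1 for aa in window if aa in residues)
        if count >= min_hits:
            hits.append((start, window))
    return hits


if __name__ == "__main__":
    for start, window in find_windows("AADKEAAA"):
        print(start, window)
```

Start positions are 0-based. Overlapping windows are reported separately, so a single D/E-rich stretch can produce several hits.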
I agree, this problem (N out of M == X) can't be solved with a regular expression unless you use a regex that enumerates all possible cases, e.g. for (2+ out of 4 == A)
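If you do want the enumerating regex, you don't have to type the cases by hand. The sketch below generates the alternation for "at least 2 of 4 are D or E" in Python; `build_pattern` and `find_matches` are hypothetical names, and the lookahead wrapper is only there so that Python's `re` reports overlapping windows.

```python
import re
from itertools import combinations


def build_pattern(width=4, min_hits=2, residue_class="[DE]"):
    """Build an alternation with one branch per choice of min_hits positions.

    Hypothetical helper. The other positions are '.', so windows with more
    than min_hits D/E residues also match.
    """
    branches = []
    for positions in combinations(range(width), min_hits):
        branch = "".join(
            residue_class if i in positions else "." for i in range(width)
        )
        branches.append(branch)
    return "|".join(branches)


PATTERN = build_pattern()
# Wrapping the alternation in a lookahead lets finditer return overlapping hits.
OVERLAPPING = re.compile("(?=(" + PATTERN + "))")


def find_matches(seq):
    """Return (start, window) for every 4-mer matching the pattern."""
    return [(m.start(), m.group(1)) for m in OVERLAPPING.finditer(seq)]


if __name__ == "__main__":
    print(PATTERN)
    print(find_matches("AADKEAAA"))
```

For 2 of 4 this gives six branches, which is still manageable; the number of branches grows as "width choose min_hits", so for larger windows the sweep is probably the better option. The bare alternation uses only character classes and `.`, so it should paste into most regex engines, though I haven't checked the editor's find box.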