I have a collection of peptide sequences that were pulled down by polyclonal antibodies generated against a (longish) specific antigen. They have been hand-curated to segregate the sequences into expected positives and expected negatives. As I have found to be typical for this type of experiment, it looks like different antibodies within the polyclonals are recognizing different parts of the antigen.
We've defined about 20 submotifs from the antigen that cover most of the expected positives with little or no occurrence in the expected negative set, but there is significant overlap among them. My goal is to cover the largest number of expected positives with the fewest motifs. Literature and Google searches on protein motif/antigen/computation pull up tons of papers, but they are mostly on motif discovery. My question is whether this is a known class of problem, and, if so, what is it called?
Thank you @qdjm! This is what I was looking for, and would never have found otherwise. I'm still not sure of the name of the problem ("subset selection" maybe), but I've been able to pull out similar literature, and that's what I'm after.
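For anyone landing here later, the goal stated in the question (cover the most expected positives with the fewest motifs) can be sketched as a greedy selection. This looks like what is usually discussed under the names "set cover" or "maximum coverage", but that naming is my assumption and not something confirmed in this thread. The function name, the toy motifs, and the toy peptides below are all hypothetical, and plain substring matching stands in for whatever motif-matching rule is actually used. Greedy selection is a heuristic and is not guaranteed to return the smallest possible motif set.

```python
def greedy_motif_cover(motifs, positives, max_motifs=None):
    """Greedily pick motifs until no remaining motif covers a new positive.

    Assumption: a motif "covers" a peptide if it occurs as a plain substring.
    """
    # Map each motif to the set of positive-peptide indices it matches.
    hits = {m: {i for i, p in enumerate(positives) if m in p} for m in motifs}
    uncovered = set(range(len(positives)))
    chosen = []
    while uncovered and (max_motifs is None or len(chosen) < max_motifs):
        # Pick the motif that covers the most still-uncovered positives
        # (ties go to the motif listed first).
        best = max(motifs, key=lambda m: len(hits[m] & uncovered))
        gain = hits[best] & uncovered
        if not gain:
            break  # nothing left that any motif can cover
        chosen.append(best)
        uncovered -= gain
    return chosen, uncovered


# Hypothetical toy data, not real sequences.
motifs = ["ACD", "CDE", "KLM", "LMN"]
positives = ["XACDEX", "ACDKLM", "YCDEY", "KLMNQ", "ZLMNZ"]

chosen, uncovered = greedy_motif_cover(motifs, positives)
print(chosen, sorted(uncovered))
```

Hits in the expected negative set are ignored here. One possible extension would be to score each motif by newly covered positives minus some penalty for negatives matched, but the right weighting depends on the data.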