I have a list of compounds, and I would like to know which proteins they are able to interact with. I intend to use docking. One brute-force approach would be to download the whole PDB and run docking calculations for every compound against every structure, but this would be very time-consuming. Is there a better approach? I wonder whether the proteins in the PDB are grouped into functional clusters (e.g. kinases) or structural clusters, so that I could take one representative protein from each cluster and reduce the whole set considerably, to, let's say, 1000 proteins.
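To make the "one representative per cluster" idea concrete, here is a minimal sketch of what I have in mind. It assumes a plain-text cluster file with one cluster per line and whitespace-separated member IDs, which is how I understand the RCSB sequence-cluster downloads to be laid out, though I have not verified this. The function name `pick_representatives`, the `max_clusters` parameter, and the sample IDs are all made up for illustration, and taking the first member of each line as the representative is my own simplification.

```python
from typing import List


def pick_representatives(cluster_text: str, max_clusters: int = 1000) -> List[str]:
    """Return the first member of each cluster, for at most max_clusters clusters.

    Assumes one cluster per line, members separated by whitespace.
    """
    representatives: List[str] = []
    for line in cluster_text.splitlines():
        members = line.split()
        if not members:
            # Skip blank lines.
            continue
        # Assumption: the first listed member stands in for the whole cluster.
        representatives.append(members[0])
        if len(representatives) >= max_clusters:
            break
    return representatives


# Hypothetical sample data, not real PDB entries.
SAMPLE_CLUSTERS = """\
1ABC_1 2DEF_1 3GHI_2
4JKL_1

5MNO_1 6PQR_1
"""

reps = pick_representatives(SAMPLE_CLUSTERS)
print(reps)
```

Each returned ID would then be one docking target, so the number of docking runs scales with the number of clusters rather than with the number of PDB entries.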
Which approach would you use in this case?