This may seem like a simple statistical question, but here goes. I have an algorithm that takes cancer patient data (gene expression and survival data) and, based on a signature it generates, identifies a small subgroup of patients with a poorer prognosis (Kaplan-Meier survival p-value around 0.001). Currently the method uses 250 patients, and between 25 and 50 of them fall into the poor-prognosis group to give this p-value.

I am trying to find the smallest number of patients on which this method will still work. For instance, if I have 50 patients and 5 are selected for the poor-prognosis subgroup, how confident can I be that the method is actually working, at what significance level, and how would I compute that? Are 50 patients enough to be confident that there is only a 1% chance that this grouping arose by chance (p = 0.01)? And how would I calculate the number of patients needed for significance levels of 0.1, 0.05, 0.01, etc.?

I am trying to find the best statistical simulation to answer this and feel that I am overlooking something simple. Any help would be greatly appreciated. Thanks!
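One simulation that fits this setup is a permutation test that reruns the whole subgroup search on shuffled survival data. Because the subgroup is chosen using the same survival data it is then tested on, the raw p-value is optimistic; shuffling the (time, event) pairs across patients breaks any real link to expression, and repeating the selection many times shows how small a "best" p-value chance alone produces at a given cohort size. Below is a minimal sketch using only NumPy, assuming the reported p-value comes from a two-group log-rank test (the usual test behind a Kaplan-Meier comparison). All function names are hypothetical, and `best_single_gene_subgroup` is only a toy stand-in for the real signature algorithm, which would be plugged in its place.

```python
import math

import numpy as np


def logrank_chi2(times, events, in_group):
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    in_group = np.asarray(in_group, dtype=bool)
    observed = expected = variance = 0.0
    for t in np.unique(times[events]):          # each distinct event time
        at_risk = times >= t
        n = at_risk.sum()                       # patients at risk overall
        n1 = (at_risk & in_group).sum()         # ... and in the subgroup
        died = events & (times == t)
        d = died.sum()                          # events at time t overall
        observed += (died & in_group).sum()     # ... and in the subgroup
        expected += d * n1 / n
        if n > 1:                               # hypergeometric variance term
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return 0.0 if variance == 0 else (observed - expected) ** 2 / variance


def logrank_p(times, events, in_group):
    """Log-rank p-value: the chi-square(1) upper tail equals erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(logrank_chi2(times, events, in_group) / 2))


def best_single_gene_subgroup(expr, times, events, k):
    """HYPOTHETICAL stand-in for the real algorithm: for every gene, flag the
    k highest-expressing patients and keep the split with the smallest p."""
    best_p, best_mask = math.inf, None
    for g in range(expr.shape[1]):
        mask = np.zeros(expr.shape[0], dtype=bool)
        mask[np.argsort(expr[:, g])[-k:]] = True
        p = logrank_p(times, events, mask)
        if p < best_p:
            best_p, best_mask = p, mask
    return best_p, best_mask


def selection_aware_pvalue(expr, times, events, k, n_perm=200, seed=0):
    """Permutation p-value that accounts for the subgroup search: shuffle the
    (time, event) pairs across patients, rerun the whole selection, and count
    how often chance alone yields an equally small best p-value."""
    times, events = np.asarray(times), np.asarray(events)
    rng = np.random.default_rng(seed)
    observed_p, _ = best_single_gene_subgroup(expr, times, events, k)
    hits = 0
    for _ in range(n_perm):
        perm = rng.permutation(len(times))
        null_p, _ = best_single_gene_subgroup(expr, times[perm], events[perm], k)
        hits += null_p <= observed_p
    return observed_p, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy data with NO real signal: 50 patients, 20 genes, subgroup of 5.
    rng = np.random.default_rng(1)
    n_patients, n_genes, k = 50, 20, 5
    expr = rng.normal(size=(n_patients, n_genes))
    times = rng.exponential(scale=24.0, size=n_patients)
    events = rng.random(n_patients) < 0.7       # roughly 30% censored
    naive_p, perm_p = selection_aware_pvalue(expr, times, events, k)
    print(f"naive best log-rank p = {naive_p:.4f}, permutation p = {perm_p:.3f}")
```

With 200 permutations the smallest attainable permutation p-value is 1/201 (about 0.005), so resolving p = 0.01 reliably would call for something like 1,000 or more. For the sample-size part of the question, the same machinery could be run on simulated cohorts of, say, 50, 100 and 150 patients with a planted poor-prognosis subgroup, recording how often the permutation p-value falls below 0.1, 0.05 or 0.01 (the empirical power); the size of the effect to plant is an assumption that would have to be estimated from the existing 250-patient data.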