I have a small artificial neural network and a dataset of about 10,000 samples for a binary classification problem.
I would like to tune the hyperparameters of my model: number of training epochs, learning rate, batch size, number of hidden layer nodes, dropout rate, etc.
At the moment, I split my dataset into a training (80%), validation (10%) and test set (10%). At each epoch, I train the neural network on the training set and evaluate it on the validation set. Once the loss on the validation set stops decreasing, I stop training and select the model from the epoch with the lowest validation loss. I then evaluate this model on the separate test set and report that performance.
In this way, I've used a subset of my data (the validation set) to determine one of my model's hyperparameters: the number of training epochs.
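Schematically, my procedure looks something like the sketch below (simplified; `train_one_epoch`, `validation_loss` and `evaluate` are placeholders for my own training and evaluation code):

```python
import copy

best_loss = float("inf")
best_model, best_epoch = None, 0
patience, since_improvement = 10, 0

for epoch in range(max_epochs):
    train_one_epoch(model, train_set)            # fit on the 80% training split
    val_loss = validation_loss(model, val_set)   # monitor the 10% validation split

    if val_loss < best_loss:
        best_loss, best_epoch = val_loss, epoch
        best_model = copy.deepcopy(model)        # keep the model from the best epoch
        since_improvement = 0
    else:
        since_improvement += 1
        if since_improvement >= patience:        # validation loss stopped decreasing
            break

test_score = evaluate(best_model, test_set)      # reported once, on the 10% test split
```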
I have two questions:
How would I go about tuning the other hyperparameters for my neural network? I've read somewhere that using the same validation set to tune the other hyperparameters introduces bias because this validation set was already used to determine the model, and therefore it is better to specify a second validation set.
If the performance on the test set varies a lot because of imbalanced data, how could I stabilize those results, and how would I tune the hyperparameters of such a model?
Thank you in advance, and I'm sorry if the question is not clear; I hope you take the time to read it.
First, this is not a bioinformatics question but a machine learning one, so there are more suitable places for this kind of question, e.g. CrossValidated. Second, this is not specific to neural networks.

The way to tune hyperparameters in a supervised machine learning setting is typically grid search, i.e. trying out all combinations of parameter values within some predefined ranges. However, exhaustive grid search is often too computationally expensive, and other approaches have been developed. This CrossValidated post (and the links in it) answers your question.

Class imbalance is also a standard problem in machine learning, and there are plenty of answers elsewhere. Typically one would use up- or down-sampling (possibly with some refinements). In the case of neural networks (and some other machine learning approaches), one can also introduce class weights in the loss function.
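For illustration, here is a minimal grid-search sketch in Python with scikit-learn. The `MLPClassifier` is only a stand-in for your network (it has no dropout) and the parameter ranges are arbitrary; the point is that cross-validation inside the search replaces a single fixed validation set, and a stratified held-out test set is used only once at the end:

```python
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for a ~10,000-sample imbalanced binary dataset
X, y = make_classification(n_samples=10_000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)

# Stratified held-out test set; the grid search cross-validates on the rest
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0)

param_grid = {
    "hidden_layer_sizes": [(16,), (32,), (64,)],
    "learning_rate_init": [1e-3, 1e-2],
    "batch_size": [32, 128],
    "alpha": [1e-4, 1e-2],          # L2 penalty (MLPClassifier has no dropout)
}

search = GridSearchCV(
    MLPClassifier(max_iter=200, early_stopping=True, random_state=0),
    param_grid,
    scoring="balanced_accuracy",     # less sensitive to class imbalance than plain accuracy
    cv=StratifiedKFold(n_splits=5, shuffle=True, random_state=0),
    n_jobs=-1,
)
search.fit(X_train, y_train)

print("best hyperparameters:", search.best_params_)
print("test balanced accuracy:",
      balanced_accuracy_score(y_test, search.predict(X_test)))
```

And a rough sketch of class weights in the loss function, here assuming a small Keras model (any framework with per-class loss weighting would do); the weights are set inversely proportional to class frequencies, so the minority class contributes more to the loss per sample:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.utils.class_weight import compute_class_weight
from tensorflow import keras

# Same kind of synthetic imbalanced stand-in data as above
X, y = make_classification(n_samples=10_000, n_features=20,
                           weights=[0.9, 0.1], random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Weights inversely proportional to class frequencies
w = compute_class_weight(class_weight="balanced",
                         classes=np.array([0, 1]), y=y_train)
class_weight = {0: w[0], 1: w[1]}

model = keras.Sequential([
    keras.Input(shape=(X.shape[1],)),
    keras.layers.Dense(32, activation="relu"),
    keras.layers.Dropout(0.3),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

model.fit(X_train, y_train,
          epochs=20, batch_size=64,
          validation_data=(X_val, y_val),
          class_weight=class_weight)   # rescales each class's contribution to the loss
```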