How can I split my dataset into training and test sets by deciding myself which data points go into the training set, with the rest going into the test set? I do not want Python to select them randomly; I want the user to decide. Is this possible in Python?
I have a small dataset of 20 data points in two classes (10 in class 1, 10 in class 2), with 30 features each. I also have a second, even smaller dataset (10 data points, again in two classes). I want to build my model using the first dataset and then use the second, smaller dataset to validate the model externally. The aim is to see how accurate the model is on new data, which is why I don't want to mix the datasets.
Thanks in advance.
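For reference, a manual (non-random) split can be sketched in plain Python as below. This is only an illustration under assumptions: the helper name `split_by_indices`, the tiny two-feature samples, and the chosen indices are all made up, not taken from the poster's data.

```python
def split_by_indices(rows, test_indices):
    """Put the rows at test_indices in the test set, everything else in training."""
    chosen = set(test_indices)
    train = [row for i, row in enumerate(rows) if i not in chosen]
    test = [row for i, row in enumerate(rows) if i in chosen]
    return train, test


# Hypothetical data: six (features, label) pairs, three per class.
samples = [
    ([0.1, 1.2], 1), ([0.3, 0.9], 1), ([0.2, 1.1], 1),
    ([2.1, 0.1], 2), ([1.9, 0.3], 2), ([2.2, 0.2], 2),
]

# The user, not a random number generator, picks rows 2 and 5 for testing.
train, test = split_by_indices(samples, test_indices=[2, 5])
print(len(train), "training rows;", len(test), "test rows")
```

Because the indices are supplied by hand, the split is fully reproducible and no shuffling takes place.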
I don't think you should manually decide which data points go into the test vs. training set. Doesn't that defeat the point of training an algorithm?
But OK. What do you want to separate on?
Hi NRC,
Thank you for the reply. Maybe I didn't explain my problem clearly, as I am new to this area. I have a small dataset of 20 data points in two classes (10 in class 1, 10 in class 2), with 30 features each. I also have a second, even smaller dataset (10 data points, again in two classes). I want to build my model using the first dataset and then use the second, smaller dataset to validate the model externally. The aim is to see how accurate the model is on new data, which is why I don't want to mix the datasets. I hope it is clear now. Please let me know if it is not.
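The setup described here (fit on one dataset, evaluate on a separate one) needs no splitting step at all, since the two datasets are simply kept apart. A minimal sketch in plain Python follows. The nearest-centroid classifier is a stand-in chosen only to keep the example self-contained, not a recommendation for this data, and the function names and numbers are all hypothetical placeholders.

```python
import math


def fit_centroids(X, y):
    """Training step: compute the mean feature vector of each class."""
    sums, counts = {}, {}
    for features, label in zip(X, y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def predict(centroids, features):
    """Assign the class whose centroid is nearest in Euclidean distance."""
    return min(centroids, key=lambda label: math.dist(centroids[label], features))


def accuracy(centroids, X, y):
    """Fraction of samples whose predicted class matches the true label."""
    correct = sum(predict(centroids, f) == label for f, label in zip(X, y))
    return correct / len(y)


# Hypothetical first dataset: used only for fitting the model.
X_train = [[0.1, 1.2], [0.3, 0.9], [2.1, 0.1], [1.9, 0.3]]
y_train = [1, 1, 2, 2]

# Hypothetical second dataset: never seen during fitting, used only for scoring.
X_external = [[0.2, 1.0], [2.0, 0.2]]
y_external = [1, 2]

model = fit_centroids(X_train, y_train)
external_accuracy = accuracy(model, X_external, y_external)
print(f"External accuracy: {external_accuracy:.2f}")
```

With scikit-learn the same pattern would be to call an estimator's `fit` on the first dataset and its `predict` or `score` on the second, skipping `train_test_split` entirely. With only 10 external samples, the accuracy estimate will be coarse.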