4.4 years ago
david.f.stein
I am building a logistic regression classifier using scikit-learn. I have some continuous features with missing values that I would like to impute. Is it considered better practice to impute before or after normalization? I have tried both orders and have not noticed a difference in my model's performance. However, a colleague suggested that imputation should be performed first, and I can understand their intuition. Does anyone have any insight on this matter? Any literature on this would also be appreciated.
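For concreteness, here is a minimal sketch of the two orderings being compared. It is not my actual code: the data is synthetic, and the choice of `SimpleImputer` with mean imputation, `StandardScaler` for normalization, and the variable names are all assumptions made for illustration.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 200 samples, 4 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

# Knock out roughly 10% of the entries at random to simulate missing values.
X[rng.random(X.shape) < 0.1] = np.nan

# Order A: impute first, then normalize.
impute_then_scale = make_pipeline(
    SimpleImputer(strategy="mean"), StandardScaler(), LogisticRegression()
)

# Order B: normalize first, then impute.
# StandardScaler ignores NaNs when fitting and leaves them in place on transform.
scale_then_impute = make_pipeline(
    StandardScaler(), SimpleImputer(strategy="mean"), LogisticRegression()
)

# Wrapping both steps in a pipeline means they are re-fit on each training fold,
# so no statistics leak in from the held-out fold.
scores_a = cross_val_score(impute_then_scale, X, y, cv=5)
scores_b = cross_val_score(scale_then_impute, X, y, cv=5)

print("impute -> scale:", scores_a.mean())
print("scale -> impute:", scores_b.mean())
```

With mean imputation specifically, the two orderings should give features that differ only by a constant factor per column, since filling with the mean leaves the column mean unchanged and only shrinks the standard deviation. That may be why I see no difference, but it need not hold for other imputers such as median or nearest-neighbour imputation.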
Thanks!