Hi everyone,
I’ve done differential expression analysis on microarray data using limma. After extracting the expression matrix with exprs(), I applied quantile normalization and log2 transformation.
Now I plan to perform feature selection (e.g., Gini Index, Information Gain, Information Gain Ratio, Rule, Chi Squared Statistic, Tree Importance, Uncertainty, Deviation, Correlation, Relief, SVM, and PCA weights) and build ML models like KNN, SVM, Neural Network, and Random Forest.
One ML expert suggested that if the data is from a single experiment and already quantile-normalized, no further centering or scaling is needed.
However, algorithms like KNN, SVM, and NN are sensitive to feature scale, and PCA-based feature extraction is sensitive to feature variance. Shouldn’t I still apply z-score standardization (zero mean, unit variance for each feature) before feature selection and model building?
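To make my concern concrete, here is a minimal sketch in plain Python of what I understand quantile normalization to do. The toy values are made up (not my data), `quantile_normalize` is a hypothetical helper written for illustration (not limma's implementation), and ties are not handled specially. As far as I can tell, it forces every sample (column) onto the same distribution, but each gene (row) still keeps its own mean and spread:

```python
from statistics import mean, pstdev

def quantile_normalize(matrix):
    """Quantile-normalize a genes x samples matrix given as a list of rows.

    Illustrative only: ties are broken by sort order, not averaged.
    """
    n_genes = len(matrix)
    n_samples = len(matrix[0])
    columns = [[matrix[g][s] for g in range(n_genes)] for s in range(n_samples)]
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: mean of the k-th smallest value across samples.
    reference = [mean([sc[k] for sc in sorted_cols]) for k in range(n_genes)]
    normalized = [[0.0] * n_samples for _ in range(n_genes)]
    for s, col in enumerate(columns):
        # Replace each value by the reference value at its within-sample rank.
        order = sorted(range(n_genes), key=lambda g: col[g])
        for rank, g in enumerate(order):
            normalized[g][s] = reference[rank]
    return normalized

# Made-up log2 expression values: 4 genes (rows) x 3 samples (columns).
toy = [
    [2.0, 2.5, 1.5],
    [5.0, 9.0, 6.0],
    [12.0, 12.5, 11.0],
    [7.0, 6.0, 9.5],
]

normalized = quantile_normalize(toy)

# Per-sample distributions (sorted columns) are identical after normalization.
sample_dists = [sorted(row[s] for row in normalized) for s in range(3)]

# Per-gene means and standard deviations still differ from gene to gene.
gene_means = [mean(row) for row in normalized]
gene_sds = [pstdev(row) for row in normalized]

print("sample distributions:", sample_dists)
print("gene means:", gene_means)
print("gene SDs:", gene_sds)
```

If this sketch is right, quantile normalization makes samples comparable to each other, but it does not put genes on a common scale, which is the part that distance-based methods and PCA care about.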
Is the expert’s advice incorrect for ML workflows, even if valid for DE analysis?
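For reference, the standardization step I have in mind would look roughly like the sketch below. It is plain Python with made-up numbers; `fit_zscore` and `apply_zscore` are hypothetical helper names, and I am assuming the mean and SD should be estimated on the training samples only and then reused for the test samples:

```python
from statistics import mean, pstdev

def fit_zscore(train):
    """Estimate per-feature mean and SD from a samples x features matrix."""
    n_features = len(train[0])
    means = [mean([row[j] for row in train]) for j in range(n_features)]
    sds = [pstdev([row[j] for row in train]) for j in range(n_features)]
    # Guard against constant features to avoid division by zero.
    sds = [sd if sd > 0 else 1.0 for sd in sds]
    return means, sds

def apply_zscore(data, means, sds):
    """Standardize each feature using previously fitted means and SDs."""
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in data]

# Made-up log2 values: rows are samples, columns are two genes.
# The second gene varies far more than the first.
train = [
    [5.0, 8.0],
    [5.2, 12.0],
    [4.8, 10.0],
    [5.0, 14.0],
]
test = [[5.1, 9.0]]

means, sds = fit_zscore(train)               # fit on training samples only
train_z = apply_zscore(train, means, sds)
test_z = apply_zscore(test, means, sds)      # reuse the training parameters

print("train (z-scored):", train_z)
print("test (z-scored):", test_z)
```

Without this step, the second gene would dominate any Euclidean distance between samples simply because its values spread over a wider range.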
Thanks for your insights!
Prior thread for reference: Is a log2 transformation an essential step in preparing expression data for machine learning?
That question was about whether log transformation is needed. This one is about whether standardization is needed.