I have performed differential expression analysis using the Limma package on one microarray experiment. Next, I plan to perform feature selection using various methods such as Gini Index, Information Gain, Information Gain Ratio, Rule, Chi Squared Statistic, Tree Importance, Uncertainty, Deviation, Correlation, Relief, SVM, and PCA weighting algorithms. After selecting features, I intend to build models using one of the following algorithms: KNN, Neural Network, SVM, or Random Forest, and evaluate their performance.
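Several of the listed filter methods (Information Gain, Gini Index, Chi Squared) score each gene on its own against the class labels. The pure-Python code below is a minimal sketch of one of them: it ranks genes by information gain. It assumes a simple median split to discretize each gene's continuous values. That split is an illustrative choice, not necessarily what any particular feature-weighting tool does. The gene names and values are made-up toy data.

```python
import math
import statistics
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Information gain of one gene after splitting its values at the median.

    The median split is an assumed, deliberately simple discretization;
    real feature-weighting tools may discretize differently.
    """
    threshold = statistics.median(values)
    low = [y for x, y in zip(values, labels) if x <= threshold]
    high = [y for x, y in zip(values, labels) if x > threshold]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (low, high) if part)
    return entropy(labels) - conditional

def rank_genes(expr, labels):
    """Rank genes by information gain; expr maps gene name -> values (one per sample)."""
    scores = {gene: information_gain(vals, labels) for gene, vals in expr.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Made-up toy data: three tumor and three normal samples.
labels = ["tumor"] * 3 + ["normal"] * 3
expr = {
    "geneA": [9.1, 8.7, 9.4, 5.2, 5.0, 5.5],  # separates the classes
    "geneB": [7.0, 5.1, 6.2, 7.1, 5.3, 6.0],  # mostly noise
}
for gene, score in rank_genes(expr, labels):
    print(gene, round(score, 3))
```

The median split depends only on the order of values within each gene. As a result, this particular score is the same whether or not the data are log2-transformed. Distance-based scores and classifiers do not share this property.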
My question concerns preparing the input for these ML steps. Is a log2 transformation an essential step in preparing the expression data for machine learning, or can normalized expression values on the original (linear) scale be used directly as input to ML methods without a log2 transformation? I would greatly appreciate any clarification on this matter.
You're listing buzzwords again, as in your previous question. Please read the underlying literature and work through guided tutorials. There is no general answer to this. A log2 transformation is often preferable because it dampens variance that grows with signal magnitude, but what that means in practice depends on how each particular method works.
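The point that the effect of log2 depends on the method can be illustrated with a small pure-Python sketch. It uses made-up intensities, not data from the question, and assumes the normalized values are positive and on a linear scale. It shows three things:

1. Raw-scale variance grows with magnitude, while log2 gives the same variance for the same fold change.
2. A Euclidean-distance nearest neighbour, as in KNN, can flip after log2 because the high-intensity gene no longer dominates.
3. log2 is monotone, so the order of samples within a gene is unchanged. Tree-based methods such as Random Forest split on that order, so the partitions available to them are unchanged as well.

```python
import math
import statistics

def log2_all(values):
    """Element-wise log2; assumes strictly positive, normalized linear-scale intensities."""
    return [math.log2(v) for v in values]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def order(values):
    """Sample indices sorted by value (all that a single tree split depends on)."""
    return sorted(range(len(values)), key=lambda i: values[i])

# Made-up intensities: both genes differ 2-fold between the first and last two samples.
low_gene = [100, 100, 200, 200]
high_gene = [10000, 10000, 20000, 20000]

# 1) Variance: raw-scale variance grows with magnitude; on log2 both genes give about 0.25.
raw_vars = (statistics.pvariance(low_gene), statistics.pvariance(high_gene))
log_vars = (statistics.pvariance(log2_all(low_gene)), statistics.pvariance(log2_all(high_gene)))

# 2) Distance-based methods such as KNN: each sample is a (low_gene, high_gene) vector.
query = [100, 10000]
sample_a = [400, 10000]  # 4-fold change in the low-intensity gene
sample_b = [100, 11000]  # 1.1-fold change in the high-intensity gene

def nearest(transform):
    """Return which of sample_a / sample_b is closer to query after the transform."""
    q, a, b = transform(query), transform(sample_a), transform(sample_b)
    return "a" if euclidean(q, a) < euclidean(q, b) else "b"

raw_nn = nearest(lambda v: v)
log_nn = nearest(log2_all)

# 3) Tree-based methods: log2 is monotone, so the order of samples within a gene is unchanged.
gene = [5200, 130, 870, 41000]
same_order = order(gene) == order(log2_all(gene))

print(raw_vars, log_vars)
print(raw_nn, log_nn)  # a b
print(same_order)      # True
```

Also check which scale the normalized matrix is already on before transforming it. Pipelines such as RMA typically return log2-scale values, and applying log2 a second time would distort the data.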
I would greatly appreciate it if you could point me to a tutorial on feature selection for microarray data using some (or all) of the methods mentioned (e.g., Gini Index, Information Gain, SVM-based selection). Unfortunately, I wasn't able to find one. Thank you!
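Whichever selection method is used, a common pitfall with microarray data is to select genes on the full dataset and then cross-validate only the classifier. This inflates the accuracy estimate. The safer pattern is to repeat the selection inside every cross-validation fold, using only that fold's training samples.

Below is a minimal pure-Python sketch of that pattern under these assumptions:

- A simple signal-to-noise score stands in for whichever selection method is chosen.
- 1-nearest-neighbour stands in for the classifier.
- Leave-one-out cross-validation is used.
- The data are made-up, log2-scale toy values.

In scikit-learn, putting the selector and the classifier into one `Pipeline` and passing it to `cross_val_score` achieves the same effect.

```python
import math
import statistics

def snr_score(values, labels, pos):
    """Signal-to-noise score |mean1 - mean2| / (sd1 + sd2) on the given samples only."""
    a = [v for v, y in zip(values, labels) if y == pos]
    b = [v for v, y in zip(values, labels) if y != pos]
    spread = statistics.pstdev(a) + statistics.pstdev(b)
    return abs(statistics.mean(a) - statistics.mean(b)) / (spread or 1e-12)

def select_genes(X, labels, top_k):
    """Indices of the top_k highest-scoring genes; X is a list of samples (lists of values)."""
    pos = labels[0]
    num_genes = len(X[0])
    scores = [snr_score([row[g] for row in X], labels, pos) for g in range(num_genes)]
    return sorted(range(num_genes), key=lambda g: scores[g], reverse=True)[:top_k]

def predict_1nn(train_X, train_y, x, genes):
    """Label of the closest training sample, using only the selected genes."""
    def dist(row):
        return math.sqrt(sum((row[g] - x[g]) ** 2 for g in genes))
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i]))
    return train_y[best]

def loocv_accuracy(X, y, top_k):
    """Leave-one-out accuracy with gene selection repeated inside every fold."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        genes = select_genes(train_X, train_y, top_k)  # held-out sample is never seen here
        correct += predict_1nn(train_X, train_y, X[i], genes) == y[i]
    return correct / len(X)

# Made-up toy data (log2 scale): gene 0 is informative, genes 1 and 2 are noise.
X = [
    [9.0, 6.1, 7.3],
    [8.8, 5.9, 6.8],
    [9.3, 6.4, 7.0],
    [5.1, 6.0, 7.2],
    [5.4, 6.3, 6.9],
    [4.9, 5.8, 7.1],
]
y = ["tumor"] * 3 + ["normal"] * 3
print(select_genes(X, y, 1), loocv_accuracy(X, y, 1))
```

The same loop structure applies if the score function is swapped for information gain, Relief, or SVM weights. Only `select_genes` needs to change.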