You wouldn't necessarily have to change your feature selection method (though you may want to, depending on the question you're asking). If you're using machine learning classifiers, though, you should change the way your target classes are represented.
Continuous variables can be predicted by regression, as you stated. Discrete variables can be handled the same way if you don't care about strict boundaries between classes (e.g. if 2.5 is an acceptable answer), but if you're looking for a potentially more accurate output, you can use indicator variables to create a separate classification boundary for each class (see the short sketch below).
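To make the distinction concrete, here is a minimal sketch (assuming scikit-learn, with made-up toy data): a regressor treats the class labels as numbers and can output values between them, while a classifier is forced to pick a hard label:

```python
# Regression vs. classification on the same discrete labels.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[0.1], [0.9], [2.1], [1.1], [0.2]])  # toy single feature
y = np.array([1, 2, 3, 2, 1])                      # age classes

reg = LinearRegression().fit(X, y)
clf = LogisticRegression().fit(X, y)
print(reg.predict([[1.5]]))  # continuous output; may land between classes
print(clf.predict([[1.5]]))  # hard label: exactly 1, 2, or 3
```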
So let's say you had age classes 1, 2, and 3. You would create an M × 3 target label matrix, where M is the number of instances you wish to classify and each column is an indicator for one class:
```
[ [ 1 0 0 ],
  [ 0 1 0 ],
  [ 0 0 1 ],
  [ 0 1 0 ],
  ...
  [ 1 0 0 ] ]
```
Each column holds a true/false value for one age category. This way, feature selection is less likely to favor features whose best output corresponds to a meaningless result like age class 2.5, and will instead favor features that correctly classify instances as age class 2 or age class 3.
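If it helps, here is one way to build that indicator matrix (a sketch assuming scikit-learn; the `ages` vector is made up):

```python
# One-hot / indicator encoding of the age classes from the example above.
import numpy as np
from sklearn.preprocessing import label_binarize

ages = np.array([1, 2, 3, 2, 1])             # one age class per instance
Y = label_binarize(ages, classes=[1, 2, 3])  # shape (M, 3)
print(Y)
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]
#  [0 1 0]
#  [1 0 0]]
```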
All of this depends on which feature selection strategy you use though.
Thanks Steven,
What if I don't want to discretize my response variable into groups? In that case it would definitely influence which feature selection techniques are applicable to this problem, right? I'm currently reading an article by Hira & Gillies (2015), which makes a distinction between FS techniques for classification problems and for regression problems. So my current hypothesis is that it indeed depends on whether your response is a set of class labels or a continuous variable like age.
So maybe my question was a bit unclear, but I want to use continuous age. I only want to know whether there are FS techniques that should be applied in this setting instead of the ones used when you have class labels as the response variable.
It sounds to me like you want regression in the end but might be more interested in dimensionality reduction to begin with. Here is a brief synopsis of the three most relevant output measures for classification/regression:

- Binary classification: a true/false output for each class (the indicator columns above).
- Multi-class classification: one of several discrete labels (e.g. age groups 1, 2, 3).
- Regression: a continuous output (e.g. exact age), which is your case.
Any number of techniques can do the regression itself, and most people traditionally use generalized linear models (GLMs) for that part; a minimal sketch is given after this paragraph. However, since you're interested in feature selection, you may want to do this in two steps: first reduce dimensionality (or select features), then regress age on the reduced feature set.
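As a hedged illustration of the regression step, here is a minimal Gaussian GLM (which reduces to ordinary least squares), assuming statsmodels is available; `X` and `age` are synthetic stand-ins, not anything from your experiment:

```python
# Minimal GLM sketch: regress continuous age on a feature matrix.
# X and age are toy data generated just for the example.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))  # toy feature matrix
age = 30 + X @ np.array([2.0, 1.0, 0.0, 0.0, 0.0]) + rng.normal(size=100)

model = sm.GLM(age, sm.add_constant(X), family=sm.families.Gaussian())
result = model.fit()
print(result.params)  # intercept followed by one coefficient per feature
```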
Dimensionality reduction can again include any number of techniques (e.g. Principal Components Analysis, Locally Linear Embedding, Laplacian Eigenmaps, etc. -- the list is quite long). Another paper out of my university that might interest you (especially for microarray data) uses Support Vector Machines with the sparse 1-norm to select batches of features, then uses those reduced features to do the regression: http://www.ncbi.nlm.nih.gov/pubmed/24274115
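The paper's method is a 1-norm SVM; as a loose stand-in for that idea, here is a sketch (assuming scikit-learn) that uses an L1-penalized linear model (Lasso) to pick a sparse batch of features and then regresses age on the survivors:

```python
# Sparse selection, then regression. The Lasso step here is only an
# analogy for the paper's 1-norm SVM selector; alpha is a made-up value.
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso, LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 1000))  # high-dimensional toy data
age = X[:, :3] @ np.array([5.0, 3.0, 2.0]) + 40 + rng.normal(size=200)

pipeline = make_pipeline(
    SelectFromModel(Lasso(alpha=0.1)),  # keep features with nonzero Lasso weight
    LinearRegression(),                 # regress age on the selected features
)
pipeline.fit(X, age)
```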
The most important aspect of what you want to achieve is to predict age (continuous) from your high-dimensional data as accurately as possible, so you want to make sure the measures you use are the best available. Dimensionality reduction followed by regression is the way to go for that. As for which specific methods to choose, you'll have to base that on testing how good your regression is (some measure of error, such as mean squared error). If that error falls, you're doing better; the standard way of doing this with a single data set is k-fold cross-validation, checking how your error looks on each held-out fold (see the sketch below).
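A small sketch of that validation loop, reusing the `pipeline`, `X`, and `age` from the previous snippet (again assuming scikit-learn):

```python
# 5-fold cross-validation: fit on 4 folds, measure squared error on the
# held-out fold, and average. A lower mean MSE means a better pipeline.
from sklearn.model_selection import cross_val_score

scores = cross_val_score(pipeline, X, age, cv=5,
                         scoring="neg_mean_squared_error")
print("mean CV MSE:", -scores.mean())
```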
Sorry I can't be of more help here; the answer to big-data questions is usually that there are many choices that could work equally well, and each has to be tested on the data to determine its value.