From a single sequence, one can derive thousands of features, but which features are the most predictive?
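To make "thousands of features" concrete, here is a minimal sketch that derives k-mer composition features from one protein sequence. The function name `composition_features` and the toy sequence are made up for illustration, and k-mer frequencies are only one of many possible feature families.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition_features(seq, k=1):
    """Return relative frequencies of all k-mers over the 20 standard amino acids."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    for i in range(max(len(seq) - k + 1, 0)):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows containing non-standard residues such as X
            counts[kmer] += 1
    total = sum(counts.values())
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


seq = "MKTAYIAKQR"  # toy sequence, purely illustrative
features = {}
for k in (1, 2, 3):
    features.update(composition_features(seq, k))

print(len(features))  # 20 + 400 + 8000 feature columns
```

Even this one feature family already gives 8420 columns per sequence, most of them zero for a short protein, which is why some form of feature selection is needed.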
Basically, we can start with a simple PCA to see which feature explains the most variance and then project the others onto it. But sometimes there is no structure in the PCA, and all features seem to contribute equally.
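A rough way to check whether PCA finds any structure is to look at the explained-variance ratios. The sketch below uses plain NumPy on synthetic data (both matrices are invented stand-ins, not real protein features): one matrix is driven by a single latent factor, the other is pure noise.

```python
import numpy as np


def pca_explained_variance(X):
    """Return explained-variance ratios and principal axes of a PCA on standardized X."""
    X = np.asarray(X, dtype=float)
    std = X.std(axis=0)
    std[std == 0] = 1.0  # constant columns stay at zero after centering
    Z = (X - X.mean(axis=0)) / std
    # SVD of the standardized matrix; rows of Vt are the principal axes
    _, s, Vt = np.linalg.svd(Z, full_matrices=False)
    var = s ** 2
    return var / var.sum(), Vt


rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 1))
weights = np.linspace(0.5, 2.0, 10).reshape(1, 10)

# Ten features driven by one latent factor plus a little noise
structured = latent @ weights + 0.1 * rng.normal(size=(200, 10))
# Ten independent noise features: the "no structure" case
unstructured = rng.normal(size=(200, 10))

ratio_s, _ = pca_explained_variance(structured)
ratio_u, _ = pca_explained_variance(unstructured)
print("structured:  ", ratio_s[:3].round(3))
print("unstructured:", ratio_u[:3].round(3))
```

In the structured case the first component takes almost all of the variance; in the unstructured case the ratios are nearly flat, which is the "all features contribute equally" situation.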
Also, in a classification problem, we can start with a simple tree-based classifier, prune the tree, and evaluate the model. But how would you deal with a regression problem, say, predicting the solubility of a protein from its sequence?
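For what it is worth, the tree-and-prune idea has a direct regression analogue. The following is only a sketch with scikit-learn's `DecisionTreeRegressor` and cost-complexity pruning, on synthetic data that stands in for sequence features and solubility values (the data, the grid thinning, and the 5-fold setup are all assumptions, not an established baseline).

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (an assumption): 300 "proteins", 50 features,
# with the target depending on the first two features plus noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + 0.3 * rng.normal(size=300)

# Candidate pruning strengths from the cost-complexity pruning path
path = DecisionTreeRegressor(random_state=0).cost_complexity_pruning_path(X, y)
alphas = np.unique(np.clip(path.ccp_alphas, 0.0, None))[::10]  # thin the grid

# Score each pruning strength by 5-fold cross-validated R^2
scores = [
    cross_val_score(
        DecisionTreeRegressor(random_state=0, ccp_alpha=a), X, y, cv=5, scoring="r2"
    ).mean()
    for a in alphas
]
best_alpha = alphas[int(np.argmax(scores))]

# Refit the pruned tree and rank features by impurity-based importance
tree = DecisionTreeRegressor(random_state=0, ccp_alpha=best_alpha).fit(X, y)
top = np.argsort(tree.feature_importances_)[::-1][:5]
print("best ccp_alpha:", best_alpha)
print("mean CV R^2:", max(scores))
print("top features:", top)
```

The cross-validated R^2 plays the role that accuracy plays in the classification case, and the importances of the pruned tree give a first, crude ranking of features.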
Is there a baseline or standard procedure among bioinformaticians?
You define your problem more specifically than "the problem is regression" and I might do just that :-)
Thanks for the comments, but I wish you would address the "regression" problem more specifically.
I hope the question is more specific now:
"prediction of solubility of a protein from its sequence". Maybe now you could edit your response or add some more comments to it.