Machine Learning To Find The Most Informative Features To Predict An Outcome
12.3 years ago

Hi,

I have a matrix with many columns (features), and I want to predict one of these columns (the outcome) from the others. I am very new to this kind of analysis and I am not sure whether what I want is even possible, so apologies if this is a silly question.

My goal is to find the features that contribute most to predicting my variable. All of them are numeric, although I could convert them to discrete values.

Any suggestions on how to start, or which method to use? I am playing with WEKA, but since it integrates so many algorithms I don't know exactly what each of the parameters in the results means.

I also tried linear regression, but I don't know how to find the best model (do I have to try different combinations of all the features?), and the coefficients are not values I can read directly to conclude that a feature contributes more or less.


As an aside, scikit-learn in Python has a surprisingly good set of tutorials for this kind of thing: http://scikit-learn.org/stable/index.html


That's great, I normally use Python for my scripts, so it will be perfect once I have decided on a model!

12.3 years ago
Johan ▴ 890

There are of course several ways to do this, but feature selection (http://en.wikipedia.org/wiki/Feature_selection) might be a good start. The basic idea is to build predictive models on different subsets of the features and then rank the features by how well the models that include them perform.
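To make the subset-and-rank idea concrete, here is a minimal sketch of one common variant, greedy forward selection, in plain Python. Everything in it is an assumption for illustration: the synthetic data (the outcome depends strongly on feature 0, weakly on feature 2, and not at all on features 1 and 3), the function names, and the choice of ordinary least squares as the model. It is not what WEKA or RapidMiner do internally.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.
    Assumes A is non-singular (true for the random data used below)."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_ols(X, y):
    """Least-squares fit with an intercept, via the normal equations."""
    Z = [[1.0] + row for row in X]
    p = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    b = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
    return solve(A, b)

def mse(w, X, y):
    """Mean squared error of the linear model w on (X, y)."""
    preds = [w[0] + sum(wi * xi for wi, xi in zip(w[1:], row)) for row in X]
    return sum((p - t) ** 2 for p, t in zip(preds, y)) / len(y)

def cols(X, idx):
    """Keep only the columns listed in idx."""
    return [[row[i] for i in idx] for row in X]

def forward_selection(X_train, y_train, X_val, y_val):
    """At each step, add the feature that gives the lowest validation error."""
    remaining = list(range(len(X_train[0])))
    chosen, history = [], []
    while remaining:
        scored = []
        for f in remaining:
            idx = chosen + [f]
            w = fit_ols(cols(X_train, idx), y_train)
            scored.append((mse(w, cols(X_val, idx), y_val), f))
        best_err, best_f = min(scored)
        chosen.append(best_f)
        remaining.remove(best_f)
        history.append((best_f, best_err))
    return history

# Synthetic data (hypothetical): 200 rows, 4 numeric features.
rng = random.Random(0)
X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(200)]
y = [3.0 * r[0] + 0.5 * r[2] + rng.gauss(0, 0.1) for r in X]

# Fit on the first 150 rows, score on the held-out 50.
history = forward_selection(X[:150], y[:150], X[150:], y[150:])
for feature, err in history:
    print(f"added feature {feature}: validation MSE = {err:.4f}")
```

The order in which features are added gives a rough ranking, and the point where the validation error stops dropping suggests how many features are worth keeping. With real data, cross-validation would be a more reliable score than a single train/validation split.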

I have tried WEKA, but it was a long time ago, so I cannot help you there. However, I would like to point you to another machine learning package, RapidMiner. In my opinion it is more intuitive than WEKA, but I have not compared them side by side, so I cannot say which is better. For RapidMiner, you can find an excellent screencast on feature selection here: http://www.youtube.com/watch?v=JlhoTAk1ow8

Overall, the Vancouver Data Blog has a lot of nice entry-level machine learning tutorials. Even though most of the author's examples use financial data, they can easily be adapted to biological data.

I hope this helps you along the way.


Thanks for that great help. I will definitely try RapidMiner!
