Question

Microarray Class Prediction - For Continuous Data?

5

12.3 years ago

atheoryofjustice86 ▴ 50

I was wondering if anyone could help point me in the right direction for the following problem (changed slightly to improve comprehensibility).

Let's say that I have a set of 500 microarrays taken from blood samples from 500 different people. Each person is a different age. I want to build a classifier that can predict a person's age based on as few genes as possible. If there were two classes of people ("young" and "old"), I could use a straightforward binary classification algorithm. But I want to predict a person's exact age, so I'm not sure what classification method to use to handle what is basically continuous data (500 different ages) rather than just 2 classes. Thanks!

microarray classification prediction • 3.4k views


score 4 · Answer 1 · 2012-08-16

I'm not specifically familiar with microarray data, but I think that what you are looking to create is a regression model (http://en.wikipedia.org/wiki/Regression_analysis), the most basic example being linear regression.

To reduce the number of genes used by the classifier you might want to look into feature selection (http://en.wikipedia.org/wiki/Feature_selection) - this should help you select a subset of genes.
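As a rough illustration of the feature selection idea above, here is a minimal sketch of a simple "filter" approach: rank genes by the absolute Pearson correlation between their expression and age, and keep the top few. The data are synthetic stand-ins (not real microarray values), and the function names `pearson` and `top_genes` are made up for this example.

```python
import random

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def top_genes(expr, ages, k):
    """Return the indices of the k genes most correlated (in absolute value) with age.

    expr is a list of samples; each sample is a list of per-gene expression values.
    """
    n_genes = len(expr[0])
    scores = []
    for j in range(n_genes):
        column = [sample[j] for sample in expr]
        scores.append((abs(pearson(column, ages)), j))
    scores.sort(reverse=True)
    return [j for _, j in scores[:k]]

# Synthetic data (assumption): 100 people, 50 genes; only genes 3 and 7 track age.
rng = random.Random(0)
ages = [rng.uniform(20, 80) for _ in range(100)]
expr = []
for age in ages:
    sample = [rng.gauss(0, 1) for _ in range(50)]
    sample[3] = 0.05 * age + rng.gauss(0, 0.3)   # increases with age
    sample[7] = -0.04 * age + rng.gauss(0, 0.3)  # decreases with age
    expr.append(sample)

selected = top_genes(expr, ages, k=2)
print("selected genes:", sorted(selected))
```

A regression model would then be fitted on the selected genes only. If you also want an honest estimate of prediction error, the selection step should be repeated inside each cross-validation fold rather than done once on all samples.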

There are a number of programs that implement general machine learning algorithms: WEKA, which a lot of people seem to use, and RapidMiner, which I personally prefer. If you want a good starting place for learning RapidMiner, this blog and its accompanying YouTube channel should give you a good start: http://vancouverdata.blogspot.se/

As I said, I have never worked with microarray data, but the methods that I mention should be transferable to any machine learning problem. Hope that this helps. :)

score 4 · Answer 2 · 2012-08-16

Johan is correct; what you have is a regression problem, not a classification problem. First, I suggest you read up on the elements of linear models. Dalgaard's book is very accessible. Then, consider looking at the lasso, an L1-penalized form of linear regression that doubles as a selection method: it shrinks many coefficients to exactly zero, so it tends to find a small set of features that still provides a good fit. There is a large literature here, but this is one place to start. Several libraries in R implement variants of the lasso (google "r lasso" or head over to CRAN).
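To make the lasso idea concrete, here is a minimal sketch of lasso regression fitted by cyclic coordinate descent, written in plain Python on synthetic data. It is only an illustration under stated assumptions: the data are simulated, the penalty `lam` is picked by hand (in practice it is usually chosen by cross-validation), and the helper names are made up for this example. For real work an established implementation, such as the R packages mentioned above, is a better choice.

```python
import random

def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly 0."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

def standardize_columns(X):
    """Return the columns of X, each centered and scaled to unit variance."""
    n = len(X)
    cols = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        cols.append([(v - mean) / sd for v in col])
    return cols

def lasso_cd(X, y, lam, n_sweeps=50):
    """Minimize (1/2n)*||y - Xb||^2 + lam*||b||_1 on standardized columns.

    Returns coefficients on the standardized scale.
    """
    n = len(y)
    cols = standardize_columns(X)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual with all coefficients at zero
    beta = [0.0] * len(cols)
    for _ in range(n_sweeps):
        for j, col in enumerate(cols):
            # Covariance of gene j with the partial residual
            # (the residual with gene j's own contribution added back).
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new_beta = soft_threshold(rho, lam)
            delta = new_beta - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
                beta[j] = new_beta
    return beta

# Synthetic data (assumption): 100 people, 50 genes; age depends on genes 3 and 7 only.
rng = random.Random(1)
X = [[rng.gauss(0, 1) for _ in range(50)] for _ in range(100)]
y = [50 + 10 * row[3] - 8 * row[7] + rng.gauss(0, 2) for row in X]

coefs = lasso_cd(X, y, lam=2.0)
kept = [j for j, b in enumerate(coefs) if b != 0.0]
print("genes with nonzero coefficients:", kept)
```

Raising `lam` drops more genes from the model and lowering it keeps more, which is how the lasso trades off the number of genes against the quality of the fit.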

score 0 · Answer 3 · 2014-09-08

10.2 years ago

karlpersius.manlimos • 0

My study is related to yours. I used Least Angle Regression and the Lasso, but I am looking for a microarray data set to use. Could you help me find one? Thank you!
