Machine Learning on noisy genome data. Scikit-learn python
9.3 years ago

I want to classify data using three dimensions; let's call them A, B, and C.


B and C are almost always positively correlated. The sum B+C and A are usually negatively correlated. However, C is usually an "all or none" statistic: we see it sometimes but not always.

With this in mind, I chose to classify the data using Linear Discriminant Analysis (LDA) from the scikit-learn Python library: http://scikit-learn.org/stable/modules/generated/sklearn.lda.LDA.html

I'm not entirely married to LDA, but my PI would like to keep a linear model.

I would like to train the model on this data while applying a per-dimension weight, as expressed in this pseudo-code:

   # Pseudo-code: `weights` is a hypothetical argument, not a real scikit-learn parameter
   lda = LDA()
   lda.fit(trainX, trainY, weights=(None, None, "all_or_none"))
   # "all_or_none" means: when C is absent, do NOT penalize the prediction

I'm fairly new to machine learning, so maybe there's another way to do this in scikit-learn?

Thanks!
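For concreteness, here is a minimal sketch of one way the "all or none" behaviour might be approximated while keeping a linear model. It is not a built-in scikit-learn feature. It assumes that an absent C is encoded as NaN, fills those entries with 0, and appends a 0/1 "C present" column, so that an absent C contributes nothing to the linear score. The helper name `add_presence_indicator` and the synthetic data are made up for illustration, and `LinearDiscriminantAnalysis` is the class name in current scikit-learn releases. Note that the filled column no longer looks Gaussian, so LDA's assumptions hold only loosely here.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


def add_presence_indicator(X, c_col=2):
    """Fill NaN in column c_col with 0 and append a 0/1 'C present' column.

    Hypothetical helper: assumes an absent C is encoded as NaN.
    """
    X = np.asarray(X, dtype=float)
    present = ~np.isnan(X[:, c_col])
    X_filled = X.copy()
    X_filled[~present, c_col] = 0.0  # absent C adds nothing to the linear score
    return np.column_stack([X_filled, present.astype(float)])


# Synthetic stand-in data (NOT real genome data), mimicking the description:
# B and C positively correlated, A negatively correlated with B + C,
# and C missing for a fraction of the samples.
rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
B = rng.normal(loc=2.0 * y, scale=1.0)
C = B + rng.normal(scale=0.3, size=n)
A = -(B + C) + rng.normal(scale=0.5, size=n)
C[rng.random(n) < 0.4] = np.nan  # C is "all or none": absent in ~40% of rows
X = np.column_stack([A, B, C])

X_aug = add_presence_indicator(X)
lda = LinearDiscriminantAnalysis()
lda.fit(X_aug, y)
pred = lda.predict(X_aug)
```

The indicator column lets the model learn a separate offset for samples where C was observed, instead of treating a missing C as evidence against a class.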
