Machine Learning on noisy genome data. Scikit-learn python

10.0 years ago

QVINTVS_FABIVS_MAXIMVS ★ 2.6k

I want to classify data using three dimensions; let's call them A, B, and C.

B and C are almost always positively correlated. B+C and A are usually negatively correlated. However, C is usually an "all or none" statistic: it is present in some samples and entirely absent in others.

With this in mind, I chose to classify the data using Linear Discriminant Analysis (LDA) from the scikit-learn Python library: http://scikit-learn.org/stable/modules/generated/sklearn.lda.LDA.html

I'm not entirely married to LDA but my PI would like to keep a linear model.

I would like to train the model but apply a per-feature weight, as expressed in this pseudo-code:

   lda = LDA()
   lda.train(trainX, trainY, weights=(None, None, "all_or_none"))
   # "all_or_none" indicates that when C is absent, the prediction should NOT be penalized
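
A minimal runnable sketch of one way the intent above could be approximated, offered as an illustration rather than a definitive method. It assumes that an absent C is encoded as NaN, and it uses a hypothetical helper class, `AllOrNoneLDA` (not part of scikit-learn), that fits two linear models: one on A, B, C for rows where C is present, and one on A, B only, which is used whenever C is absent. Current scikit-learn releases expose LDA as `sklearn.discriminant_analysis.LinearDiscriminantAnalysis` with a `fit` method; the demo data are synthetic and only mimic the correlations described above.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis


class AllOrNoneLDA:
    """Hypothetical helper (not a scikit-learn class).

    Fits one LDA on (A, B, C) for rows where C is present and one LDA on
    (A, B) alone, so that rows with absent C are scored without C.
    Assumption: an absent C is encoded as NaN in column 2.
    """

    def __init__(self):
        self.with_c = LinearDiscriminantAnalysis()
        self.without_c = LinearDiscriminantAnalysis()

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.asarray(y)
        present = ~np.isnan(X[:, 2])
        self.with_c.fit(X[present], y[present])  # A, B, C where C is observed
        self.without_c.fit(X[:, :2], y)          # A, B are always observed
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        present = ~np.isnan(X[:, 2])
        out = np.empty(len(X), dtype=self.without_c.classes_.dtype)
        if present.any():
            out[present] = self.with_c.predict(X[present])
        if (~present).any():
            out[~present] = self.without_c.predict(X[~present][:, :2])
        return out


# Synthetic demo data (made up for illustration only).
rng = np.random.default_rng(0)
n = 200
y = np.repeat([0, 1], n // 2)
A = np.where(y == 0, 2.0, -2.0) + rng.normal(0.0, 0.5, n)
B = np.where(y == 0, -2.0, 2.0) + rng.normal(0.0, 0.5, n)
C = B + rng.normal(0.0, 0.3, n)     # C tracks B when it is present
C[rng.random(n) < 0.5] = np.nan     # roughly half the rows have no C
X = np.column_stack([A, B, C])

model = AllOrNoneLDA().fit(X, y)
pred = model.predict(X)
print("training accuracy:", (pred == y).mean())
```

Both sub-models are linear, so the overall classifier stays within the linear family; whether splitting the data this way suits real genome data would need to be checked with proper cross-validation.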

I'm a little naive about machine learning; maybe there's another way to do this in scikit-learn?

Thanks!

scikit-learn python machine learning statistics

updated 8.3 years ago by Biostar 20 • written 10.0 years ago by QVINTVS_FABIVS_MAXIMVS ★ 2.6k
