Question

Random forest for gene expression

Asked 6.8 years ago by 1769mkc

I'm quoting these lines from this book chapter:

"Schematic illustration for gene expression of microarray data. Figure modified from [47].
From the computational perspective, the microarray data is described as an N × M matrix. Each row describes a sample and each column represents a gene except the last column which means the class label of each sample. $g_{i,j}$ is a numeric value representing the gene expression level of gene $j$ in the $i$-th sample. $c_i$ is the class label of the $i$-th sample."
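For reference, the matrix the quoted passage describes can be written out as below. This is derived only from the passage itself: with N samples and M columns, the first M-1 columns hold gene expression values $g_{i,j}$ and column M holds the class label $c_i$.

```latex
\[
\begin{array}{c|ccc|c}
 & \text{gene } 1 & \cdots & \text{gene } M-1 & \text{class} \\ \hline
\text{sample } 1 & g_{1,1} & \cdots & g_{1,M-1} & c_1 \\
\vdots & \vdots & & \vdots & \vdots \\
\text{sample } N & g_{N,1} & \cdots & g_{N,M-1} & c_N
\end{array}
\]
```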

It says the last column is the label. What exactly is that label? I understand that samples are in the rows and genes in the columns, but what kind of label goes in the last column?

The chapter

Tags: RNA-Seq, R

score 3 · Accepted Answer · 2018-02-12

Answered 6.8 years ago by noorpratap.singh

The whole point of a random forest, or of any other classifier, is to predict class labels (and/or to find useful features associated with the phenotype). So if you want to do some sort of classification, i.e. determine which class a particular sample belongs to, a random forest is one tool for that. The last column holds those labels: the classifier needs them during training so it can learn the gene expression rules associated with each class.

Let's say, for example, you want to predict whether a sample is normal or tumor. First you train your classifier by giving it the gene expression profiles of the samples together with their labels, so that the model/classifier can learn useful information. Then, for a sample that was not used for training, you want to determine which class (normal or tumor) it belongs to, and the trained random forest lets you do that.
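To make the train-then-predict idea concrete, here is a minimal, dependency-free Python sketch. It is not the chapter's method or any library's API, and all function names are hypothetical. Each "tree" is a single-split decision stump fit on a bootstrap sample of the training rows using a random subset of genes, and the forest predicts by majority vote. Real random forests grow deep trees and draw a fresh gene subset at every split; with one split per tree the two coincide. The data are synthetic (randomly generated), not real expression values. In practice you would use an established implementation such as the R `randomForest` package or scikit-learn's `RandomForestClassifier`.

```python
import random
from collections import Counter


def fit_stump(X, y, features):
    """Fit a one-split decision tree (stump) using only the given gene indices.

    Tries every midpoint between consecutive distinct values of each gene and
    keeps the split with the fewest misclassified training samples; each side
    of the split predicts its majority class label.
    """
    best = None
    for f in features:
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            left_lab = Counter(left).most_common(1)[0][0]
            right_lab = Counter(right).most_common(1)[0][0]
            errors = sum(lab != left_lab for lab in left) + sum(lab != right_lab for lab in right)
            if best is None or errors < best[0]:
                best = (errors, f, t, left_lab, right_lab)
    if best is None:  # every candidate gene is constant: predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (features[0], 0.0, majority, majority)
    return best[1:]


def predict_stump(stump, row):
    gene, threshold, left_lab, right_lab = stump
    return left_lab if row[gene] <= threshold else right_lab


def fit_forest(X, y, n_trees=25, max_features=None, seed=0):
    """Bagging + random gene subsets + stumps: a (very shallow) random forest."""
    rng = random.Random(seed)
    n_samples, n_genes = len(X), len(X[0])
    # sqrt(number of genes) is a common default for classification forests
    k = max_features or max(1, int(n_genes ** 0.5))
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n_samples) for _ in range(n_samples)]  # rows drawn with replacement
        genes = rng.sample(range(n_genes), k)  # random subset of genes for this tree's split
        forest.append(fit_stump([X[i] for i in boot], [y[i] for i in boot], genes))
    return forest


def predict_forest(forest, row):
    """Majority vote over all stumps."""
    return Counter(predict_stump(s, row) for s in forest).most_common(1)[0][0]


if __name__ == "__main__":
    # Synthetic toy data (NOT real measurements): 20 samples x 4 genes,
    # labels N (normal) / T (tumor); gene 0 is shifted upward in tumor samples.
    rng = random.Random(42)
    X, y = [], []
    for i in range(20):
        label = "T" if i % 2 else "N"
        shift = 3.0 if label == "T" else 0.0
        X.append([rng.gauss(shift, 1.0)] + [rng.gauss(0.0, 1.0) for _ in range(3)])
        y.append(label)
    # Train on the first 16 samples, then predict the 4 held-out ones.
    forest = fit_forest(X[:16], y[:16])
    for row, truth in zip(X[16:], y[16:]):
        print("true:", truth, "predicted:", predict_forest(forest, row))
```

The key point for the question is the split between `fit_forest`, which needs both expression rows and labels, and `predict_forest`, which needs only an expression row.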

Hope it helps

Can you give me an example? Say I'm working on normal vs. disease haematopoiesis: I have expression values for normal haematopoiesis, including the intermediate lineages and the mature cells, as well as expression values for disease haematopoiesis. How do I achieve this: "first you train your classifier giving the gene expression profiles of the samples and their labels so that your model/classifier could learn some useful information"? As of now I only have expression values, so how do I build a classifier in the first place? I hope my question is clear.

Reply by 1769mkc, 6.8 years ago

I get your question. Say the columns are genes (G of them) and the rows are samples (p of them), giving a p × G matrix X in which X[i][j] is the expression of gene j in sample i. Now add one more column: for every normal sample put the label N, and for every tumor sample put T. This extra column records the class of each sample. Then give this matrix to the classifier for training. To check whether your classifier actually works, you can perform n-fold cross-validation.
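A minimal plain-Python sketch of that step, assuming the expression matrix is a list of rows (one per sample) and the labels are already known per sample. The function names are hypothetical; in R or pandas you would simply add a column to the data frame. Many classifier interfaces (e.g. scikit-learn's `fit(X, y)`) take the gene columns and the label column separately, so the sketch also shows splitting them back apart. The expression values in the example are made up for illustration.

```python
def add_label_column(expr, labels):
    """Return a copy of `expr` (rows = samples, columns = genes) with each
    sample's class label appended as the last column."""
    if len(expr) != len(labels):
        raise ValueError("need exactly one label per sample (row)")
    return [list(row) + [label] for row, label in zip(expr, labels)]


def split_features_labels(matrix):
    """Inverse of add_label_column: gene columns -> X, last column -> y."""
    X = [row[:-1] for row in matrix]
    y = [row[-1] for row in matrix]
    return X, y


if __name__ == "__main__":
    # Hypothetical expression values for 4 samples x 3 genes (made up for illustration).
    expr = [
        [5.1, 0.3, 2.2],  # sample 1
        [4.8, 0.2, 2.0],  # sample 2
        [1.2, 2.9, 2.1],  # sample 3
        [0.9, 3.1, 1.8],  # sample 4
    ]
    labels = ["N", "N", "T", "T"]  # N = normal, T = tumor
    labelled = add_label_column(expr, labels)
    for row in labelled:
        print(row)
    X, y = split_features_labels(labelled)  # ready to pass to a classifier
```

The labels have to come from your own sample annotation (e.g. which samples are normal and which are disease); the classifier cannot infer them from the expression values alone.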

The following link might help: Machine Learning For Cancer Classification - Part 2 - Building A Random Forest Classifier

Reply by noorpratap.singh, 6.8 years ago

Now I have an idea of where to start. I will try it and get back to you.

Reply by 1769mkc, 6.8 years ago

So I have to give a subset of the data for training, don't I? And how do I choose that subset from my data matrix, which will contain both normal and tumour samples?

Reply by 1769mkc, 6.8 years ago

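Well, for starters you can do 10-fold cross-validation: split the samples into 10 parts, train on 9 of them, test on the held-out part, and repeat so that each part is held out once. Functions for this exist in most languages.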
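Since the follow-up question was how to pick training samples when the matrix mixes normal and tumour samples, here is a minimal plain-Python sketch of k-fold cross-validation. Stratifying the folds (keeping each fold's normal/tumour mix close to the whole data set's) is an assumption added here, not something stated in the thread. The function names are hypothetical, and a nearest-centroid classifier stands in for the random forest purely to keep the example self-contained; swap in your forest's fit/predict functions. Established implementations include scikit-learn's `StratifiedKFold` and `cross_val_score`, and `createFolds` in the R `caret` package.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Return k (train_indices, test_indices) pairs.

    Samples of each class are shuffled and dealt round-robin across the k
    folds, so every test fold gets roughly the same class mix as the full
    data set. Each sample lands in exactly one test fold.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    folds = [[] for _ in range(k)]
    pos = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        for i in indices:
            folds[pos % k].append(i)
            pos += 1
    splits = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        splits.append((train, sorted(fold)))
    return splits


def cross_val_accuracy(X, y, fit, predict, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, repeat k times, and
    return the overall fraction of held-out samples predicted correctly."""
    correct = total = 0
    for train, test in stratified_kfold(y, k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            correct += predict(model, X[i]) == y[i]
            total += 1
    return correct / total


# Stand-in classifier (nearest centroid), NOT a random forest; used only to
# keep this sketch dependency-free.
def fit_centroids(X, y):
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict_centroid(centroids, row):
    # Pick the class whose mean expression profile is closest (squared Euclidean).
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(row, centroids[lab])))


if __name__ == "__main__":
    # Synthetic data (NOT real measurements): 30 normal + 20 tumour samples, 5 genes.
    rng = random.Random(7)
    y = ["N"] * 30 + ["T"] * 20
    X = [[rng.gauss(2.0 if lab == "T" else 0.0, 1.0) for _ in range(5)] for lab in y]
    print("10-fold CV accuracy:", cross_val_accuracy(X, y, fit_centroids, predict_centroid, k=10))
```

Every sample is used for testing exactly once and never appears in the training set of its own fold, which answers "which subset do I train on": all of them, in turn.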