Random forest for gene expression
6.8 years ago
1769mkc ★ 1.2k

I'm quoting these lines from this book chapter:

"Schematic illustration for gene expression of microarray data. Figure modified from [47].
From the computational perspective, the microarray data is described as an N × M matrix. Each row describes a sample and each column represents a gene except the last column which means the class label of each sample. $g_{i,j}$ is a numeric value representing the gene expression level of gene $j$ in the $i$-th sample. $c_i$ is the class label of the $i$-th sample."

It says the last column is the label. What exactly is that label? I understand that the samples are in rows and the genes are in columns, but what kind of label goes in the last column?

The chapter

RNA-Seq R
6.8 years ago

The whole point of a random forest, or any other classifier, is to predict class labels (and/or to find useful features associated with the phenotype). So random forest is used when you want to do some sort of classification, i.e. find out which class a particular sample belongs to. The last column holds the labels needed to train the classifier, so that it can learn the gene expression rules associated with each class.

For example, say you want to predict whether a sample is normal or tumor. First you train your classifier by giving it the gene expression profiles of the samples together with their labels, so that the model can learn useful information from them. Then, for a sample that was not used for training, you want to determine which class (normal or tumor) it belongs to, and the trained random forest lets you do that.
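A minimal sketch of that train-then-predict workflow, in plain Python so it runs without extra packages. It is a toy random forest under simplifying assumptions: each tree is a single split (a decision stump) grown on a bootstrap sample of the training rows, using a random subset of genes, and the forest predicts by majority vote. Real random forests grow much deeper trees; the expression values, gene count, and function names here are made up for illustration. For real data you would use an established implementation such as the randomForest package in R or scikit-learn's RandomForestClassifier in Python.

```python
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0 = all one class)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def fit_stump(X, y, gene_ids):
    """Best single-gene threshold split, trying only the genes in gene_ids.

    Returns (gene index, threshold, label if value <= threshold, label otherwise).
    """
    best = None
    for j in gene_ids:
        for t in sorted({row[j] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[j] <= t]
            right = [lab for row, lab in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, j, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:  # no usable split: always predict the majority class
        majority = Counter(y).most_common(1)[0][0]
        return (0, float("inf"), majority, majority)
    return best[1:]


def fit_forest(X, y, n_trees=25, seed=0):
    """Grow n_trees stumps, each on a bootstrap sample and a random gene subset."""
    rng = random.Random(seed)
    n_genes = len(X[0])
    m = max(1, int(n_genes ** 0.5))  # genes tried per split (common sqrt rule)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # rows drawn with replacement
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx],
                                rng.sample(range(n_genes), m)))
    return forest


def predict(forest, row):
    """Majority vote of all stumps for one sample's expression profile."""
    votes = [left if row[j] <= t else right for j, t, left, right in forest]
    return Counter(votes).most_common(1)[0][0]


# Made-up training data: rows are samples, columns are 3 genes.
X_train = [
    [5.1, 7.3, 0.9], [4.8, 7.1, 1.1], [5.3, 6.9, 1.0],    # normal
    [9.7, 3.2, 4.2], [10.2, 2.9, 3.9], [9.9, 3.5, 4.4],   # tumor
]
y_train = ["normal"] * 3 + ["tumor"] * 3

forest = fit_forest(X_train, y_train, n_trees=25, seed=1)
# Two new samples that were not used for training:
print(predict(forest, [10.5, 2.5, 4.8]))
print(predict(forest, [4.5, 7.8, 0.7]))
```

The key point is the split between the labeled rows used in fit_forest and the unseen rows passed to predict: the labels are only needed for training.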

Hope it helps


Can you give me an example? Say I'm working on normal vs. disease haematopoiesis: I have expression values for normal haematopoiesis, including the intermediate lineages and the mature cells, as well as expression values for disease haematopoiesis. How do I achieve this: "first you train your classifier giving the gene expression profiles of the samples and their labels so that your model/classifier could learn some useful information"? As of now I only have expression values, so how do I make a classifier first? I hope my question is clear.


I get your question. Say the columns are genes (N in total) and the rows are samples (p in total), giving a p × N matrix X where X[i][j] is the expression of gene j in sample i. Now add one more column: for every normal sample put the label "normal" and for every tumor sample put "tumor". This additional column records the class of each sample. Then feed this matrix to the classifier for training. To check whether your classifier actually works, you can perform n-fold cross-validation.

The following link might help: Machine Learning For Cancer Classification - Part 2 - Building A Random Forest Classifier
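To make that layout concrete, here is a small Python sketch that appends a class-label column to a samples × genes matrix and then splits it back into features and labels, which is the form most classifier functions expect. The gene names, expression values, and the helper names add_label_column and split_features_labels are hypothetical. For the haematopoiesis case the labels could just as well be "normal"/"disease", or lineage names if the goal is to classify differentiation stages.

```python
# Hypothetical expression values: rows are samples, columns are genes.
genes = ["GENE_A", "GENE_B", "GENE_C"]  # placeholder gene names
expression = [
    [5.1, 7.3, 0.9],
    [4.8, 7.1, 1.1],
    [9.7, 3.2, 4.2],
    [10.2, 2.9, 3.9],
]
sample_class = ["normal", "normal", "tumor", "tumor"]  # one label per sample (row)


def add_label_column(matrix, labels):
    """Return a copy of the p x N matrix with each sample's class appended as a last column."""
    if len(matrix) != len(labels):
        raise ValueError("need exactly one label per sample")
    return [list(row) + [label] for row, label in zip(matrix, labels)]


def split_features_labels(labeled):
    """Separate the gene columns (classifier input) from the last column (target)."""
    X = [row[:-1] for row in labeled]
    y = [row[-1] for row in labeled]
    return X, y


labeled = add_label_column(expression, sample_class)
for row in [genes + ["class"]] + labeled:
    print("\t".join(str(v) for v in row))
```

The labels are not computed from the expression values; they come from what you already know about each sample (which condition or cell population it was taken from).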


Now I have an idea of how to start. I will try it and get back to you.


So I have to give a subset of the data for training, right? And how do I choose that subset from my data matrix, which contains both normal and tumour samples?


Well, for starters you can do 10-fold cross-validation. Functions for it exist in practically every language.
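As a rough illustration of how the samples can be divided so that every fold contains both normal and tumour samples, here is a plain-Python sketch of stratified k-fold splitting. It is a toy written for this thread, not the code of any particular package; in practice you would likely use an existing function such as StratifiedKFold from scikit-learn (Python) or createFolds from the caret package (R). The function names and the sample counts below are hypothetical.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Assign each sample index to one of k folds, keeping the class mix similar per fold.

    Returns a list of k lists of sample indices (each one is a test set).
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, lab in enumerate(labels):
        by_class[lab].append(i)
    folds = [[] for _ in range(k)]
    offset = 0
    for lab in sorted(by_class):
        idx = by_class[lab]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[(pos + offset) % k].append(i)  # deal this class's samples round-robin
        offset += len(idx)  # continue dealing where the previous class stopped
    return folds


def cross_validation_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k rounds."""
    folds = stratified_kfold(labels, k, seed)
    for f, test in enumerate(folds):
        train = [i for g, fold in enumerate(folds) if g != f for i in fold]
        yield sorted(train), sorted(test)


labels = ["normal"] * 12 + ["tumor"] * 8  # hypothetical: 12 normal, 8 tumour samples
for train, test in cross_validation_splits(labels, k=4):
    n_tumor = sum(labels[i] == "tumor" for i in test)
    print(len(train), len(test), n_tumor)
    # Here you would fit the classifier on the `train` rows, predict the `test`
    # rows, and record the accuracy for this round.
```

Each round holds out one fold as the test set and trains on the rest, so every sample is predicted exactly once by a model that never saw it; averaging the k accuracies gives the cross-validated estimate.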


Okay, I will search for it.
