I'm quoting these lines from a book chapter:
"Schematic illustration for gene expression of microarray data. Figure modified from [47].
From the computational perspective, the microarray data is described as an N × M matrix. Each row describes a sample and each column represents a gene except the last column which means the class label of each sample. $g_{i,j}$ is a numeric value representing the gene expression level of gene $j$ in the $i$-th sample. $c_i$ is the class label of the $i$-th sample."
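The matrix described in this quote can be sketched in Python as a list of rows. The expression values and the labels "normal"/"disease" below are made up for illustration, and `split_features_labels` is a hypothetical helper name; the point is only that the label column $c_i$ sits at the end of each row and is separated from the expression values $g_{i,j}$ before training.

```python
# Hypothetical toy data: 4 samples (rows) x 3 genes, plus a final class-label column.
# Each row is [g_i1, g_i2, g_i3, c_i].
data = [
    [5.1, 0.3, 2.2, "normal"],
    [4.8, 0.5, 2.0, "normal"],
    [1.2, 3.9, 2.1, "disease"],
    [1.0, 4.2, 1.9, "disease"],
]

def split_features_labels(matrix):
    """Split each row into its gene-expression values and its class label (last column)."""
    features = [row[:-1] for row in matrix]  # g_{i,j}: every column except the last
    labels = [row[-1] for row in matrix]     # c_i: the last column
    return features, labels

X, y = split_features_labels(data)
print(len(X), len(X[0]))  # 4 samples, 3 genes
print(y)                  # the class label of each sample, in row order
```

So the "label" is simply the known category of each sample (here normal vs. disease), written as one extra value per row.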
It says the last column is the label. What exactly is this label? I understand that samples are in rows and genes are in columns, and that the last column is the class label, but what kind of label is it?
Can you give me an example? Say I'm working on normal vs. disease haematopoiesis: I have expression values for normal haematopoiesis, including its intermediate lineages and mature cells, as well as expression values for disease haematopoiesis. How do I achieve this: "first you train your classifier giving the gene expression profiles of the samples and their labels so that your model/classifier could learn some useful information"? As of now I only have the expression values, so how do I build a classifier in the first place? I hope my question is clear.
I get your question. Columns are genes (let's say N in total) and rows are samples (let's say p), so you have a p × N matrix X in which X[i][j] is the expression of gene j in sample i. Now add one more column: for every normal sample put the label N, and for every tumor sample put T. This additional column holds the class of each sample. Then feed this labelled matrix to the classifier for training. To check whether your classifier actually works, you can perform an n-fold cross-validation.
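A minimal Python sketch of this recipe, using only the standard library. The expression values, the labels "N"/"T" and the helper names are hypothetical, and the classifier is a simple nearest-centroid rule chosen only because it fits in a few lines; it is not a random forest and not the chapter's method.

```python
# Hypothetical expression matrix: p = 4 samples (rows) x N = 3 genes (columns).
expression = [
    [5.1, 0.3, 2.2],
    [4.8, 0.5, 2.0],
    [1.2, 3.9, 2.1],
    [1.0, 4.2, 1.9],
]
# One label per sample, in the same row order: "N" = normal, "T" = tumor.
labels = ["N", "N", "T", "T"]

# Append the label as an extra (last) column, giving the p x (N + 1) training matrix.
training_matrix = [row + [lab] for row, lab in zip(expression, labels)]

def train_nearest_centroid(matrix):
    """Learn one mean expression profile (centroid) per class from the labelled matrix."""
    sums, counts = {}, {}
    for row in matrix:
        values, label = row[:-1], row[-1]
        if label not in sums:
            sums[label] = [0.0] * len(values)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], values)]
        counts[label] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

def predict(centroids, profile):
    """Assign a new sample to the class whose centroid is closest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(profile, centroid))
    return min(centroids, key=lambda lab: dist(centroids[lab]))

model = train_nearest_centroid(training_matrix)
print(predict(model, [4.9, 0.4, 2.1]))  # → N
```

The "learning" step here is just averaging each class's profiles; any real classifier follows the same pattern of fitting on labelled rows and then predicting labels for unlabelled profiles.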
The following tutorial might help: Machine Learning For Cancer Classification - Part 2 - Building A Random Forest Classifier
Now I have an idea of how to start. I will try it and get back to you.
So I have to give a subset of the data for training, right? And how do I choose that subset from my data matrix, which contains both normal and tumour samples?
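Well, for starters you can do a 10-fold cross-validation: the samples are split into 10 parts, and each part is used once as the test set while the other 9 are used for training, so you don't have to pick the training subset by hand. Functions for doing this exist in most languages.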
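To make sure every fold contains both normal and tumour samples, the split can be stratified by class. Below is a minimal stdlib-only Python sketch of stratified k-fold splitting; the function name `stratified_k_fold` and the labels (20 "N", 20 "T") are hypothetical. Libraries provide the same thing ready-made (for example scikit-learn's `StratifiedKFold`).

```python
import random

def stratified_k_fold(labels, k=10, seed=0):
    """Return k lists of sample indices; each class is spread as evenly as possible across folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, lab in enumerate(labels):
        by_class.setdefault(lab, []).append(idx)
    offset = 0
    for lab in sorted(by_class):
        indices = by_class[lab]
        rng.shuffle(indices)  # randomise which samples of this class go to which fold
        for pos, idx in enumerate(indices):
            folds[(pos + offset) % k].append(idx)
        offset += len(indices)  # continue the round-robin so fold sizes stay balanced
    return folds

# Hypothetical labels: 20 normal and 20 tumour samples.
labels = ["N"] * 20 + ["T"] * 20
folds = stratified_k_fold(labels, k=10)

for fold_no, test_idx in enumerate(folds):
    # Every other fold is the training set for this round.
    train_idx = [i for j, f in enumerate(folds) if j != fold_no for i in f]
    # Train the classifier on train_idx and evaluate it on test_idx here.
    print(fold_no, len(train_idx), len(test_idx), sorted({labels[i] for i in test_idx}))
```

Averaging the accuracy over the 10 test folds gives an estimate of how well the classifier generalises to samples it has not seen.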
Okay, I will search for it.