Well, let's assume this requirement makes sense. (I wouldn't question it without knowing the background of what you are trying to do, although IMHO I would try to retain as much information as possible.)
And I don't think your question was so totally unclear, btw. First, let me paraphrase your question, in case I got you wrong:
You have gene expression measures under certain experimental variables in an n x m matrix, where the n rows correspond to genes and the m columns correspond to variables/samples. Now you wish to reduce the columns to a single representative measurement, or more generally to a matrix with m' < m columns. This is clearly a dimension reduction problem. There are many approaches for this:
Simple approach: replace the measurements for each gene by a single point estimate, e.g. the mean or median, or even choose a single representative variable.
Better: apply principal component analysis (PCA) to identify the direction that explains the most variance in the data, and project all data onto the first (or first few) principal component(s).
There are many more advanced methods, but I would start with the simplest ones first and see how far I get.
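To make the simple approach concrete, here is a minimal sketch on a made-up 4 x 3 matrix. The values and the gene/sample names are purely illustrative assumptions, not your data:

```r
# Toy expression matrix: 4 genes (rows) x 3 samples (columns).
# All values and names are invented for illustration.
expr <- matrix(c(1, 2, 3, 4,
                 2, 4, 6, 8,
                 3, 9, 3, 0),
               nrow = 4,
               dimnames = list(paste0("gene", 1:4),
                               paste0("sample", 1:3)))

gene_mean   <- rowMeans(expr)          # one mean per gene
gene_median <- apply(expr, 1, median)  # one median per gene
gene_single <- expr[, "sample2"]       # or keep one representative column
```

Each result is a named vector with one value per gene, so it can replace the m columns directly.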
Edit: It is all implemented in R, mostly as basic functions (try the following):
For the simple functions get help with:
?mean
?median
?rowMeans # for easy application to a matrix of measurements
For PCA use either:
?princomp # uses eigenvalue decomposition
?prcomp # uses singular value decomposition, numerically more accurate
To get out a matrix of projected values (repl. USArrests with your data):
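prcomp(USArrests, scale. = TRUE)$x # choose the PC column that suits you best
princomp(USArrests, cor = TRUE)$scores # comparable to the above; princomp has no scale argument, and its scores can differ in sign and by a small constant factor (divisor n instead of n - 1)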
Make sure to also use and understand the biplot and screeplot functions on your PCA data.
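As a rough sketch of how to decide how many components to keep, you can compute the proportion of variance explained by each PC (this is what the scree plot visualises) and then pull out the first PC as your single representative column. The variable names below are my own; USArrests again stands in for your data:

```r
# PCA on the built-in USArrests data (replace with your own matrix).
pca <- prcomp(USArrests, scale. = TRUE)

# Proportion of total variance explained by each principal component.
var_explained <- pca$sdev^2 / sum(pca$sdev^2)
round(cumsum(var_explained), 2)   # cumulative proportion

# Scores on the first principal component: one value per row.
pc1 <- pca$x[, 1]

screeplot(pca, type = "lines")    # variance per component
biplot(pca)                       # observations and variable loadings
```

If the first component explains only a small share of the variance, a single column will lose a lot of information, and keeping the first few components may be the better compromise.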
It all depends a bit on how your data is formatted, so if you need more advice, post a specific question that includes your data, too.
You first need to define what you mean by "merging" the labels into one.
As I mentioned before, the heat shock stress condition, for example, has a different label for 10, 15, 20 minutes etc. How can I combine all those labels into one general heat shock label, so that I only have to analyze general labels such as heat shock, DTT, menadione and so on?
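If "merging" simply means averaging all time points of the same stress condition, a minimal sketch could look like the following. It assumes, purely for illustration, that the column names follow a hypothetical `condition_minutes` pattern such as `heat_10`; the data are made up, and your real labels will need their own rule for extracting the condition:

```r
# Toy matrix: 4 genes x 4 samples; names and values are invented.
expr <- matrix(c(1, 2, 3, 4,
                 3, 4, 5, 6,
                 10, 20, 30, 40,
                 30, 40, 50, 60),
               nrow = 4,
               dimnames = list(paste0("gene", 1:4),
                               c("heat_10", "heat_20", "DTT_15", "DTT_30")))

# Strip the trailing "_<minutes>" to get one general label per column.
condition <- sub("_[0-9]+$", "", colnames(expr))

# Average all columns that share a condition label, gene by gene.
merged <- sapply(unique(condition), function(cond) {
  rowMeans(expr[, condition == cond, drop = FALSE])
})
```

`merged` then has one column per general condition (here `heat` and `DTT`). Swapping `rowMeans` for a median, or for the first PC of each group, follows the same pattern.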