Dear Maria,
what exactly do you mean by the "biological meaning of the normalized results"? I believe this question is somewhat beside the point of what you actually want to infer after normalization. In other words, there is no rationale for categorizing your probesets into any groups other than the ones defined by your phenotype information, that is, tumor and normal samples. Moreover, what is your specific experimental design? I would personally suggest searching for probesets that show very little or no evidence of expression across the majority of your samples and excluding them from further consideration, as these would be "biologically uninformative" for your downstream analysis.
For instance, as you mentioned above, you use the hgu133plus2 array. An excellent package for this is the PANP R package (http://www.bioconductor.org/packages/release/bioc/html/panp.html), which creates absent/present calls for each of your probesets (you can find more details about the methodology in the paper and the vignette). I would then suggest something like the following:
Assuming you have, for instance, 10 samples in total, 5 cancer and 5 control (you could provide more info here, or adjust the following code to your own groups), you could first type:
present <- rowSums(call.matrix == "P") >= 5
where call.matrix is the Pcalls element of the list returned by the PANP function pa.calls(): a matrix with probesets in the rows and samples in the columns, holding the values P/M/A according to whether a probeset is called present, marginal or absent in each sample.
And then:
Eset <- Eset[present,]
where Eset is your normalized expression set. What you finally keep are the probesets called "present" in at least 5 samples, that is, in as many samples as one of your conditions contains, so that a probeset expressed in only one condition can still be retained.
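To make the filtering step concrete, here is a minimal sketch in base R on a made-up call matrix. The probeset names (ps1 to ps4), the sample names, the toy expression values and the helper names group, min.group, n.present, expr and filtered are all assumptions for illustration only; with real data, call.matrix would come from PANP and the last step would be applied to your own Eset rather than to a plain matrix.

```r
# Hypothetical design: 5 tumor and 5 normal samples (adjust to your own groups).
group <- factor(rep(c("tumor", "normal"), each = 5))

# Toy stand-in for the PANP call matrix (probesets in rows, samples in columns).
# With real data this would be something like: call.matrix <- pa.calls(Eset)$Pcalls
call.matrix <- rbind(
  ps1 = rep("P", 10),                      # present in every sample
  ps2 = c(rep("P", 5), rep("A", 5)),       # present in the tumor samples only
  ps3 = c("P", "M", rep("A", 8)),          # essentially never expressed
  ps4 = c(rep("A", 5), rep("P", 4), "M")   # present in only 4 of the 5 normals
)
colnames(call.matrix) <- paste0("s", 1:10)

# Threshold: the size of the smallest group (5 in this toy design).
min.group <- min(table(group))

# Count the "P" calls per probeset and keep those reaching the threshold.
n.present <- rowSums(call.matrix == "P")
present <- n.present >= min.group

# Toy expression matrix with the same row order as call.matrix;
# for a real ExpressionSet the equivalent step is Eset <- Eset[present, ].
expr <- matrix(seq_len(40) / 10, nrow = 4, dimnames = dimnames(call.matrix))
filtered <- expr[present, , drop = FALSE]

print(n.present)
print(rownames(filtered))
```

Note that this count-based filter does not look at the group labels themselves; it only requires enough "P" calls for a probeset to possibly be present throughout one group. It also relies on call.matrix and the expression data having their probesets in the same row order, which is worth checking before subsetting.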
Hope this will help,
Efstathios