EDIT: question was unclear and I misunderstood the first time around. Here is the second attempt.
So - what you want is the "maximum of medians", across samples. Here's a toy data frame - rows = probesets, s1 - s3 = samples:
df1 <- data.frame(s1 = c(1:4), s2 = c(2:5), s3 = c(5:8), gene = c("g1", "g1", "g2", "g2"))
df1
  s1 s2 s3 gene
1  1  2  5   g1
2  2  3  6   g1
3  3  4  7   g2
4  4  5  8   g2
You can use apply() to add a column containing the medians for each row:
df1$med <- apply(df1[, 1:3], 1, median)
df1
  s1 s2 s3 gene med
1  1  2  5   g1   2
2  2  3  6   g1   3
3  3  4  7   g2   4
4  4  5  8   g2   5
Use aggregate() to get the "maximum median" per gene:
df2 <- aggregate(med ~ gene, df1, max)
df2
  gene med
1   g1   3
2   g2   5
Then you can merge the data frames to get the original rows:
merge(df1, df2)
  gene med s1 s2 s3
1   g1   3  2  3  6
2   g2   5  4  5  8
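One caveat with the merge() step: if two probesets for the same gene happen to share the same maximum median, merge() returns both rows. A minimal sketch of an alternative that always keeps exactly one row per gene is below. It rebuilds the same toy data so it runs on its own; the names ord and best are just illustrative, and which of the tied probesets is kept depends on the sort order.

```r
# Toy data: rows = probesets, s1 - s3 = samples
df1 <- data.frame(s1 = 1:4, s2 = 2:5, s3 = 5:8,
                  gene = c("g1", "g1", "g2", "g2"))

# Median expression of each probeset across the samples
df1$med <- apply(df1[, c("s1", "s2", "s3")], 1, median)

# Sort by gene, with the highest median first within each gene
ord <- df1[order(df1$gene, -df1$med), ]

# Keep only the first (highest-median) row for each gene
best <- ord[!duplicated(ord$gene), ]
best
```

The row names of best are the original row positions, so with real data (probeset IDs as row names) they tell you which probeset was selected for each gene.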
As in my first answer, you'll need an annotation package or file which maps probesets to gene names for your Affy platform.
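As a rough sketch of that mapping step, assuming you have already read the annotation into a two-column data frame: the table annot and its column names probeset and gene are hypothetical here, and the real ones depend on your platform's annotation package or file.

```r
# Expression values with probeset IDs as row names (toy numbers)
expr <- data.frame(s1 = 1:4, s2 = 2:5, s3 = 5:8,
                   row.names = c("p1", "p2", "p3", "p4"))

# Hypothetical annotation table: one row per probeset.
# In practice this comes from the annotation for your array.
annot <- data.frame(probeset = c("p4", "p3", "p2", "p1"),
                    gene = c("g2", "g2", "g1", "g1"))

# Look up each row's probeset ID in the annotation and attach the gene.
# Probesets missing from the annotation get NA.
expr$gene <- annot$gene[match(rownames(expr), annot$probeset)]
expr
```

Using match() rather than assuming both tables are in the same order means the annotation rows can come in any order.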
FIRST ANSWER
I think you are a little confused. You do not want to select "the probeset with median expression for a gene." There is no such thing. For example, the median of the values 1, 2, 4 and 6 is 3, but none of those values is itself 3.
What you want is the median value of all probesets for a gene. One way to do this is to use the aggregate() function in R. Imagine a data frame, df1, that looks something like this:
    s1  s2  s3 gene
p1 V11 V12 V13   g1
p2 V21 V22 V23   g1
p3 V31 V32 V33   g2
p4 V41 V42 V43   g2
Row names (p1, p2...) are probesets. Columns s1 - s3 are samples, containing RMA values. Column gene contains 2 genes (g1, g2), each of which has 2 probesets.
To get the median RMA value per gene in a new data frame:
newdf <- aggregate(. ~ gene, df1, median)
Note that this can be quite slow, even for modest-sized data frames. There is sure to be an equivalent function in one of the Bioconductor packages. You will also need an annotation package or file which maps probesets to gene names for your Affy platform.
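To make the aggregate() call concrete, here is a small runnable sketch. The numbers are made up and simply stand in for the V11 ... V43 placeholders; they are not real RMA values.

```r
# Toy stand-in for df1: probesets as row names, s1 - s3 = samples
df1 <- data.frame(s1 = 1:4, s2 = 2:5, s3 = 5:8,
                  gene = c("g1", "g1", "g2", "g2"),
                  row.names = c("p1", "p2", "p3", "p4"))

# Median of every sample column, computed within each gene
newdf <- aggregate(. ~ gene, df1, median)
newdf
```

The result has one row per gene and one column per sample. With two probesets per gene, each value is the midpoint of the two probeset values, e.g. 1.5 for g1 in s1.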
Neilfws: I am sorry for not being clear. If the gene is represented by multiple probe sets, then I need to select the probe set with the highest median expression across all samples to represent the expression of that gene. I am sure there must be a method/function to do so, it's just that I am not able to find it. Thanks. A.K
Aha, well that is something quite different. Will edit answer when I have time.