The ideal situation would be to obtain the raw data (CEL or TXT files; edit: or whatever BeadArrays use) and then summarise expression over genes using RMA, whose summarisation step is a median polish.
Given the data that you've got, you can just summarise it yourself as follows:
# toy data: one row per probe/transcript, one column per sample
df <- data.frame(
  c("gene1","gene1","gene2","gene2","gene2","gene3"),
  c(1,2,3,10,20,30),
  c(60,50,40,3,2,1),
  c(7,8,9,100,110,120),
  c(12,11,10,9,8,7)
)
colnames(df) <- c("gene","sample1","sample2","sample3","sample4")
df
   gene sample1 sample2 sample3 sample4
1 gene1       1      60       7      12
2 gene1       2      50       8      11
3 gene2       3      40       9      10
4 gene2      10       3     100       9
5 gene2      20       2     110       8
6 gene3      30       1     120       7
Summarise by mean:
aggregate(df[,2:ncol(df)], by=df[1], mean)
   gene sample1 sample2 sample3 sample4
1 gene1     1.5      55     7.5    11.5
2 gene2    11.0      15    73.0     9.0
3 gene3    30.0       1   120.0     7.0
Summarise by median:
aggregate(df[,2:ncol(df)], by=df[1], median)
   gene sample1 sample2 sample3 sample4
1 gene1     1.5      55     7.5    11.5
2 gene2    10.0       3   100.0     9.0
3 gene3    30.0       1   120.0     7.0
Summarise by sum:
aggregate(df[,2:ncol(df)], by=df[1], sum)
   gene sample1 sample2 sample3 sample4
1 gene1       3     110      15      23
2 gene2      33      45     219      27
3 gene3      30       1     120       7
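If you want something closer to the median polish mentioned at the top, here is a rough sketch using base R's medpolish() on the same toy data. It treats the toy values as raw intensities and log2-transforms them, which is my assumption. It also skips the background correction and quantile normalisation that RMA does first, so this is not RMA itself, just the summarisation idea. The helper name summarise_medpolish is made up for illustration.

```r
# Same toy data as above, rebuilt so this block runs on its own
df <- data.frame(
  gene    = c("gene1","gene1","gene2","gene2","gene2","gene3"),
  sample1 = c(1,2,3,10,20,30),
  sample2 = c(60,50,40,3,2,1),
  sample3 = c(7,8,9,100,110,120),
  sample4 = c(12,11,10,9,8,7)
)

# Hypothetical helper: median polish over one gene's probes-by-samples block.
# Assumes the values are positive raw intensities, hence the log2.
summarise_medpolish <- function(block) {
  m <- log2(as.matrix(block))              # rows = probes, columns = samples
  fit <- medpolish(m, trace.iter = FALSE)  # stats::medpolish, base R
  fit$overall + fit$col                    # one summarised value per sample
}

# Split the sample columns by gene, summarise each block, stack the results
per_gene <- lapply(split(df[, -1], df$gene), summarise_medpolish)
result <- do.call(rbind, per_gene)         # genes in rows, samples in columns
result
```

The values are on the log2 scale. For a gene with a single probe (gene3 here), this just returns the log2 of that probe's values.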
Kevin
Maybe you can take the average expression of all the transcripts as the gene expression.