Finding The Mean Of Values In A Single Column
11.0 years ago
robjohn7000 ▴ 110

I have a data frame (process.yield):

process    Yield
35        0.38
37        0.29
89        0.75
90        0.82

I want R to calculate the mean of the values in column 2 ("Yield"). This seems trivial, but after trying functions like apply, aggregate, mean, and by, I have not been able to get the right result. I'm guessing there is a problem with my data frame.

Example: Aggregate function:

process.yield.mean <- aggregate(process.yield, by=list(process.yield$Yield), FUN=mean)

Error from aggregate function:

1: In mean.default(X[[1L]], ...) :
 argument is not numeric or logical: returning NA 
  etc etc

Can anyone help please?

r • 41k views

The error message is telling you that the thing you are passing to mean() isn't a numeric or logical vector. Use class() to work out what your column is (most likely a character vector) and convert it. If you are reading this data in from a .csv file, you may find that some of the entries in that column are not, in fact, numbers.
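
A minimal sketch of that check, assuming a made-up copy of the data in which Yield is stored as text. The "n/a" entry is hypothetical and only there to show how a stray non-number can be located:

```r
# Hypothetical copy of the data: Yield is text and one entry is not a number
process.yield <- data.frame(
  process = c(35, 37, 89, 90),
  Yield   = c("0.38", "0.29", "n/a", "0.82"),
  stringsAsFactors = FALSE
)

# What type is the column?
yield.class <- class(process.yield$Yield)
print(yield.class)

# Entries that cannot be converted become NA (as.numeric() normally warns here)
yield.num <- suppressWarnings(as.numeric(process.yield$Yield))
bad.rows  <- which(is.na(yield.num))
print(process.yield[bad.rows, ])

# Mean of the entries that did convert
yield.mean <- mean(yield.num, na.rm = TRUE)
print(yield.mean)
```

Whether dropping the unconvertible rows with na.rm = TRUE is appropriate depends on why they are not numbers in the first place.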

11.0 years ago

How about just mean(process.yield$Yield)? That would seem rather simpler.


mean(process.yield$Yield) gave this error:

[1] NA
Warning message:
In mean.default(process.yield$Yield) :
  argument is not numeric or logical: returning NA


What about mean(as.numeric(process.yield$Yield))?


It's actually probably a factor, in which case as.numeric(levels(x))[x] is the way to go, as per ?factor. In any case, the correct diagnosis for the problem is in the error message...
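
To illustrate why the conversion goes through levels(), here is a small sketch with the four Yield values from the question stored as a factor (an assumption about how the column ended up). Calling as.numeric() directly on a factor returns its internal integer codes, so the mean of that result would be the mean of the codes rather than of the yields:

```r
# Assumption: the Yield column ended up as a factor
x <- factor(c("0.38", "0.29", "0.75", "0.82"))

# as.numeric() on a factor gives the internal level codes, not the values
codes <- as.numeric(x)
print(codes)

# Converting the levels and indexing by the factor recovers the numbers
values <- as.numeric(levels(x))[x]
print(values)

yield.mean <- mean(values)
print(yield.mean)
```

An equivalent but slightly less efficient form mentioned in ?factor is as.numeric(as.character(x)).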


That was exceptional, David. as.numeric(levels(x))[x] did it. Why do you think a factor was introduced, since all I did was use cbind() to combine the 2 columns in the data frame?


It's a bit complex, but cbind() and rbind() return matrices when given vectors. A matrix can only contain one data type, so numerics are converted to characters if any of the things being bound are characters. as.data.frame() then converts character vectors to factors by default (in R versions before 4.0.0). If you have mixed types it's usually best to use data.frame(x = my_numeric, y = my_char, z = my_factors).
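
A short sketch of the difference, assuming the process column was held as text (the question does not show how the two vectors were created, so both inputs here are hypothetical):

```r
# Hypothetical inputs: process labels held as text, yields as numbers
process <- c("35", "37", "89", "90")
Yield   <- c(0.38, 0.29, 0.75, 0.82)

# cbind() on vectors builds a matrix, which holds a single type,
# so the numeric Yield values are coerced to character
m <- cbind(process, Yield)
matrix.type <- typeof(m)
print(matrix.type)

# data.frame() keeps each column's own type
process.yield <- data.frame(process = process, Yield = Yield)
yield.class <- class(process.yield$Yield)
print(yield.class)

yield.mean <- mean(process.yield$Yield)
print(yield.mean)
```

Building the data frame directly avoids the intermediate matrix, so Yield never passes through a character representation.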


Thanks again David.


Then as David W. suggested above, those are probably characters, not numbers. Try converting with as.numeric().

11.0 years ago
always_learning ★ 1.1k

The summary(process.yield) command will also give the mean.


Thanks, all, for your comments. mean() and summary() should have worked, but so far they have not, and I suspect the way I put together the data frame in the first place. The process.yield data frame was obtained by combining the Process and Yield columns using cbind(). mean() worked fine on the Yield column before it was combined with the Process column.
