Question

Anova P-Values

2

Entering edit mode

11.7 years ago

Diana ▴ 930

Hi everyone,

I'm trying to do an ANOVA on normalized microarray data. My input file looks like this:

Probe_ID    control.CEL    sample1.CEL    sample2.CEL    sample3.CEL
AB112960    2.273088453    2.475903455    2.094961932    2.331040798
AF004856    2.466043833    2.809889315    2.257952048    2.492190838

I have replicates for each (control, sample1, sample2, sample3). I have run ANOVA in r using this script:

aovdata <- read.csv("microarray.csv", header = TRUE)
aovdata1 = aovdata[,2:13]
row.names(aovdata1)<-aovdata$Probe_ID
tissue<-gl(4,3,labels=c("control","aoi","ap","pp"))
aof <- function(x) 
  {m<-data.frame(tissue, x); 
  anova(aov(x ~ tissue))}
anovaresults <- apply(aovdata1, 1,aof)
str(anovaresults)

After performing ANOVA, I'm having trouble extracting the Probe Ids along with their p-value. How can I just extract the probe Ids and the corresponding p-value from ANOVA test? When I view summary(anovaresults), it gives me some different kind of information than str(anovaresults) ?

Thanks a lot!!!

r • 7.8k views

ADD COMMENT • link updated 11.7 years ago by zx8754 12k • written 11.7 years ago by Diana ▴ 930

0

Entering edit mode

Is there a reason you're not using limma? That would be the more standard way to analyze microarray data and results in more convenient data.frames. Aside from that, anova returns an object, which is being coerced into a vector by as.vector. The apply command then returns a matrix of those vectors. You might just run the anova on a single probe and compare those results before and after sending them through as.vector(). That should give you an idea of how everything is formatted. Personally, I would store the anova results in something and then return some sort of meaningfully formatted output from it. Then, you would have better control over things.

ADD REPLY • link 11.7 years ago by Devon Ryan 105k

0

Entering edit mode

limma is definitively done for that. It also adds some very useful functionalities like FDR adjustment which might be relevant to use as you run multiple tests. http://www.statsci.org/smyth/pubs/limma-biocbook-reprint.pdf

ADD REPLY • link 11.7 years ago by Manu Prestat 4.1k

score 2 · Answer 1 · 2013-04-02

2

Entering edit mode

11.7 years ago

zx8754 12k

If you are after Pvalues only then try something like this:

aof <- function(x) {
m<-data.frame(tissue, x); 
anova(aov(x ~ tissue)))[,"Pr(>F)"]
}

http://stackoverflow.com/questions/3366506/extract-p-value-from-aov

ADD COMMENT • link 11.7 years ago by zx8754 12k

0

Entering edit mode

Thanks a lot!!! but I'm getting this error: Error in summary(anova(aov(x ~ tissue)))[[1]][["Pr(>F)"]] : subscript out of bounds

ADD REPLY • link 11.7 years ago by Diana ▴ 930

0

Entering edit mode

I updated the answer, can you try again?

ADD REPLY • link 11.7 years ago by zx8754 12k

0

Entering edit mode

Thanks. i think there is some problem with the code I have written because the function you've given me works but the output is like this: 0.184097265 0.908233888 0.359941167 0.975710791 0.774837269 NA NA NA NA NA with numbers in 1 row and NAs in 2nd row and its not loading completely. I have 38535 genes and 12 columns only. I don't understand why its doing that. Its not giving me the output as gene IDs in 1st column and p-values in 2nd column

ADD REPLY • link 11.7 years ago by Diana ▴ 930

1

Entering edit mode

The NAs are from the "Residuals" row (try running the anova on a single row and you'll see this). Just use [1, "Pr(>F)"] instead, keeping in mind that this could fail if you start making more complicated designs. The function, as written, will never give you gene names as output, since the "anova" function doesn't care about the row names and, thus, won't return them. Perhaps returning 'c(rownames(x), anova(lm(x~tissue))[1,"Pr(>F)"])' would work, though I can't say for sure.