Question

Which statistical test is appropriate for proportion data?

0

6.1 years ago

star ▴ 350

Hi,

I would like to perform a statistical test to see whether there is any significant difference between the proportions in three different groups (G1, G2, G3) across different runs. The ID refers to different subjects. G1, G2 and G3 always sum to 1 because they are proportions.

I’m interested in comparing different samples regarding the proportions in different groups (comparing the rows).

data <- data.frame(ID = rep(paste0("ID", 1:3), 3),
                   runs = rep(c("run1", "run2", "run3"), 3),
                   G1 = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
                   G2 = c(0.22, 0.33, 0.35, 0.3, 0.2, 0.24, 0.15, 0.35, 0.24),
                   G3 = c(0.2, 0.24, 0.22, 0.15, 0.35, 0.43, 0.3, 0.2, 0.33))
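As a quick sanity check of the sum-to-one constraint described above, here is a minimal sketch in base R. It only re-creates the data frame from this post; the names `row_totals` and `sums_to_one` are made up for illustration.

```r
# Data frame exactly as posted in the question.
data <- data.frame(
  ID   = rep(paste0("ID", 1:3), 3),
  runs = rep(c("run1", "run2", "run3"), 3),
  G1 = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
  G2 = c(0.22, 0.33, 0.35, 0.30, 0.20, 0.24, 0.15, 0.35, 0.24),
  G3 = c(0.20, 0.24, 0.22, 0.15, 0.35, 0.43, 0.30, 0.20, 0.33)
)

# Row-wise totals of the three proportions.
row_totals <- rowSums(data[, c("G1", "G2", "G3")])

# Compare with a tolerance rather than ==, because of floating-point rounding.
sums_to_one <- all(abs(row_totals - 1) < 1e-8)
print(sums_to_one)
```

Because the three columns are tied together by this constraint, they are not independent measurements, which is worth keeping in mind when choosing a test.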

I would really appreciate it if you could help me find an appropriate statistical test.

R lme glmer statistics • 2.2k views

ADD COMMENT • link 6.1 years ago by star ▴ 350

0

What did you measure?

ADD REPLY • link 6.1 years ago by ATpoint 88k

0

It's not my own data. I only know it's a kind of classification measurement that shows the proportion of cells that clustered together in each sample.

ADD REPLY • link 6.1 years ago by star ▴ 350

0

Deleting a post after getting a satisfactory answer is grounds for suspension!

ADD REPLY • link 6.1 years ago by Istvan Albert 102k

0

The reason for deleting was an incorrect question! I was not interested in the differences between groups (G1, G2 and G3). I've deleted it to post the correct question!

ADD REPLY • link 6.1 years ago by star ▴ 350

2

OK, but look: someone took the effort to answer your question. You should thank them and leave it be. It is still the correct answer to the question as asked, and we need to honor the effort that goes into answering questions.

ADD REPLY • link 6.1 years ago by Istvan Albert 102k

score 2 · Answer 1 · 2019-05-13

Because you have animal groups, tissues and multiple experiments, I would recommend modelling this with a linear model, treating the animals and the tissue as fixed effects. There is some background you'll have to pick up, but here's a stub to get you started thinking about how to analyze this:

library(reshape2);

## Create dataframe
df <- data.frame(ID = rep(paste0("ID", 1:3), 3),
                 tissue = rep(c("liver", "brain", "heart"), 3),
                 G1 = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
                 G2 = c(0.22, 0.33, 0.35, 0.3, 0.2, 0.24, 0.15, 0.35, 0.24),
                 G3 = c(0.2, 0.24, 0.22, 0.15, 0.35, 0.43, 0.3, 0.2, 0.33))

## Turn into a molten (long-format) dataframe, keeping ID and tissue as id variables
df.molten = melt(df, id.vars = c("ID", "tissue"))

## Model data set
model.lm = as.formula("value ~ variable + ID + tissue")
df.lm = lm(data = df.molten, model.lm)

## Explore results (G1 is the reference level of `variable`)
summary(df.lm)

## Move G3 to the front of the factor levels so that it becomes the reference level
df.molten$variable = factor(df.molten$variable, c("G3", setdiff(as.character(df.molten$variable), "G3")))
df.lm = lm(data = df.molten, model.lm)
summary(df.lm)

The two summary(df.lm) calls give the following results:

Call:
 lm(formula = model.lm, data = df.molten)

 Residuals:
      Min       1Q   Median       3Q      Max
 -0.13667 -0.04667 -0.02444  0.07333  0.16111

 Coefficients: (2 not defined because of singularities)
               Estimate Std. Error t value Pr(>|t|)
 (Intercept)  4.667e-01  3.609e-02  12.932 9.33e-12 ***
 variableG2  -2.022e-01  3.953e-02  -5.115 3.99e-05 ***
 variableG3  -1.978e-01  3.953e-02  -5.003 5.23e-05 ***
 IDID2        3.036e-18  3.953e-02   0.000        1
 IDID3       -3.925e-17  3.953e-02   0.000        1
 tissueheart         NA         NA      NA       NA
 tissueliver         NA         NA      NA       NA
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Residual standard error: 0.08386 on 22 degrees of freedom
 Multiple R-squared:  0.6081,    Adjusted R-squared:  0.5369
 F-statistic: 8.535 on 4 and 22 DF,  p-value: 0.0002573

and

 Call:
 lm(formula = model.lm, data = df.molten)

 Residuals:
      Min       1Q   Median       3Q      Max
 -0.13667 -0.04667 -0.02444  0.07333  0.16111

 Coefficients: (2 not defined because of singularities)
               Estimate Std. Error t value Pr(>|t|)
 (Intercept)  2.689e-01  3.609e-02   7.451 1.88e-07 ***
 variableG1   1.978e-01  3.953e-02   5.003 5.23e-05 ***
 variableG2  -4.444e-03  3.953e-02  -0.112    0.912
 IDID2       -4.626e-17  3.953e-02   0.000    1.000
 IDID3       -2.453e-17  3.953e-02   0.000    1.000
 tissueheart         NA         NA      NA       NA
 tissueliver         NA         NA      NA       NA
 ---
 Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

 Residual standard error: 0.08386 on 22 degrees of freedom
 Multiple R-squared:  0.6081,    Adjusted R-squared:  0.5369
 F-statistic: 8.535 on 4 and 22 DF,  p-value: 0.0002573

In the first comparison, G1 is the reference level, so G2 and G3 are each compared to G1. The p-values for the coefficients variableG2 and variableG3 seem to indicate that the differences between G1 and G2 and between G1 and G3 are both significant.

In the second comparison, G3 becomes the reference level. The coefficient variableG1 is the same contrast as the previous comparison's variableG3 with the sign flipped, so it makes sense that you get the same p-value. However, you can see here that G3 vs. G2 is not significantly different (p=0.912).
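To make the equivalence of the two fits concrete, here is a minimal sketch that pulls out the two coefficients side by side. It uses base R only: the long-format table is built by hand instead of with melt(), and relevel() is used to change the reference level. The object names (`long`, `fit_g1`, `fit_g3`, `coef_g1`, `coef_g3`) are made up for this illustration.

```r
# Toy data as in the stub above.
df <- data.frame(
  ID     = rep(paste0("ID", 1:3), 3),
  tissue = rep(c("liver", "brain", "heart"), 3),
  G1 = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43),
  G2 = c(0.22, 0.33, 0.35, 0.30, 0.20, 0.24, 0.15, 0.35, 0.24),
  G3 = c(0.20, 0.24, 0.22, 0.15, 0.35, 0.43, 0.30, 0.20, 0.33)
)

# Long format built by hand (a stand-in for reshape2::melt).
long <- data.frame(
  ID       = rep(df$ID, 3),
  tissue   = rep(df$tissue, 3),
  variable = factor(rep(c("G1", "G2", "G3"), each = nrow(df))),
  value    = c(df$G1, df$G2, df$G3)
)

# Fit with G1 as the reference level (the default, alphabetical order).
fit_g1 <- lm(value ~ variable + ID + tissue, data = long)

# Refit with G3 as the reference level.
long$variable <- relevel(long$variable, ref = "G3")
fit_g3 <- lm(value ~ variable + ID + tissue, data = long)

# Coefficient tables; rows for aliased (NA) coefficients are not included.
coef_g1 <- coef(summary(fit_g1))
coef_g3 <- coef(summary(fit_g3))

# Same contrast seen from both sides: opposite sign, same p-value.
print(coef_g1["variableG3", c("Estimate", "Pr(>|t|)")])
print(coef_g3["variableG1", c("Estimate", "Pr(>|t|)")])
```

Changing the reference level does not change the fitted model, only which pairwise differences are printed as coefficients.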

Try to understand what's going on (starting with the basics of linear regression if you're not familiar with the method), and make sure you understand the model's output before you use this in any serious analysis.
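One part of the output worth understanding is the note "Coefficients: (2 not defined because of singularities)". A minimal sketch of where it comes from, assuming the toy data above: each ID always occurs with the same tissue, so once ID is in the model the tissue columns carry no additional information and lm() reports NA for them. The response used here (G1 alone) and the names `tab` and `aliased` are only stand-ins for illustration.

```r
# Toy design as in the stub above; only G1 is kept as an example response.
df <- data.frame(
  ID     = rep(paste0("ID", 1:3), 3),
  tissue = rep(c("liver", "brain", "heart"), 3),
  G1     = c(0.58, 0.43, 0.43, 0.55, 0.45, 0.33, 0.55, 0.45, 0.43)
)

# Cross-tabulate the two factors: each ID appears with exactly one tissue.
tab <- table(df$ID, df$tissue)
print(tab)

# Fit a model containing both factors; the tissue terms cannot be estimated.
fit <- lm(G1 ~ ID + tissue, data = df)
aliased <- names(which(is.na(coef(fit))))
print(aliased)

# alias() reports how the dropped terms depend on the ones that were kept.
print(alias(fit))
```

If each animal had been measured in several tissues, the two factors would be crossed rather than confounded, and both sets of coefficients could in principle be estimated. The same confounding is present between ID and runs in the data frame posted in the question.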