ANOVA TEST in R

10.1 years ago

adnanjaved1988 ▴ 80

Hi all, I have a data frame with 5 samples: A, B, C, D, and E.

A is the parent (reference) sample and the rest of the samples are from patients. Each row represents a miRNA, and the value in each column is the background-subtracted value of that miRNA in that sample. I want to perform an ANOVA test in R, but I am a bit confused about how to set it up: should I compare the parent with one patient sample at a time (A & B, A & C, and so on)? Also, most of the ANOVA examples I saw on Google and YouTube have one column with the data and a second column with the group for each value, for example:

Weight Loss     Diet
1.2              A
22.3             A
5.4              C
33.5             B  etc.

My data, however, look like this:

                                          A        B         C         D        E
hsa-miR-199a-3p, hsa-miR-199b-3p         NA 13.13892  5.533703  25.67405       NA
hsa-miR-365a-3p, hsa-miR-365b-3p   15.70536 52.86558 18.467540 223.51424 31.93503
hsa-miR-3689a-5p, hsa-miR-3689b-5p       NA 21.41597  5.964772        NA 24.26073
hsa-miR-3689b-3p, hsa-miR-3689c     9.58696 44.56490 10.102051  13.26785       NA
hsa-miR-4520a-5p, hsa-miR-4520b-5p 18.06865 28.06991        NA        NA       NA
hsa-miR-516b-3p, hsa-miR-516a-3p         NA 10.77471  8.039662        NA       NA

How should I do this for my data?

Thanks in advance

Best
Adnan

updated 2.8 years ago by Ram 44k • written 10.1 years ago by adnanjaved1988 ▴ 80

Ram · Accepted Answer · 2014-10-31

10.1 years ago

mikhail.shugay 3.5k

To transform your data into miRNA/group/value format, use melt from the reshape package. After performing the ANOVA, you can do post-hoc t-tests using the pairwise.t.test function; use, for example, p.adjust.method="holm" for multiple-testing correction. The ANOVA will tell you whether the value depends on the group, while the post-hoc t-tests for group A versus the others will tell you which groups differ from the parent.
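
A minimal sketch of this workflow, assuming the six rows shown in the question as toy data. It builds the long table with base R rather than melt (a stand-in to avoid a package dependency; melt from reshape should give an equivalent miRNA/group/value table), and the names wide, long, fit, and pw are only illustrative. With so few rows the p-values themselves mean little; the point is the shape of the calls.

```r
# Toy wide table copied from the question: rows are miRNAs, columns are samples.
wide <- data.frame(
  A = c(NA, 15.70536, NA, 9.58696, 18.06865, NA),
  B = c(13.13892, 52.86558, 21.41597, 44.56490, 28.06991, 10.77471),
  C = c(5.533703, 18.467540, 5.964772, 10.102051, NA, 8.039662),
  D = c(25.67405, 223.51424, NA, 13.26785, NA, NA),
  E = c(NA, 31.93503, 24.26073, NA, NA, NA),
  row.names = c(
    "hsa-miR-199a-3p, hsa-miR-199b-3p",
    "hsa-miR-365a-3p, hsa-miR-365b-3p",
    "hsa-miR-3689a-5p, hsa-miR-3689b-5p",
    "hsa-miR-3689b-3p, hsa-miR-3689c",
    "hsa-miR-4520a-5p, hsa-miR-4520b-5p",
    "hsa-miR-516b-3p, hsa-miR-516a-3p"
  )
)

# Long format: one row per (miRNA, sample) pair.
# Base-R stand-in for melt(); columns are stacked in the order A, B, C, D, E.
long <- data.frame(
  miRNAs = rep(rownames(wide), times = ncol(wide)),
  Group  = factor(rep(colnames(wide), each = nrow(wide))),
  value  = unlist(wide, use.names = FALSE)
)

# One-way ANOVA of value on group; rows with NA values are dropped by default.
fit <- aov(value ~ Group, data = long)
print(summary(fit))

# Post-hoc pairwise t-tests with Holm correction.
pw <- pairwise.t.test(long$value, long$Group, p.adjust.method = "holm")

# The "A" column of the p-value matrix holds the comparisons against the parent.
print(pw$p.value[, "A"])
```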

updated 2.8 years ago by Ram 44k • written 10.1 years ago by mikhail.shugay 3.5k

Dear Mikhail, thanks for your answer.

I don't have any replicates in my data. All samples have the same miRNAs, but the expression value changes from sample to sample. So if I make groups, I have 2019 miRNAs in each sample: for group A I will have 2019 miRNAs, for B the same, and so on?

10.1 years ago by adnanjaved1988 ▴ 80

Yep, that would be

let7b A 27
...
let7b B 15
...
let7b C 10

and so on

updated 2.8 years ago by Ram 44k • written 10.1 years ago by mikhail.shugay 3.5k

Dear Mikhail, after melting my data frame, this is how my data look:

> head(m)
                                             miRNAs Group    value
1                  hsa-miR-199a-3p, hsa-miR-199b-3p     A       NA
2                  hsa-miR-365a-3p, hsa-miR-365b-3p     A 15.70536
3 hsa-miR-3689a-5p, hsa-miR-3689b-5p, hsa-miR-3689e     A       NA
4                   hsa-miR-3689b-3p, hsa-miR-3689c     A  9.58696
5                hsa-miR-4520a-5p, hsa-miR-4520b-5p     A 18.06865
6                  hsa-miR-516b-3p, hsa-miR-516a-3p     A       NA

Then I compared groups with values by doing:

ANOVA1<-aov(m$value~m$Group)

First question: do I need to compare values with miRNAs? After the ANOVA test I ran TukeyHSD, and the results are below.

> TukeyHSD(ANOVA1)
  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = m$value ~ m$Group)

$`m$Group`
          diff        lwr       upr     p adj
B-A   73.87304  -88.20262 235.94869 0.7256734
C-A  -25.55832 -196.36413 145.24749 0.9941714
D-A  203.80312   20.26110 387.34514 0.0207431
E-A   41.04993 -159.09661 241.19648 0.9807637
C-B  -99.43136 -258.28853  59.42581 0.4290920
D-B  129.93008  -42.54789 302.40805 0.2398572
E-B  -32.82310 -222.87472 157.22851 0.9899165
D-C  229.36144   48.65517 410.06771 0.0048776
E-C   66.60826 -130.94103 264.15755 0.8892989
E-D -162.75319 -371.41264  45.90627 0.2081150

If I interpret these results, they show strong evidence against the null hypothesis for the pairs D-A and D-C. So for these pairs we can reject the null hypothesis, and we have grounds to believe that there is a real difference between the two groups in each pair?

Best
Adnan

updated 2.8 years ago by Ram 44k • written 10.1 years ago by adnanjaved1988 ▴ 80

Hello!

Everything seems correct. If you get P<0.05, the ANOVA shows that the expression values of your miRNAs of interest do vary significantly by patient group. In that case you can interpret the results as follows: you have found an association between miRNA expression and patient group, and the post-hoc analysis with multiple-testing correction has shown that groups D and A, and groups D and C, differ significantly in miRNA expression.
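
As a rough sketch of that order of checks (overall F-test first, then the post-hoc table), assuming the non-missing values from the six example rows in the question as toy data. The object names are illustrative, and the numbers will not match the TukeyHSD output posted above, which came from the full data set.

```r
# Toy long-format data: non-missing values from the six example rows,
# grouped by sample (A is the parent, B-E are patients).
long <- data.frame(
  Group = factor(rep(c("A", "B", "C", "D", "E"), times = c(3, 6, 5, 3, 2))),
  value = c(15.70536, 9.58696, 18.06865,
            13.13892, 52.86558, 21.41597, 44.56490, 28.06991, 10.77471,
            5.533703, 18.467540, 5.964772, 10.102051, 8.039662,
            25.67405, 223.51424, 13.26785,
            31.93503, 24.26073)
)

fit <- aov(value ~ Group, data = long)

# Step 1: overall F-test p-value for the Group term.
overall_p <- summary(fit)[[1]][["Pr(>F)"]][1]
print(overall_p)

# Step 2: Tukey post-hoc comparisons (matrix with columns diff, lwr, upr, p adj).
tuk <- TukeyHSD(fit)$Group

# Comparisons of each patient sample against the parent A.
vs_parent <- tuk[grepl("-A$", rownames(tuk)), , drop = FALSE]
print(vs_parent)

# Pairs whose adjusted p-value is below 0.05 (may be empty for toy data).
significant <- tuk[tuk[, "p adj"] < 0.05, , drop = FALSE]
print(rownames(significant))
```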

Comparing values with miRNAs is quite odd in your case; basically, it will tell you whether some miRNAs typically have high expression in patients while others typically have low expression. Expression of individual miRNAs varies a lot, so I would expect that you will almost certainly see some statistical significance here.

PS: I've just realized that a set of 2019 miRNAs could be the whole human miRNome, not just a specific set of miRNAs of interest selected based on prior biological knowledge. Why haven't you used more conventional methods like cluster analysis in your case?

Basically, if I'm correct, you've found that miRNAs as a whole are up- or down-regulated, which is weird (unless you're studying some miRNA transcription machinery). You should rather find differentially expressed microRNAs using a package like DESeq, which would also provide means for post-hoc analysis.

updated 2.8 years ago by Ram 44k • written 10.1 years ago by mikhail.shugay 3.5k

OK, now I've confused everyone and myself a little :)

First, group A (parent) should be removed from the analysis, as it is inappropriate to use the same data for normalization/clustering and a follow-up ANOVA: http://stats.stackexchange.com/questions/116294/appropriateness-of-anova-after-k-means-cluster-analysis

Second, DESeq is for read counts, and the OP is talking about microarray data. Use limma instead.

updated 2.8 years ago by Ram 44k • written 10.1 years ago by mikhail.shugay 3.5k