Question

Anova for proteomics

3.3 years ago

FF • 0

Hi, I'm new to R and I have a dataset with more than 1000 proteins. Each row contains the data for a single protein. The columns hold 3 values for the control, 3 values for Treatment 1, 3 values for Treatment 2 and 3 values for Treatment 3. I need to perform ANOVA and TukeyHSD between the 4 treatment groups for every protein. I think I should use a for loop, but I can't work out how to select, row by row, the elements I need each time, given that aov, which is used for both the ANOVA and Tukey, requires the values belonging to a single group to be in the same column. I also need a table with the p-values from both the ANOVA and Tukey. Can anyone explain in a simple way how to do all of that?

Thanks.

proteomics anova • 3.1k views

You should look into the limma R/Bioconductor package for analysing this data. limma borrows information across the whole dataset to give more stable estimates at the per-feature (gene or protein) level, which helps when there are only a few replicates per group. If you search for "limma proteomics" you will find some literature and plenty of code to help you get started, e.g.:

Detecting significant changes in protein abundance

R guide: Analysis of Cardiovascular Proteomics Data

3.3 years ago by h.mon

Thanks for your reply. I have read what you sent me, but I still can't solve my problem because of my limited programming ability. If possible, could you show me the necessary code? My header is:

Protein CTR CTR CTR T1 T1 T1 T2 T2 T2 T3 T3 T3

Every row corresponds to one protein and the columns contain protein abundance values. Thank you.
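For a table with the header above, the row-by-row ANOVA and TukeyHSD from the original question could be sketched in base R roughly as follows. This is only an illustration under assumptions: the matrix `abund` is simulated stand-in data, and the names `groups`, `test_one` and `results` are made up for the example. It also assumes the columns are ordered exactly as in the header, so that the group labels can be built from the column order.

```r
# Simulated stand-in for the real table (5 proteins x 12 samples).
# With real data, 'abund' would be the numeric part of the table,
# with the Protein column used as row names.
set.seed(1)
groups <- factor(rep(c("CTR", "T1", "T2", "T3"), each = 3),
                 levels = c("CTR", "T1", "T2", "T3"))
abund <- matrix(rnorm(5 * 12, mean = 20), nrow = 5,
                dimnames = list(paste0("P", 1:5), as.character(groups)))

# Run a one-way ANOVA and Tukey's HSD on the 12 values of one protein.
# 'values' is one row of the matrix; 'groups' says which group each value is in,
# so no manual reshaping into columns per group is needed.
test_one <- function(values, groups) {
  fit <- aov(values ~ groups)
  anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]   # overall ANOVA p-value
  tukey_p <- TukeyHSD(fit)$groups[, "p adj"]    # 6 pairwise adjusted p-values
  c(anova_p = anova_p, tukey_p)
}

# apply() calls test_one once per row; t() puts proteins back in rows.
results <- as.data.frame(t(apply(abund, 1, test_one, groups = groups)))

# With >1000 proteins the ANOVA p-values should be corrected for multiple testing.
results$anova_padj <- p.adjust(results$anova_p, method = "BH")
```

`results` then has one row per protein, with the ANOVA p-value, the six Tukey-adjusted pairwise p-values and a Benjamini-Hochberg adjusted ANOVA p-value. Whether plain per-protein ANOVA is appropriate with only 3 replicates per group is a separate question.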

3.3 years ago by FF

In general I am reluctant to provide code for someone to follow blindly if they could not have written it themselves. There are many pitfalls in an analysis like this, so I recommend either learning to code properly yourself, which here means gaining some confidence with the basics of R, or consulting someone locally. For limma you need a numeric matrix where columns are samples and rows are observations (genes, proteins, whatever you measure). You seem to have that, and would only need to set Protein as the row names using rownames(). Then, provided the data are normalized (I cannot tell you how to do that because I do not know your data), you can start with the default limma pipeline, i.e. lmFit and all downstream steps as described in its vignette. Please read the manual on how to set up a design matrix etc.
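As a rough orientation only, the steps described above might look like the sketch below. It assumes the limma package is installed from Bioconductor, uses a simulated matrix `abund` in place of the real data, and assumes the values are already log-transformed and normalized. The object names and the contrast names (`T1vsCTR` etc.) are made up for the example. Note that limma gives moderated F- and t-tests with multiple-testing adjustment, not Tukey's HSD itself.

```r
library(limma)  # Bioconductor package, assumed to be installed

# Simulated stand-in: 50 proteins x 12 samples, proteins as row names.
set.seed(1)
groups <- factor(rep(c("CTR", "T1", "T2", "T3"), each = 3),
                 levels = c("CTR", "T1", "T2", "T3"))
abund <- matrix(rnorm(50 * 12, mean = 20), nrow = 50)
rownames(abund) <- paste0("P", 1:50)
colnames(abund) <- paste0(groups, "_", 1:3)

# Design matrix with one column per group (no intercept).
design <- model.matrix(~ 0 + groups)
colnames(design) <- levels(groups)

# Fit one linear model per protein.
fit <- lmFit(abund, design)

# Pairwise comparisons of interest, expressed as contrasts between groups.
contr <- makeContrasts(
  T1vsCTR = T1 - CTR, T2vsCTR = T2 - CTR, T3vsCTR = T3 - CTR,
  T2vsT1  = T2 - T1,  T3vsT1  = T3 - T1,  T3vsT2  = T3 - T2,
  levels = design
)
fit2 <- eBayes(contrasts.fit(fit, contr))

# ANOVA-like moderated F-test: any difference among the 4 groups?
# The three contrasts against CTR together span all group differences.
overall <- topTable(fit2, coef = 1:3, number = Inf, sort.by = "none")

# One pairwise comparison (moderated t-test), here T1 vs CTR.
t1_vs_ctr <- topTable(fit2, coef = "T1vsCTR", number = Inf, sort.by = "none")
```

`overall` and `t1_vs_ctr` both contain `P.Value` and `adj.P.Val` columns per protein; the other pairwise comparisons can be extracted the same way by changing `coef`. The limma user's guide covers the design and contrast setup in more detail.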