Hi, I'm new to R and I have a dataset with more than 1000 proteins. Each row contains the data for a single protein. The columns hold 3 values for the control, 3 values for Treatment 1, 3 values for Treatment 2 and 3 values for Treatment 3. I need to perform ANOVA and TukeyHSD between the 4 groups for every protein. I think I should use a for loop, but I can't work out how to select, row by row, the elements I need each time, given that aov (used for both the ANOVA and Tukey) requires the values belonging to a single group to be in the same column. I also need a table with the p-values from both the ANOVA and Tukey. Can anyone explain to me in a simple way how to do all of that?
Thanks.
You should look into the limma R/Bioconductor package for analysing this data. limma leverages information from the whole dataset to provide more accurate estimates at the per-feature (gene or protein) level. If you search for "limma proteomics" you will find some literature and plenty of code to help you get started, e.g.:
Detecting significant changes in protein abundance
R guide: Analysis of Cardiovascular Proteomics Data
Thanks for your reply. I have read what you sent me, but I'm still not able to solve my problem because of my limited programming ability. If possible, could you show me the necessary code? My header is:
Protein CTR CTR CTR T1 T1 T1 T2 T2 T2 T3 T3 T3
Every row corresponds to one protein and the columns contain protein abundance values. Thank you.
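For illustration, a table with this header could be read into R roughly as follows. This is only a sketch: the three proteins and all abundance values are made up, and the object names (`txt`, `dat`, `mat`, `group`) are placeholders, not anything from the original data.

```r
# Hypothetical three-protein excerpt using the header from the post; values are made up
txt <- "Protein CTR CTR CTR T1 T1 T1 T2 T2 T2 T3 T3 T3
P1 10.1 10.3 9.9 11.2 11.0 11.4 10.0 10.2 9.8 12.1 12.3 11.9
P2 8.0 8.2 7.9 8.1 8.0 8.3 8.2 7.8 8.1 8.0 8.1 7.9
P3 5.5 5.7 5.4 6.9 7.1 7.0 8.4 8.6 8.5 5.6 5.5 5.8"

# check.names = FALSE keeps the repeated column names (CTR, CTR, ...) unchanged
dat <- read.table(text = txt, header = TRUE, check.names = FALSE,
                  stringsAsFactors = FALSE)

# Numeric matrix: proteins in rows, samples in columns, protein IDs as row names
mat <- as.matrix(dat[, -1])
rownames(mat) <- dat$Protein

# One group label per sample column, taken directly from the header
group <- factor(colnames(mat), levels = c("CTR", "T1", "T2", "T3"))
```

With a real file, `read.table(text = txt, ...)` would be replaced by a call that reads the file itself; the separator and file name depend on how the data were exported.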
In general I am reluctant to provide code to follow blindly if you could not write it yourself. There are many pitfalls in an analysis, so I recommend either learning to code properly yourself, which here means gaining some confidence with R basics, or consulting someone locally. For limma you need a numeric matrix where columns are samples and rows are observations (genes, proteins, whatever you measure). You seem to have that, and would need to set the Protein column as row names using rownames(). Then, provided the data are normalized (I cannot tell you how to do that because I do not know your data), you can start with the default limma pipeline, so lmFit and all downstream steps as described in its vignette. Please read the manual on how to make a design matrix etc.
Thank you.
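As a rough sketch of the two routes discussed in this thread, the code below runs a per-protein one-way ANOVA with Tukey HSD in base R, and then, only if limma happens to be installed, fits the corresponding limma model. It rests on several assumptions: the data are simulated, the helper `test_protein` and all object names are hypothetical, the Benjamini-Hochberg adjustment is an addition not mentioned above, and the real data would need to be normalized (and usually log-transformed) first.

```r
set.seed(1)
group <- factor(rep(c("CTR", "T1", "T2", "T3"), each = 3),
                levels = c("CTR", "T1", "T2", "T3"))

# Simulated stand-in for the real data: 20 proteins x 12 samples (values are made up)
mat <- matrix(rnorm(20 * 12, mean = 10), nrow = 20,
              dimnames = list(paste0("P", 1:20), as.character(group)))

# One-way ANOVA and Tukey HSD for a single protein (one row of the matrix)
test_protein <- function(y, group) {
  fit <- aov(y ~ group)
  anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]   # overall ANOVA p-value
  tukey_p <- TukeyHSD(fit)$group[, "p adj"]     # six pairwise adjusted p-values
  c(anova_p = anova_p, tukey_p)
}

# Apply to every row; transpose so that proteins end up in rows
res <- as.data.frame(t(apply(mat, 1, test_protein, group = group)))

# With many proteins, the ANOVA p-values should be corrected for multiple testing
res$anova_padj <- p.adjust(res$anova_p, method = "BH")

# Design matrix with one column per group, as limma expects
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

# Optional limma fit; skipped when the package is not installed
if (requireNamespace("limma", quietly = TRUE)) {
  fit <- limma::lmFit(mat, design)
  contr <- limma::makeContrasts(T1 - CTR, T2 - CTR, T3 - CTR, levels = design)
  fit2 <- limma::eBayes(limma::contrasts.fit(fit, contr))
  limma_res <- limma::topTable(fit2, number = Inf)  # moderated F-test across contrasts
}
```

Here `res` has one row per protein, with the ANOVA p-value, the six Tukey-adjusted pairwise p-values and the BH-adjusted ANOVA p-value. The contrasts in the limma part compare each treatment with the control only; other comparisons would need their own entries in `makeContrasts`.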