I see... aspetta... if you can get your data like this, then you can do different tests:
data
Day Protein1 Protein2 Protein3 Protein4 Protein5 Protein6
Day1 0.01 0.04 0.05 0.06 0.34 0.04
Day2 0.05 0.45 0.06 0.04 0.03 0.04
Day3 0.56 0.07 0.34 0.56 0.05 0.04
melt the data
require(reshape2)
data.melt <- melt(data)
colnames(data.melt) <- c("Day", "Protein", "Expression")
data.melt$Day <- factor(data.melt$Day, levels = c("Day1","Day2","Day3"))
stat test
Then, perform a non-parametric Kruskal-Wallis and Dunn's Test to compare the time-points. This test determines if there exists a global difference across the time-points.
require(FSA)
dunnTest(Expression ~ Day, data = data.melt, method = 'bh')
Dunn (1964) Kruskal-Wallis multiple comparison
p-values adjusted with the Benjamini-Hochberg method.
Comparison Z P.unadj P.adj
Day1 - Day2 -0.1371082 0.89094528 0.8909453
Day1 - Day3 -1.6727199 0.09438245 0.2831473
Day2 - Day3 -1.5356117 0.12463364 0.1869505
plot data per protein
require(ggplot2)
ggplot(data.melt, aes(x=Day, y=Expression)) +
stat_summary(geom = "bar", fill="forestgreen", fun.y = mean, position = "dodge") +
facet_wrap(~ Protein, ncol = 3) +
stat_summary(geom = "errorbar", fun.data = mean_se, position = "dodge") +
xlab("") +
ylab("emPAI") +
scale_y_continuous(breaks = round(seq(0, max(data.melt$Expression), by = 0.1), 1)) +
#geom_text(size=4, y=-20) +
#Set the size of the plotting window
theme_bw(base_size=24) +
#Modify various aspects of the plot text and legend
theme(
legend.position='none',
legend.background=element_rect(),
plot.title=element_text(angle=0, size=18, face="bold", vjust=1),
axis.text.x=element_text(angle=90, size=18, face="bold", hjust=1.10),
axis.text.y=element_text(angle=0, size=18, face="bold", vjust=0.5),
axis.title=element_text(size=18, face="bold"),
#Legend
legend.key=element_blank(), #removes the border
legend.key.size=unit(1, "cm"), #Sets overall area/size of the legend
legend.text=element_text(size=18), #Text size
title=element_text(size=18),
#facet wrap
strip.text.x = element_text(size=18),
strip.text.y = element_text(size=18)) +
#Change the size of the icons/symbols in the legend
guides(colour=guide_legend(override.aes=list(size=2.5)))
calculate ratios across time-points
tdata <- t(data[,2:ncol(data)])
tdata
Day1 Day2 Day3
Protein1 0.01 0.05 0.56
Protein2 0.04 0.45 0.07
Protein3 0.05 0.06 0.34
Protein4 0.06 0.04 0.56
Protein5 0.34 0.03 0.05
Protein6 0.04 0.04 0.04
ratios <- t(apply(tdata, 1, function(x) c(
x[1] / x[2],
x[2] / x[3],
x[1] / x[3])))
colnames(ratios) <- c("Day1vs2","Day2vs3","Day1vs3")
ratios
Day1vs2 Day2vs3 Day1vs3
Protein1 0.20000000 0.08928571 0.01785714
Protein2 0.08888889 6.42857143 0.57142857
Protein3 0.83333333 0.17647059 0.14705882
Protein4 1.50000000 0.07142857 0.10714286
Protein5 11.33333333 0.60000000 6.80000000
Protein6 1.00000000 1.00000000 1.00000000
----------------------------------------
It would be difficult to get a single p-value for each protein because there are only 3 values. Ratios make more sense, in this case.
Kevin
Hey Mozart, still playing good music with that violin?
What is the source of your mass spec data, do you know?
Hey Kevin, glad to see you there...yeah I am keep doing amazing stuff with this piece of wood...!!!
so I am not entirely sure I got it right but probably you mean some extra information about the machine and like...so:
Mass spectrometer is a Thermo Velos Pro
Sample time run was 1hr
reaction volume: 10µl (desalted ziptip) buffer used: 5% acetonitrile; 0.1% formic acid
does it help??
thanks!
Okay, thank you! Is it not proteomics data that you have? Can you paste a few rows of the data? You could also plot as histogram of the data to see how it looks. I do not know which fields / columns contain the expression values.
thanks to you, Kevin!
so what we got is:
Family (149) Member (1) Database (SwissProt) Accession (KDM4A_MOUSE) Score (18) Mass (3933362) Num.of matches (534) Num.of significant matches (1) Num of sequences (544) Num of significant sequences (1) emPAI (100) Description (Lysine-specific demethylase 4A OS=Mus musculus GN=Kdm4a PE=2 SV=6)
hope it helps...I have no idea of the information included in the Description section, I am afraid. Neither the guy here, who provides the data, seems to be very convincing to me.
Thanks. It is making more sense, now. The emPAI is a relative measure of protein expression. Please see here: Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein.
I presume that you can use the emPAI values for analysis. Usually, it would be, for example, a Mann-Whitney test (non-parametric). Can you plot the distribution of the emPAI values?
thanks a lot for your (very much appreciated) help, Kevin. You are the best...
hope that make any sense
Is there only one column of values for emPAI? - you mentioned 2 tissue at different time-points (?)
Sorry, I didn't get it!
I actually have just one tissue at different time points at the moment...but, I mean, I just want to know the time effects in these two systems. so I guess it is the same identical thing.
so time 0 1 day 2 days, respectively
I posted an answer. Forse aiutará! / Maybe it will help!
I don't really know how big was your help...as usual you solved my problem in a matter of secs; you are incredible Kevin!!
Thanks a lot, I guess that's all I need..
I will dedicate one of my composition to you! ahahah
Thanks again.
Anything for the magnificent Mozart!