Hello,
Let's imagine that we have the following five mutant sequences of protein kinase C:
S1 KVLGKGSFGKVMLADDKGTEELYA 24
S2 MVLFKGSFGKVMLGDRKGTEELYA 24
S3 MVLGKGSFGKVMLADRKG-EELYA 23
S4 MVLGKGSAGKVMLADRKGTEFLYA 24
S5 MVLGKGS-GKVMLFDRKGTEELYA 23
    ** *** ***** * ** * ***
After computing the Kyte & Doolittle hydrophobicity per amino acid, the following table was obtained:
AA S1 S2 S3 S4 S5
1 0.633 1.633 1.278 1.278 0.922
2 0.278 1.278 0.922 0.922 0.567
3 0.578 1.578 1.222 1.222 0.867
4 0.578 1.578 1.222 1.111 1.222
5 0.111 1.111 0.756 0.644 0.756
6 0.111 0.467 0.111 0 0.111
7 0.111 0.467 0.111 0 0.111
8 -0.1 0.256 -0.1 -0.211 NA
9 0.367 0.367 0.367 0.256 0.367
10 1 0.756 1 0.889 1.111
11 0.656 0.411 0.656 0.544 0.767
12 0.356 0 0.244 0.133 0.356
13 -0.389 -0.744 -0.5 -0.5 -0.389
14 -0.389 -0.744 -0.5 -0.5 -0.389
15 -0.033 -0.389 -0.144 -0.144 -0.033
16 -0.889 -1.244 -1 -1 -0.889
17 -1.489 -1.844 -1.6 -0.9 -1.489
18 -1.489 -1.844 -1.6 -0.9 -1.489
19 -1.833 -1.944 NA -1.244 -1.944
20 -1.244 -1.356 -1.356 -0.656 -1.356
21 -0.356 -0.356 -0.356 0.344 -0.356
22 -0.356 -0.356 -0.356 0.344 -0.356
23 0.189 0.189 0.189 0.889 0.189
24 0.689 0.689 0.689 1.389 0.689
There is also a priori knowledge about each mutant's binding to a trial molecule and about each mutant's function:
Binding Functional
S1 1 1
S2 1 1
S3 2 0
S4 2 0
S5 0 1
What statistical tests would you recommend to:
1. See whether the difference between the mean hydrophobicity of the mutant proteins is statistically significant?
mu1 = mu2 = mu3 = ... = mu n
2. The same as above, but comparing each pair individually?
S1 S2 S3 S4 ... Sn
S2 p - - - -
S3 p p - - -
... . . . - -
Sn p p p p -
3. See whether the amino acid property can be related to the binding and functional properties?
I think that one-way ANOVA won't do much good, because the samples can be seen as paired, paired by amino acid position.
Do you think repeated-measures ANOVA can be used here? I was thinking of pairwise.t.test for paired samples in R, with Bonferroni as the method for adjusting p-values.
What do you think of this method?
NOTE: This is fabricated data.
I have read the http://biostar.stackexchange.com/questions/4208/statistical-analysis-of-protein-sequence-properties post, and I reckon there are a few similarities between the two problems, but even so the objectives are quite different.
Thanks in advance.
Could you please be a little more specific? Which means? Besides, many amino acid properties are not independent, especially on a per-residue basis. Can you state your test question?
Yes, I think so. But I think you misunderstood the property: I meant just one observation (hydrophobicity) per amino acid per protein. The hypotheses would be:
H: Are the means of hydrophobicity different between proteins?
(Probably a one-way ANOVA)
H: Which proteins have means of hydrophobicity different from each other? (I'm thinking about the post-hoc tests here)
H: How are the means correlated with binding?
(This is probably a classification problem, or a simple correlation test)
H: How are the means correlated with function?
(idem)
I forgot to ask: which scale are you using?
As I promised: a paired t-test with Bonferroni's correction is very similar to ANOVA. In either case, the most precise way to interpret your case is "same subject, different treatments". The tests I suggested assumed independent samples. As I said before, on a per-amino-acid basis hydrophobicity isn't an independent measure; methods to estimate it normally use a window of 3-5 aa. Still, you can use ANOVA for testing all pairs.
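To make the "paired t-test with Bonferroni's correction" idea concrete, here is a minimal sketch in plain Python of what R's pairwise.t.test with paired samples does conceptually. The values are copied from the table in the question; the function names (parse, paired_t, t_two_sided_p, pairwise_bonferroni) are made up for this illustration, positions with NA in either protein are simply dropped for that pair (an assumption, not necessarily what you want), and the p-value comes from a numerical integration of the t density rather than from a statistics library. It also ignores the non-independence of neighbouring positions mentioned above, so treat the output as illustrative only.

```python
import math
from itertools import combinations

NAMES = ["S1", "S2", "S3", "S4", "S5"]
# Hydrophobicity values from the question's table, one row per position 1..24.
TABLE = """\
0.633 1.633 1.278 1.278 0.922
0.278 1.278 0.922 0.922 0.567
0.578 1.578 1.222 1.222 0.867
0.578 1.578 1.222 1.111 1.222
0.111 1.111 0.756 0.644 0.756
0.111 0.467 0.111 0 0.111
0.111 0.467 0.111 0 0.111
-0.1 0.256 -0.1 -0.211 NA
0.367 0.367 0.367 0.256 0.367
1 0.756 1 0.889 1.111
0.656 0.411 0.656 0.544 0.767
0.356 0 0.244 0.133 0.356
-0.389 -0.744 -0.5 -0.5 -0.389
-0.389 -0.744 -0.5 -0.5 -0.389
-0.033 -0.389 -0.144 -0.144 -0.033
-0.889 -1.244 -1 -1 -0.889
-1.489 -1.844 -1.6 -0.9 -1.489
-1.489 -1.844 -1.6 -0.9 -1.489
-1.833 -1.944 NA -1.244 -1.944
-1.244 -1.356 -1.356 -0.656 -1.356
-0.356 -0.356 -0.356 0.344 -0.356
-0.356 -0.356 -0.356 0.344 -0.356
0.189 0.189 0.189 0.889 0.189
0.689 0.689 0.689 1.389 0.689
"""

def parse(table):
    """Return {protein name: list of values}, with NA stored as None."""
    cols = [[] for _ in NAMES]
    for row in table.strip().splitlines():
        for col, cell in zip(cols, row.split()):
            col.append(None if cell == "NA" else float(cell))
    return dict(zip(NAMES, cols))

def t_two_sided_p(t, df, steps=2000):
    """Two-sided p-value for Student's t, via Simpson integration of the density."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    pdf = lambda x: c * (1 + x * x / df) ** (-(df + 1) / 2)
    a = abs(t)
    h = a / steps
    area = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * pdf(i * h)
    area *= h / 3  # integral of the density from 0 to |t|
    return max(0.0, 1.0 - 2.0 * area)

def paired_t(x, y):
    """Paired t-test on positions where both proteins have a value."""
    d = [a - b for a, b in zip(x, y) if a is not None and b is not None]
    n = len(d)
    mean = sum(d) / n
    var = sum((v - mean) ** 2 for v in d) / (n - 1)
    t = mean / math.sqrt(var / n)
    return t, n - 1, t_two_sided_p(t, n - 1)

def pairwise_bonferroni(data):
    """All pairwise paired t-tests; p-values multiplied by the number of pairs."""
    pairs = list(combinations(NAMES, 2))
    out = {}
    for a, b in pairs:
        t, df, p = paired_t(data[a], data[b])
        out[(a, b)] = (t, df, min(1.0, p * len(pairs)))
    return out

results = pairwise_bonferroni(parse(TABLE))
for (a, b), (t, df, p_adj) in results.items():
    print(f"{a} vs {b}: t = {t:.3f}, df = {df}, Bonferroni p = {p_adj:.4f}")
```

With five proteins there are ten comparisons, so each raw p-value is multiplied by ten and capped at 1, which is the Bonferroni adjustment.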
You are right about the need for methods that use a window of 3-5 aa. I don't think that applies to this particular analysis (hydrophobicity), but let's imagine that we are studying another property that depends on the surrounding aa: what methods do you think would be appropriate for that?
Hydrophobicity depends on the neighbors. Check the scales at ExPASy/ProtScale. If you really want to perform a powerful analysis of sequence-function relationships (aa properties included), you should check the Ranganathan Lab (http://www.hhmi.swmed.edu/Labs/rr/). He developed the most powerful methods to date. Quite laborious, but worth a try!
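As an illustration of how neighbors enter a window-based scale, here is a minimal sketch of a sliding-window Kyte & Doolittle profile in plain Python. The function name window_profile is hypothetical and the 9-residue window is an assumption; it happens to reproduce the interior S1 values in the question's table (for example 0.111 at position 5 and 1 at position 10). Only full windows are scored, so the first and last four positions get no value here, and gapped sequences such as S3 and S5 would need an explicit decision about how to treat "-".

```python
# Kyte & Doolittle hydropathy index for the 20 standard amino acids.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def window_profile(seq, window=9):
    """Mean hydropathy of every full window along seq.

    Entry i is centred on residue i + window // 2 + 1 (1-based numbering).
    """
    if window % 2 == 0 or window > len(seq):
        raise ValueError("window must be odd and no longer than the sequence")
    scores = [KD[aa] for aa in seq]  # raises KeyError on gaps or unknown residues
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]

S1 = "KVLGKGSFGKVMLADDKGTEELYA"  # ungapped S1 sequence from the question
profile = window_profile(S1)
for i, value in enumerate(profile):
    print(i + 9 // 2 + 1, round(value, 3))
```

Because adjacent windows share eight of their nine residues, consecutive values are strongly correlated, which is why per-position values should not be treated as independent observations.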
Thanks for the hint. I will tell you about the results.