Question

Calculating z-scores and p-values from ratios

7.8 years ago

mimA ▴ 30

Hello all,

I have a question that I can't figure out. I have protein data for 2 different treatments (3 samples each) and only 1 control sample. I want to run some statistics to find differences between the 2 treatment conditions while still using the control, because it is important for this experiment. Someone suggested that I calculate a fold change (FC) for each treatment sample against the single control sample and then convert these ratios to z-scores. I have now converted them to z-scores, so for each protein I have 3 z-scores in treatment 1 and 3 z-scores in treatment 2. Is this approach acceptable? Also, how do I get a single p-value for each protein from these?

Thanks a lot!

proteomics statistics p-values z-scores

updated 7.8 years ago by Petr Ponomarenko ★ 2.8k • written 7.8 years ago by mimA ▴ 30

score 0 · Answer 1 · 2017-02-07

First, let me assume that you have protein expression data where you have n proteins and 3 samples for each treatment.

The suggested fold-change (FC) approach should be done this way:

1. Calculate the mean expression of each protein across the 3 samples.

2. Divide each protein's mean expression by the control's expression to obtain fold changes.

3. Take the logarithm (usually base 2) to obtain log fold changes.

4. You now have a population of log2 fold changes, one per protein. Calculate their mean and standard deviation.

5. Using that mean and standard deviation, calculate a z-score for each protein, then convert each z-score to a p-value using the standard normal distribution.
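A minimal Python sketch of the five steps for one treatment, assuming the intensities are held in plain dicts keyed by protein ID (the function name `log2fc_zscores` and the example values are hypothetical, not real data). The p-value here is two-sided from the standard normal, p = 2(1 - Phi(|z|)), which assumes the log2 fold changes are roughly normal across proteins. Note that each z-score measures how unusual a protein is relative to all proteins within that treatment; running this once per treatment does not by itself give a treatment 1 vs treatment 2 p-value.

```python
import math
from statistics import NormalDist, mean, stdev


def log2fc_zscores(treatment, control):
    """Steps 1-5 for one treatment (hypothetical helper).

    treatment: dict protein -> list of replicate intensities (e.g. 3 values)
    control:   dict protein -> single control intensity
    Returns dict protein -> (log2 fold change, z-score, two-sided p-value).
    """
    # Steps 1-3: mean over replicates, divide by the control, take log2.
    log2fc = {p: math.log2(mean(vals) / control[p]) for p, vals in treatment.items()}

    # Step 4: mean and standard deviation over the population of proteins.
    values = list(log2fc.values())
    mu, sd = mean(values), stdev(values)

    # Step 5: z-score per protein, two-sided p-value from the standard normal.
    std_normal = NormalDist()
    result = {}
    for p, lfc in log2fc.items():
        z = (lfc - mu) / sd
        result[p] = (lfc, z, 2 * (1 - std_normal.cdf(abs(z))))
    return result


# Hypothetical example values, not real data.
treatment = {"P1": [2000, 2200, 2100], "P2": [1000, 1100, 900],
             "P3": [1300, 1200, 1250], "P4": [4600, 4800, 5000]}
control = {"P1": 1050, "P2": 1000, "P3": 1250, "P4": 1200}

for protein, (lfc, z, p) in log2fc_zscores(treatment, control).items():
    print(f"{protein}: log2FC={lfc:.2f} z={z:.2f} p={p:.3f}")
```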

I hope this helps.

score 0 · Answer 2 · 2017-02-08

7.8 years ago

Petr Ponomarenko ★ 2.8k

Hi mimA,

It looks like you are trying to test whether there is a difference between two treatments when you have one control and 3 samples per treatment. Your null hypothesis H0 here should be that there is no difference between the treatments. You can then compute a p-value: the probability of observing data at least as extreme as yours if H0 were true. Similar questions have already been asked on other forums, e.g. http://stats.stackexchange.com/questions/62558/test-difference-between-samples-with-very-small-sample-size and http://stats.stackexchange.com/questions/37993/is-there-a-minimum-sample-size-required-for-the-t-test-to-be-valid

In short, everything depends on your assumptions about the variance of each treatment, and on whether you can assume these variances are equal between the two treatments.
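To make the equal-variance question concrete, here is a small sketch (hypothetical function names, made-up log2 values for one protein) that computes the pooled Student t statistic and the Welch t statistic for two groups of 3. With equal group sizes the two statistics coincide; the assumption changes the degrees of freedom (4 for Student, about 2.94 for Welch in this example), and therefore the p-value you would read from the t distribution.

```python
import math
from statistics import mean, variance


def student_t(a, b):
    """Pooled two-sample t statistic and df, assuming equal variances."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, na + nb - 2


def welch_t(a, b):
    """Welch two-sample t statistic and df, without assuming equal variances."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical log2 values for one protein, 3 replicates per treatment.
treatment1 = [1.0, 1.2, 0.8]
treatment2 = [0.1, 0.0, -0.1]

print(student_t(treatment1, treatment2))  # t ≈ 7.75, df = 4
print(welch_t(treatment1, treatment2))    # t ≈ 7.75, df ≈ 2.94
```

If SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=True)` and `scipy.stats.ttest_ind(a, b, equal_var=False)` return the corresponding p-values directly.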

If very little is known about the distributions and the sample sizes are small, then rank tests such as the Mann–Whitney–Wilcoxon test are a safer approach: https://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U_test
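One practical caveat with 3 samples per group: the exact Mann–Whitney test has only C(6, 3) = 20 ways to split the 6 pooled values, so even complete separation of the two groups gives a two-sided p-value of 2/20 = 0.1. The sketch below (hypothetical helper names, standard library only, made-up values) enumerates every split to show this.

```python
from itertools import combinations


def mann_whitney_u(a, b):
    """U for sample a: pairs (x, y) with x > y count 1, ties count 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)


def exact_mwu_pvalue(a, b):
    """Exact two-sided p-value by enumerating every split of the pooled values."""
    pooled = list(a) + list(b)
    n, na = len(pooled), len(a)
    expected = na * (n - na) / 2  # mean of U under H0
    observed = abs(mann_whitney_u(a, b) - expected)
    extreme = total = 0
    for idx in combinations(range(n), na):
        chosen = set(idx)
        ga = [pooled[i] for i in idx]
        gb = [pooled[i] for i in range(n) if i not in chosen]
        total += 1
        # Count splits at least as far from the H0 mean as the observed one.
        if abs(mann_whitney_u(ga, gb) - expected) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical values: treatment 1 lies entirely above treatment 2.
print(exact_mwu_pvalue([3.0, 4.0, 5.0], [0.0, 1.0, 2.0]))  # 0.1, the smallest possible
```

Recent SciPy versions offer the same exact computation as `scipy.stats.mannwhitneyu(a, b, method="exact")`.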

Thanks for your answer, Petr. My only concern is whether it's acceptable to take ratios like the ones I calculated by dividing by the control and use those numbers directly to calculate p-values, for example with something like limma. This way, the protein levels in each treatment become relative to the control. Do you have any thoughts on that?

7.8 years ago by mimA ▴ 30

How you normalize the data depends first on your experiment and on your assumptions about the distributions of the observed values. In some situations your approach can work. Could you please describe the experiment in more detail?

7.8 years ago by Petr Ponomarenko ★ 2.8k

I have mass spec data in which proteins are represented by quan values. These values are quite large (in the thousands, e.g. 1000, 1300, 2200 and so on). Our proteomics facility told me that no further normalisation is required for this data. However, since we wanted expression levels relative to our control, we divided the treatment quan values by the control quan values, which gives a ratio for each replicate of treatment 1 and of treatment 2. To determine differences between the 2 treatments, could I now calculate the average of these ratios for treatment 1 and for treatment 2, compute a fold change between them, and calculate p-values using, say, a t-test?

7.8 years ago by mimA ▴ 30

Yes mimA, you can. That is the right approach. You can use a t-test if you have good reason to assume that your samples are normally distributed. Averaging over 3 samples per treatment is fine for a t-test.
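A small check of the approach discussed here, using made-up quan values for one protein (not real data) and a hypothetical `welch_t` helper. Because every replicate of both treatments is divided by the same control value, the control cancels from the between-treatment fold change, and on the log2 scale it shifts both groups by the same constant, so the t statistic is unchanged. Dividing by the control is therefore harmless, but it does not alter the treatment 1 vs treatment 2 comparison. Testing on log2 values is an assumption here, a common choice for intensity data.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Welch two-sample t statistic (hypothetical helper)."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


# Hypothetical quan values for one protein, not real data.
t1 = [2000.0, 2200.0, 2100.0]
t2 = [1000.0, 1300.0, 1150.0]
control = 1200.0

# Ratios to the single control, as described in the thread.
r1 = [x / control for x in t1]
r2 = [x / control for x in t2]

# Fold change between treatments: with and without dividing by the control.
fc_from_ratios = mean(r1) / mean(r2)
fc_raw = mean(t1) / mean(t2)

# t statistic on the log2 scale: with and without dividing by the control.
t_ratios = welch_t([math.log2(x) for x in r1], [math.log2(x) for x in r2])
t_raw = welch_t([math.log2(x) for x in t1], [math.log2(x) for x in t2])

print(fc_from_ratios, fc_raw)  # equal up to floating-point rounding
print(t_ratios, t_raw)         # equal up to floating-point rounding
```

The same cancellation holds for a t-test on the raw (unlogged) ratios, since dividing both groups by one constant scales the mean difference and the standard error by the same factor.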