Question

Quantitative Proteomics Statistics

7.4 years ago

martingarridorc ▴ 20

Hi!

I have received a dataset from an old proteomics experiment that contains the SILAC ratios of 3834 proteins for a given condition. I usually rely on a statistic such as the p-value from a t-test to set the cut-off for the subsequent over-representation analysis, but these ratios come without any accompanying statistic. I have 3 technical replicates.

I'm wondering whether there is a method to derive a statistic for this collection of ratios, and which library/package/software could do it. References would be appreciated.

Thank you!

SILAC Proteomics Statistics • 3.1k views

ADD COMMENT • link 7.4 years ago by martingarridorc ▴ 20

score 2 · Answer 1 · 2017-07-13

If the ratios are not already expressed as logs, log-transform them, then check how the values are distributed. If they are roughly Gaussian-shaped, you could use a t-test. If you can't assume normality, you can do a permutation test instead. In any case, don't forget to correct for multiple testing. You may also want to have a look at the RforProteomics package.
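A minimal sketch of the first two steps in base R, assuming a table with one column of raw SILAC ratios per replicate. The data are simulated and the column names (`rep1` to `rep3`) are hypothetical; they only stand in for the real table.

```r
# Simulated stand-in for the real data: 3834 proteins x 3 replicate ratios.
# Column names rep1..rep3 are hypothetical.
set.seed(1)
n <- 3834
ratios <- data.frame(
  rep1 = 2^rnorm(n, mean = 0, sd = 0.5),
  rep2 = 2^rnorm(n, mean = 0, sd = 0.5),
  rep3 = 2^rnorm(n, mean = 0, sd = 0.5)
)

# Log2-transform so that a 2-fold increase (+1) and a 2-fold decrease (-1)
# are symmetric around 0
log_ratios <- log2(as.matrix(ratios))

# Per-protein mean log2 ratio across the replicates
protein_mean <- rowMeans(log_ratios)

# Numeric look at the distribution; hist() or qqnorm() give a visual check
summary(protein_mean)

# Formal normality check (shapiro.test accepts between 3 and 5000 values)
sw <- shapiro.test(protein_mean)
sw$p.value
```

With thousands of values, a formal normality test can flag very small deviations, so a histogram or Q-Q plot is worth looking at alongside the p-value.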

score 0 · Answer 2 · 2017-07-14

7.4 years ago

martingarridorc ▴ 20

Thank you for your answer Jean-Karim!

I have been looking at the RforProteomics vignette, but I have a problem: the package is designed for processing and identifying the raw pep files that come out of the analysis, whereas I have a final list of proteins with their ratios. I have transformed them to the log2 scale and they are roughly Gaussian-shaped.

How can I perform the t-test on these data? I'm really lost at this point. I have used the base R function t.test() to compare means in other datasets where I have one measurement per sample, but never with a single list of ratios.

ADD COMMENT • link 7.4 years ago by martingarridorc ▴ 20

Please use the "add comment" button when replying to an answer; this keeps the discussion organized.

I assume that the ratio is between two conditions, something like treatment over control. In this case, you want to test whether expression differs between the two conditions, which translates into the ratio being significantly different from 1. In log-transformed space, you then test the null hypothesis that the value is 0. If the mean of your log-transformed data is not 0, you would need to center the data before doing the test.
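The centering and the test against 0 can be sketched in base R as follows. The data are simulated with an artificial shift, all object names are hypothetical, and centering on the per-replicate median is only one option (subtracting the mean works the same way).

```r
# Simulated log2 ratios with a systematic shift of +0.3 in every replicate.
# Object and column names are hypothetical.
set.seed(2)
log_ratios <- matrix(rnorm(3834 * 3, mean = 0.3, sd = 0.5), ncol = 3,
                     dimnames = list(NULL, c("rep1", "rep2", "rep3")))

# Center each replicate on its own median so the bulk of proteins sits at 0
centered <- sweep(log_ratios, 2, apply(log_ratios, 2, median))

# Medians are (numerically) 0 after centering
apply(centered, 2, median)

# One-sample t-test for a single protein: the null hypothesis is that the
# mean log2 ratio is 0, i.e. the ratio is 1 on the original scale
res <- t.test(centered[1, ], mu = 0)
res$p.value
```

Note that the test is run on the replicate values of one protein at a time, not on the whole list of ratios at once.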

ADD REPLY • link 7.4 years ago by Jean-Karim Heriche 27k

Yes, that's exactly what they are: ratios between two conditions. If the data were successfully centered at 0, I would get the following output from the base R t.test function, right?

t.test(data$mean)

One Sample t-test

data: data$mean
t = 0.25743, df = 3833, p-value = 0.7969
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.01591990 0.02073247
sample estimates:
mean of x
0.002406286

So once I have checked that they are successfully centered at 0 and follow a Gaussian distribution, how should I proceed to obtain a statistic that identifies the values representing a significant change in this distribution? Simply by taking the most extreme 2.5% of ratio values?

ADD REPLY • link 7.4 years ago by martingarridorc ▴ 20

If you want to formally test whether your data are normally distributed, do a Shapiro-Wilk normality test (shapiro.test()), not a t-test. To select proteins with a significant change, you test each protein using its replicates (i.e. test whether the mean of the replicates is equal to 0). However, with only three replicates and correction for multiple testing, this approach may not have enough power.

In fact, I believe statistics are not the answer to your problem here. I would select proteins whose median over replicates is above a given threshold. Using the median enforces reproducibility, i.e. at least half the replicates will be above threshold. Use prior knowledge to find a biologically relevant threshold. For example, if key players in the process you're interested in are known to change, you could use this to select the threshold. Or, if key players are known but not their change, you could rank the proteins by fold change and look at how many of these known players you recover at different thresholds.
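The median-based selection and the recovery check could look roughly like this in base R. The data, the protein identifiers, the list of known players and the candidate thresholds are all made up for illustration.

```r
# Simulated centered log2 ratios, 3 replicates per protein.
# Protein IDs (P1, P2, ...) and the 'known' list are hypothetical.
set.seed(3)
n <- 3834
log_ratios <- matrix(rnorm(n * 3, sd = 0.5), ncol = 3,
                     dimnames = list(paste0("P", seq_len(n)),
                                     c("rep1", "rep2", "rep3")))

# Pretend 20 proteins are known from prior work to respond, and give them
# a real shift in the simulation
known <- paste0("P", 1:20)
log_ratios[known, ] <- log_ratios[known, ] + 1.5

# Median over replicates: with 3 replicates, a median beyond the threshold
# means at least 2 of the 3 values are beyond it in the same direction
protein_median <- apply(log_ratios, 1, median)

# For each candidate threshold (log2 units; 1 is a 2-fold change), count the
# selected proteins and how many known players are among them
thresholds <- c(0.5, 1, 1.5)
recovery <- t(sapply(thresholds, function(th) {
  selected <- names(protein_median)[abs(protein_median) > th]
  c(threshold = th,
    n_selected = length(selected),
    n_known = sum(known %in% selected))
}))
recovery
```

A threshold that still recovers most known players while keeping the selected list short would be a reasonable candidate for the over-representation analysis.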

ADD REPLY • link 7.4 years ago by Jean-Karim Heriche 27k

Okay, I will use a Shapiro test!

"However, with only three replicates and correction for multiple testing, this approach may not have enough power."

Indeed, after running t.test as you described, 187 proteins have a p-value below 0.05, and that number drops to 0 after FDR correction. The prior-knowledge approach sounds very good for this situation, because I have previous experimental evidence of proteins that change under the condition studied. I will try it.
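For reference, counts like these can be obtained along the following lines in base R. This is a sketch on simulated data in which no protein truly changes; the object names are hypothetical, and Benjamini-Hochberg is assumed as the FDR method. Note that 187 out of 3834 is about 4.9%, which is close to what a 0.05 cut-off lets through by chance alone.

```r
# Simulated centered log2 ratios with no true changes, 3 replicates each
set.seed(4)
n <- 3834
log_ratios <- matrix(rnorm(n * 3, sd = 0.5), ncol = 3)

# One-sample t-test per protein against 0 (2 degrees of freedom each)
p_raw <- apply(log_ratios, 1, function(x) t.test(x, mu = 0)$p.value)

# Benjamini-Hochberg adjustment across all proteins
p_fdr <- p.adjust(p_raw, method = "BH")

sum(p_raw < 0.05)  # roughly 5% of proteins pass by chance
sum(p_fdr < 0.05)  # few or none are expected to survive the correction
```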

Thank you again, your wisdom is appreciated!

ADD REPLY • link 7.4 years ago by martingarridorc ▴ 20