I downloaded proteomic data from CPTAC, but it is already in log2FC form (and the spectral counts lack the sample mapping). Now I want to determine which genes are statistically significant. What test can be applied?
Logically, if there is no difference, the ratio of protein expression between tumor and normal should be 1, so the log2 ratio should be 0 [log2(Tumor/Normal) = log2(1) = 0]. Can I use a one-sample t-test (with mu = 0) or a Wilcoxon signed-rank test (the gene expression is not Gaussian)? Or is there another method?
For instance, I created a random distribution spanning the min/max of the gene expression and then compared the actual gene expression against this random distribution. Is that logical?
The p-values of the one-sample t-test, the one-sample Wilcoxon test, and the Wilcoxon test against the random distribution are 0.89, 0.90, and 0.202, respectively.
Any hint in this regard is highly appreciated.
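To make the two candidate tests concrete, here is a minimal sketch in Python with SciPy, assuming the log2FC values for one protein are available as a plain array with one value per tumor/normal pair. The function name `one_sample_tests` and the simulated values are hypothetical and only for illustration; they are not CPTAC data.

```python
import numpy as np
from scipy import stats

def one_sample_tests(lfc):
    """Test whether the log2 fold changes of one protein are centered on 0."""
    lfc = np.asarray(lfc, dtype=float)
    lfc = lfc[~np.isnan(lfc)]  # drop missing values before testing
    t_res = stats.ttest_1samp(lfc, popmean=0.0)  # one-sample t-test, mu = 0
    w_res = stats.wilcoxon(lfc)  # one-sample signed-rank test against 0
    return {"t_pvalue": float(t_res.pvalue),
            "wilcoxon_pvalue": float(w_res.pvalue)}

# Simulated log2FC values (assumption: 100 samples per protein)
rng = np.random.default_rng(0)
null_protein = rng.normal(loc=0.0, scale=1.0, size=100)     # no real change
shifted_protein = rng.normal(loc=1.0, scale=1.0, size=100)  # about 2-fold up

print(one_sample_tests(null_protein))
print(one_sample_tests(shifted_protein))
```

Both tests take the log2FC values directly and compare them with 0, so no separate "random distribution" is needed as a reference.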
Thank you for the detailed response. I will try again to find how the data have been processed, but so far no luck.
Regarding the normality assumption of the t-test: the expression of a few genes follows a normal distribution, while that of others does not, according to the Shapiro-Wilk normality test.
" you could look at the distribution of mean LFCs across the whole dataset and check that it is symmetrical and centered around 0." The mean of the mean values is -0.0005 and the mean of the median values is -0.008. I can assume that it is close to 0, but the distribution is not normal; rather, it is negatively skewed (Shapiro-Wilk p << 0.01).
I think I should use the one-sample Wilcoxon signed-rank test.
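For anyone who wants to repeat this check, a rough sketch in Python follows. It assumes the data sit in a matrix with proteins in rows and samples in columns. The function `summarize_center` and the simulated matrix are hypothetical stand-ins, not the real dataset.

```python
import numpy as np
from scipy import stats

def summarize_center(lfc_matrix):
    """Summarize per-protein centers (rows = proteins, columns = samples)."""
    per_protein_mean = np.nanmean(lfc_matrix, axis=1)
    per_protein_median = np.nanmedian(lfc_matrix, axis=1)
    return {
        "mean_of_means": float(np.mean(per_protein_mean)),
        "mean_of_medians": float(np.mean(per_protein_median)),
        # negative skewness means a longer tail towards negative log2FC
        "skewness_of_means": float(stats.skew(per_protein_mean)),
        "shapiro_pvalue": float(stats.shapiro(per_protein_mean).pvalue),
    }

# Simulated data (assumption): 2000 proteins x 100 samples, with per-protein
# effects centered on 0 but negatively skewed, plus Gaussian sample noise
rng = np.random.default_rng(1)
effects = -(rng.gamma(shape=2.0, scale=0.5, size=2000) - 1.0)
sim = effects[:, None] + rng.normal(0.0, 0.5, size=(2000, 100))

print(summarize_center(sim))
```

A mean close to 0 together with a clearly negative skewness and a small Shapiro-Wilk p-value would correspond to the pattern described above: centered, but not normal.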
I like i.sudbery's response. With your current data, I would definitely 100% use the Wilcoxon test. You have more than enough samples and you wouldn't be violating any distributional assumptions. You'll get plenty of proteins that are statistically significant and are unlikely to be false positives. Good sensitivity + few false discoveries = meaningful biological results :)
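If it helps, this is roughly how that could look across all proteins, as a sketch in Python. It assumes a proteins-by-samples matrix of log2FC values and adds a Benjamini-Hochberg adjustment to keep false discoveries in check; that correction is my own suggestion, not something from the CPTAC pipeline. The helper names and the simulated matrix are made up for the example.

```python
import numpy as np
from scipy import stats

def wilcoxon_per_protein(lfc_matrix):
    """One-sample Wilcoxon signed-rank p-value (against 0) for each row."""
    pvals = []
    for row in lfc_matrix:
        row = row[~np.isnan(row)]  # ignore missing values
        pvals.append(stats.wilcoxon(row).pvalue)
    return np.array(pvals)

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, in the original order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # enforce monotonicity, starting from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted

# Simulated data (assumption): 200 proteins x 100 samples, first 20 shifted
rng = np.random.default_rng(2)
sim = rng.normal(0.0, 1.0, size=(200, 100))
sim[:20] += 1.0

pvals = wilcoxon_per_protein(sim)
qvals = benjamini_hochberg(pvals)
print("significant at FDR 0.05:", int((qvals < 0.05).sum()))
```

Proteins would then be called significant on the adjusted values rather than on the raw p-values.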