Hi,
We have proteomics results from ~35 mammals. The samples were taken surgically from the same organ in all individuals.
Since we are working with individual mammals (as opposed to genetically identical tissue cultures or plants), there is high variability between samples. Therefore we cannot use, for example, FDR to filter the TTEST results that identify differentially translated proteins between treated and untreated individuals: if we do, hardly any proteins remain significantly DE. As a result, we use only p < 0.05 from TTEST (no FDR) to identify DE proteins, and the number of DE proteins is determined mainly by the fold-change cutoff.
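A minimal sketch of the selection described above, assuming the t-test p-values have already been computed and that fold change is the treated/control ratio. The function name and toy values are hypothetical. Applying the cutoff on the log2 scale treats decreases symmetrically: a 1.3-fold decrease corresponds to a ratio of 1/1.3 ≈ 0.77.

```python
import math

def select_de(proteins, p_cutoff=0.05, fc_cutoff=1.3):
    """Return names of proteins with p < p_cutoff and |log2(fold change)| >= log2(fc_cutoff).

    proteins: list of (name, p_value, fold_change) tuples, where fold_change is
    the treated/control ratio (> 0). Hypothetical helper for illustration only.
    """
    min_log2 = math.log2(fc_cutoff)
    return [
        name
        for name, p, fc in proteins
        # Both conditions must hold: unadjusted p-value and fold-change magnitude.
        if p < p_cutoff and abs(math.log2(fc)) >= min_log2
    ]

# Hypothetical toy input: (name, p-value, treated/control ratio)
toy = [("A", 0.01, 1.5), ("B", 0.03, 0.70), ("C", 0.20, 2.0), ("D", 0.04, 1.1)]
print(select_de(toy))
```

Here "B" is kept as a down-regulated protein, "C" fails the p-value filter and "D" fails the fold-change filter. Note that this filter does nothing to control the false discovery rate; it only shrinks the list.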
My questions are:
1. What is the minimal recommended fold-change cutoff for such variable samples?
2. How should we choose a lower cutoff that still makes sense and would still be acceptable for publication in a good journal?
I have seen, for example, that Yuan et al. (2016, in: Journal of Proteomics, https://doi.org/10.1016/j.jprot.2020.103683) used a cutoff of 1.3-fold change (and p-value < 0.05) for human samples (i.e., they also worked with mammalian samples, as we do).
3. Does anyone know of other work using this fold-change cutoff, or a lower one, that was published in reputable journals?
Thank you, Arik
Thank you very much,
However, we should consider that our samples are genetically different, so we get high variability and it is hard to get a good FDR.
Considering that our list of DE genes seems to make biological sense, it would be a shame to discard it just because the FDR is poor. To the best of my understanding, the work I cited above faces the same issue (samples from different human individuals) and did not use FDR, yet it was published in Journal of Proteomics.
Well, I'd start from the position that a t-test (TTEST) is the wrong statistical test for proteomics data.
You can have good enrichment for biologically relevant genes and still have most of the genes be wrong. Take the example above: if you have 500 DE genes out of 8,000 tested at p < 0.05, and nearly all proteins are truly unchanged, you would expect about 0.05 × 8,000 = 400 hits by chance alone, giving an FDR of 400/500 = 80%. That means 80% of your hits will be false positives. But if the remaining 20% are true, that's more than enough to show an enrichment, provided those 20% are relevant.
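The arithmetic above can be sketched as a rough estimate, assuming valid p-values that are uniformly distributed for unchanged proteins. `expected_fdr` and its `prop_null` parameter (the assumed fraction of truly unchanged proteins) are hypothetical names for illustration, not from any library.

```python
def expected_fdr(n_tests, n_hits, alpha=0.05, prop_null=1.0):
    """Rough FDR estimate for an unadjusted p < alpha cutoff.

    Under the null hypothesis p-values are uniform, so about
    alpha * (number of null tests) pass the cutoff by chance.
    prop_null is the assumed fraction of proteins that are truly unchanged.
    """
    expected_false = alpha * n_tests * prop_null
    # The estimated fraction of hits that are false cannot exceed 1.
    return min(1.0, expected_false / n_hits)

print(expected_fdr(8000, 500))  # about 0.8, i.e. 80% of the 500 hits
```

With `prop_null=1.0` the estimate is conservative; if a sizeable fraction of proteins really are changed, fewer false positives are expected among the hits.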
If the top 3% of proteins have an FDR of 46%, there are two possible explanations: either 46% of them are false positives, or the FDR is incorrectly calculated. That is what FDR is: the fraction of your results that are expected to be wrong.
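For reference, a minimal pure-Python sketch of the Benjamini-Hochberg procedure, one standard way of estimating FDR from a list of p-values. `bh_adjust` is a hypothetical helper; in practice a library implementation (e.g. R's `p.adjust(method = "BH")`) would normally be used. The adjusted value for the k-th smallest of m p-values is roughly p × m / k, so it only reflects the true fraction of false positives if the input p-values are themselves correctly calibrated.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0  # adjusted values are capped at 1
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p_(j) * m / j over all ranks j >= k (keeps them monotone).
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical p-values for four proteins
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

Selecting all proteins with an adjusted value below, say, 0.05 then gives a list whose expected fraction of false positives is about 5%, under the assumptions of the procedure.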
This problem comes up all the time in high-throughput biology. Take GWAS: we can show that with a threshold of 10^-8 we find only a tiny fraction of the relevant genes, and that at 0.05 we find most or all of the relevant genes, but we also find thousands of genes that are wrong.
I wouldn't put too much store by what is published. Just because something is published doesn't mean it is right.
So if you can't find a better way to calculate FDR, I'd argue that it is better to analyse your results through a framework other than DE, rather than do DE badly.