Hi all,
I have done an analysis using MATS to find the potential skipped exons. I have about 350 significant skipped exons.
Now I want to estimate how many skipped exons I would expect by chance (assuming that there are 10,000 protein-coding genes in mouse). How should I roughly estimate this?
Thank you in advance.
Thank you Istvan for your answer. I was calculating this "expected by chance" value to see whether the number of significant splicing events (skipped exons) reported by rMATS is higher or lower than the expected value.
I calculated the expected value this way: "10,000 expressed genes x 10 exons per gene x 0.05 FDR per exon = 5000." The number of events reported by rMATS (356 events with FDR < 0.05) is much lower than this.
What are your thoughts on this? Thank you.
The false discovery rate is computed from the distribution of p-values and describes the expected fraction of errors within your selection (and only within the selection up to that cutoff). It is not a probability of something happening over the entire dataset.
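To make the "computed from a distribution of p-values" point concrete, here is a minimal sketch of a Benjamini-Hochberg style adjustment. It assumes the FDR column comes from this kind of procedure; the function name `bh_adjust` and the p-values are made up for illustration and are not taken from any tool's output.

```python
def bh_adjust(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for five tested exons.
pvalues = [0.001, 0.008, 0.039, 0.041, 0.6]
fdr = bh_adjust(pvalues)
selected = [i for i, q in enumerate(fdr) if q < 0.05]
print(fdr)
print(len(selected), "events pass FDR < 0.05")
```

Each adjusted value depends on the rank of its p-value among all tested events, so it describes the selection up to that rank and not a fixed per-exon error rate.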
When you have 356 events at an FDR of 0.05, it means that you may expect about 356 * 0.05 ≈ 18 of those events to be false positives. It does not mean that the event occurs at a uniform rate of 0.05 in the whole dataset.
Conceptually speaking, when you multiply something by FDR you find a count of expected errors (aka false discoveries) rather than the number of true positives.
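As a quick sanity check, the arithmetic can be written out in a few lines of Python. The variable names are only illustrative; the numbers are the ones quoted in this thread.

```python
n_significant = 356   # events reported at FDR < 0.05
fdr_cutoff = 0.05

# FDR applies to the selected events only.
expected_false = n_significant * fdr_cutoff
expected_true = n_significant - expected_false

# The calculation from the question treats FDR as a per-exon rate
# over the whole dataset, which is not what FDR means.
per_exon_misreading = 10_000 * 10 * fdr_cutoff

print(f"expected false discoveries: {expected_false:.1f}")
print(f"expected true discoveries:  {expected_true:.1f}")
print(f"per-exon misreading gives:  {per_exon_misreading:.0f}")
```

The first number (roughly 18) is the one the FDR cutoff supports; the last one (5000) is not an expected count of chance events.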