Hi, I am looking for a scoring method. On one side, I have a correlation score (range -1 to 1) per gene (>22,000 genes) for each of several fates (A, B, C), which links each gene to each fate. On the other side, I have the results of several differential expression analyses: log2FC values (roughly -3 to +3) and their p-values, per gene.
Which statistic or formula could I use to obtain a score per fate for each differential expression result? The goal is to understand whether the differential gene expression leads to an increase or a decrease of the fate according to the correlation (integrating both positive and negative contributions into one value over all genes).
I was thinking of multiplying logFC x corr for each gene and averaging over all genes to get an overall score, but I don't know how valid such a score would be:
sum(logFC * corr) / n_genes
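A minimal Python sketch of this score, computed once per fate. The gene names and values are made up for illustration, and `fate_score` is a hypothetical helper, not something from an existing package. It only restates the formula above and says nothing about its statistical validity.

```python
from statistics import fmean


def fate_score(log_fc, corr):
    """Mean of logFC * corr over the genes present in both mappings."""
    shared = sorted(log_fc.keys() & corr.keys())
    if not shared:
        raise ValueError("no genes shared between logFC and correlation tables")
    return fmean([log_fc[g] * corr[g] for g in shared])


# Hypothetical toy data: one differential expression result, two fates.
log_fc = {"gene1": 2.0, "gene2": -1.0, "gene3": 0.5}
corr_by_fate = {
    "A": {"gene1": 0.8, "gene2": -0.6, "gene3": 0.1},
    "B": {"gene1": -0.5, "gene2": 0.4, "gene3": 0.0},
}

# One score per fate for this differential expression result.
scores = {fate: fate_score(log_fc, corr) for fate, corr in corr_by_fate.items()}
print(scores)
```

A positive score means that up-regulated genes tend to be positively correlated with the fate (and down-regulated genes negatively correlated); a negative score means the opposite. Note that this sketch ignores the p-values entirely.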
Any suggestions for a statistical method or publication applicable to this case? Thanks!
Thank you, I will try the lm approach.
One question though: since each gene has one logFC value but multiple corr_score values (one for each fate A, B, C), shouldn't I run an lm for each fate:
Log2FC ~ corr_score
That way I could use the slope and p-value as a "fate alteration score"? (Maybe my description of the data was unclear.)
If you only have one Log2FC for multiple fates, you should start off with separate models per fate. If you find that linear regression works well, you could build a slightly more complex regression such as
Log2FC ~ FateA + FateB + ...
If you do set up your model like this, you would probably want to use lasso linear regression, which shrinks the less important coefficients toward 0, to get a better idea of which fates are most strongly associated with Log2FC.
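The first step above, one simple regression per fate, might look like the following Python sketch (the thread uses R's lm; Python is used here only for illustration). The helper `slope_and_p` and the toy values are hypothetical. The p-value uses a normal approximation to the t statistic, which is an assumption that only becomes reasonable with many genes, as in the >22,000-gene case; with the six toy genes below it is only indicative.

```python
from math import sqrt
from statistics import NormalDist, fmean


def slope_and_p(x, y):
    """Ordinary least squares fit of y ~ x.

    Returns (slope, two-sided p-value for slope == 0). The p-value is a
    normal approximation (assumption: large number of genes).
    """
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    if se == 0.0:
        return slope, 0.0  # perfect fit: no residual variance
    z = slope / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return slope, p


# Hypothetical toy data: one Log2FC per gene, one correlation per gene and fate.
log2fc = [1.5, -0.5, 2.0, -1.0, 0.0, 1.0]
corr_by_fate = {
    "A": [0.6, -0.2, 0.9, -0.5, 0.1, 0.3],
    "B": [-0.4, 0.3, -0.7, 0.2, 0.0, -0.1],
}

# Separate model per fate: Log2FC ~ corr_score
results = {fate: slope_and_p(corr, log2fc) for fate, corr in corr_by_fate.items()}
for fate, (slope, p) in results.items():
    print(f"fate {fate}: slope={slope:.3f}, approx. p={p:.3g}")
```

The sign of the slope gives the direction of the "fate alteration", and the p-value indicates how much weight to put on it. The multi-fate lasso model would need a dedicated library (for example glmnet in R) and is not covered by this sketch.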