Question

Score from correlations per genes in DGE results

1


3.7 years ago

Lopiniatre ▴ 10

Hi, I am looking for a scoring method. On one side, I have a correlation score (range -1 to 1) per gene (>22,000 genes) for each of several fates (A, B, C), which links each gene to each fate. On the other side, I have the results of several differential expression analyses: a log2FC value (range about -3 to +3) and a p-value per gene.

Which statistic or formula could I use to obtain a score per fate for each differential expression result? The goal is to understand whether the differential gene expression leads to an increase or a decrease of the fate according to the correlations (integrating both positive and negative contributions into one value over all genes).

I was thinking of multiplying logFC x corr for each gene, summing over all genes, and dividing by the number of genes to get an overall score. But I don't know how valid such a score would be:

score = sum(logFC * corr) / n_genes
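To make the proposed score concrete, here is a minimal Python sketch of it. The function name `fate_score`, the gene names, and all numbers are invented for illustration; it only assumes that the log2FC values and the correlations are keyed by gene and that genes missing from either table are skipped. It computes the score as written above and says nothing about its statistical validity.

```python
def fate_score(log2fc, corr):
    """Mean of log2FC * corr over the genes present in both mappings."""
    shared = sorted(set(log2fc) & set(corr))
    if not shared:
        raise ValueError("no genes shared between the two tables")
    return sum(log2fc[g] * corr[g] for g in shared) / len(shared)


# Toy values, invented for illustration only.
log2fc = {"geneA": 2.0, "geneB": -1.0, "geneC": 0.5}
corr_fate_a = {"geneA": 0.5, "geneB": -0.5, "geneC": 0.0}

score_a = fate_score(log2fc, corr_fate_a)
print(score_a)  # positive here: up-regulated genes correlate positively with the fate
```

A positive value means that, on average, the expression changes go in the same direction as the gene-fate correlations; a negative value means they oppose them.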

Any suggestions for a statistical method or publication applicable to this case? Thanks!

scoring genomics rna enrichment • 1.1k views

ADD COMMENT • link updated 3.7 years ago by rpolicastro 13k • written 3.7 years ago by Lopiniatre ▴ 10

score 3 · Answer 1 · 2021-03-29

3


3.7 years ago

rpolicastro 13k

If you are going to use Log2FCs for a downstream purpose like this, I would use the shrunken Log2FCs from DESeq2, so that less confident FC values are shrunken closer to 0.

I think a good place to start would be a simple, parsimonious linear regression using the formula Log2FC ~ corr_score + fate. Do the proper quality control for the linear regression, and if things look good, the slopes should give you an idea of whether there is a significant relationship between Log2FC and corr_score, and whether there are differences based on fate.

The direction of the analysis after this will somewhat depend on how the QC and your model coefficients look.

ADD COMMENT • link 3.7 years ago by rpolicastro 13k

0


Thank you, I will try the lm approach.

One question though: since each gene has one logFC value but multiple corr_score values (one for each fate A, B, C), shouldn't I run a separate lm for each fate, Log2FC ~ corr_score? That way I could use the slope and its p-value as a "fate alteration score". (Maybe my description of the data was unclear.)
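The per-fate fit described here can be sketched as follows. This is only an illustration in plain Python: the helper `ols_slope`, the data layout, and the numbers are hypothetical, and the toy values are deliberately exactly linear so the slopes are easy to check by eye. It returns the slope and intercept only; a p-value for the slope would need a statistics package (for example `scipy.stats.linregress` in Python or `lm` in R).

```python
def ols_slope(x, y):
    """Ordinary least squares fit of y ~ x; returns (slope, intercept)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# One log2FC per gene, and one correlation per gene for each fate (toy values).
log2fc = [2.0, -1.0, 0.5, -2.0]
corr_by_fate = {
    "A": [0.8, -0.4, 0.2, -0.8],
    "B": [-0.5, 0.25, -0.125, 0.5],
}

# Fit Log2FC ~ corr_score separately for each fate.
slopes = {fate: ols_slope(corr, log2fc)[0] for fate, corr in corr_by_fate.items()}
for fate, slope in slopes.items():
    print(fate, round(slope, 3))
```

In this toy data the slope for fate A is positive and the slope for fate B is negative, which is the sign information the "fate alteration score" idea relies on.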

ADD REPLY • link 3.7 years ago by Lopiniatre ▴ 10

0


If you only have one Log2FC per gene across multiple fates, you should start off with separate models per fate. If linear regression works well, you could build a slightly more complex regression such as Log2FC ~ FateA + FateB + .... If you do format your model like this, you would probably want to use lasso regression, which shrinks the less important coefficients toward 0, to get a better idea of which fates are most associated with Log2FC.
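To illustrate what the lasso does in a model of this shape, here is a small self-contained sketch using coordinate descent in plain Python. The function names, the penalty `alpha`, and the toy data are all assumptions made for the example; it minimises (1/(2n)) * ||y - Xb||^2 + alpha * sum|b_j| on centred data, which is the same objective scikit-learn's `Lasso` uses. For real data a tested implementation (for example `glmnet` in R or scikit-learn in Python) with a cross-validated penalty would be the safer choice.

```python
def soft_threshold(value, threshold):
    """Shrink value toward 0 by threshold; values within the threshold become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(columns, y, alpha, n_iter=200):
    """Lasso fit of y on the given predictor columns (all centred internally)."""
    n = len(y)
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]  # equals y - Xb while all coefficients are 0
    centred = []
    for col in columns:
        col_mean = sum(col) / n
        centred.append([v - col_mean for v in col])
    coefs = [0.0] * len(centred)
    for _ in range(n_iter):
        for j, col in enumerate(centred):
            scale = sum(c * c for c in col) / n
            if scale == 0.0:
                continue  # constant predictor carries no information
            # Covariance of predictor j with the residual that excludes its own term.
            rho = sum(c * (r + c * coefs[j]) for c, r in zip(col, residual)) / n
            new_coef = soft_threshold(rho, alpha) / scale
            delta = new_coef - coefs[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                coefs[j] = new_coef
    return coefs


# Toy per-gene correlation scores for three fates (invented, mutually orthogonal).
fate_names = ["FateA", "FateB", "FateC"]
fate_corr = [
    [1.0, -1.0, 1.0, -1.0],
    [1.0, 1.0, -1.0, -1.0],
    [1.0, -1.0, -1.0, 1.0],
]
# Toy log2FC values: strongly tied to FateA, only weakly to FateB.
log2fc = [2.05, -1.95, 1.95, -2.05]

fit = dict(zip(fate_names, lasso_coordinate_descent(fate_corr, log2fc, alpha=0.1)))
print(fit)
```

With this penalty the weak FateB effect and the absent FateC effect are set exactly to 0, while the FateA coefficient is kept but shrunk slightly, which is how the lasso highlights the fates that matter most.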

ADD REPLY • link 3.7 years ago by rpolicastro 13k