I am interested in testing the relationship between two genes, assuming I have measurements of the activity of both genes of interest (or a transcriptomic proxy for it, though I understand that is imperfect). More specifically, I have around 50 data points for both genes (randomly sampled, which can be treated as sampling at different time points).
I believe it is relatively easy to assess whether these two genes are linearly correlated, for which I plan to use the Pearson correlation. I can then assess the relationship by looking at the R² or at the p-value of the correlation coefficient between the two genes. Please correct me if I am wrong here.
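A minimal Python sketch of that calculation, using SciPy's pearsonr as a stand-in for the R workflow and simulated values that are purely illustrative. Note that the two numbers answer different questions: the p-value tests whether r differs from zero, while R² = r² measures how much variance the two genes share linearly, so they should be read together rather than as alternatives.

```python
import numpy as np
from scipy import stats

# Hypothetical data: 50 samples of two genes (simulated for illustration only)
rng = np.random.default_rng(0)
gene_a = rng.normal(10, 2, size=50)
gene_b = 0.8 * gene_a + rng.normal(0, 1.5, size=50)

# pearsonr returns the correlation coefficient r and a two-sided p-value for H0: rho = 0
r, p_value = stats.pearsonr(gene_a, gene_b)

# R^2 for a single pair is simply r squared: the share of variance explained linearly
r_squared = r ** 2

print(f"r = {r:.3f}, R^2 = {r_squared:.3f}, p = {p_value:.2e}")
```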
I am also thinking about using linear regression for this purpose. However, I am not sure which metric/output I should look at. Is it just the p-value of the relevant coefficient? (Does it matter which gene I use as the dependent variable and which as the independent variable?)
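On the direction question: in simple regression with a single predictor, the slope's p-value equals the Pearson correlation p-value, and it does not change when the two genes swap roles. Only the slope (and its units) changes, and the two slopes multiply to r². A sketch with SciPy's linregress on simulated data (assumed for illustration):

```python
import numpy as np
from scipy import stats

# Hypothetical data for two genes, 50 samples each
rng = np.random.default_rng(1)
gene_a = rng.normal(10, 2, size=50)
gene_b = 0.8 * gene_a + rng.normal(0, 1.5, size=50)

# Regress B on A, then A on B
fit_ba = stats.linregress(gene_a, gene_b)
fit_ab = stats.linregress(gene_b, gene_a)

# Pearson correlation for comparison
r, p = stats.pearsonr(gene_a, gene_b)

print("B ~ A: slope =", fit_ba.slope, "p =", fit_ba.pvalue)
print("A ~ B: slope =", fit_ab.slope, "p =", fit_ab.pvalue)
print("Pearson p =", p, "; slope product =", fit_ba.slope * fit_ab.slope, "; r^2 =", r ** 2)
```

The direction does matter when the goal is prediction, or when one gene is measured with much more noise than the other, because ordinary least squares assumes the predictor is error-free.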
In addition to linear relationships, there are many other possible relationships in biology, such as dose-response/switch-like and biphasic ones. I wonder if there are ways to test for these. If a linear relationship can be tested with linear regression, I wonder if I can apply the same logic and use drc and nls for testing purposes. It would be great if someone could give me some hints on which metrics I should look for.
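For nonlinear shapes, one common approach is to fit several candidate curves and compare them with an information criterion such as AIC (lower is better), or with an F-test when the models are nested. The sketch below uses SciPy's curve_fit as a rough Python analogue of nls. The three model forms, the starting values, the bounds and the simulated data are assumptions chosen for illustration, and the quadratic is only a crude stand-in for a biphasic response.

```python
import numpy as np
from scipy.optimize import curve_fit

# Candidate shapes for how gene B responds to gene A (illustrative assumptions)
def linear(x, a, b):
    return a + b * x

def quadratic(x, a, b, c):
    # crude biphasic (rise-then-fall) shape when c < 0
    return a + b * x + c * x ** 2

def sigmoid(x, bottom, top, ec50, hill):
    # four-parameter Hill-type switch; assumes x > 0
    return bottom + (top - bottom) / (1.0 + (ec50 / x) ** hill)

def aic(y, y_hat, n_params):
    # Gaussian least-squares AIC up to an additive constant;
    # the +1 counts the residual variance as an extra parameter
    n = len(y)
    rss = np.sum((y - y_hat) ** 2)
    return n * np.log(rss / n) + 2 * (n_params + 1)

def compare_models(x, y):
    candidates = {
        "linear": (linear, [0.0, 1.0], (-np.inf, np.inf)),
        "quadratic": (quadratic, [0.0, 1.0, 0.0], (-np.inf, np.inf)),
        "sigmoid": (sigmoid, [y.min(), y.max(), np.median(x), 1.0],
                    ([-np.inf, -np.inf, 1e-6, 0.1], [np.inf, np.inf, np.inf, 20.0])),
    }
    scores = {}
    for name, (f, p0, bounds) in candidates.items():
        params, _ = curve_fit(f, x, y, p0=p0, bounds=bounds)
        scores[name] = aic(y, f(x, *params), len(params))
    return scores

# Usage on simulated switch-like data (hypothetical values)
rng = np.random.default_rng(2)
x = np.linspace(0.5, 10, 50)  # gene A levels
y = sigmoid(x, 1.0, 5.0, 5.0, 4.0) + rng.normal(0, 0.2, size=x.size)  # gene B
scores = compare_models(x, y)
best = min(scores, key=scores.get)
print(scores, "-> best:", best)
```

AIC only ranks the candidates you supply, so it is worth also checking the residuals and whether the fitted parameters (for example, the Hill coefficient) are biologically plausible.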
I would prefer an approach that can be generalized to assess pairwise relationships across thousands of genes. One way I know to test these mathematically is to construct ODEs, which can be fitted with deSolve. However, building a model that involves many genes seems infeasible. I am not sure whether there is a way to systematically characterize many genes in parallel (either treating each pair as independent or as dependent on other pairs).
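To scale pairwise testing to thousands of genes, one option is to compute the whole correlation matrix at once and then correct for multiple testing, since the number of pairs grows roughly as genes²/2. This is a sketch assuming a genes × samples array. It uses SciPy's spearmanr, which returns full correlation and p-value matrices when given more than two variables, plus a hand-written Benjamini-Hochberg adjustment. The names bh_adjust and pairwise_spearman_fdr are hypothetical helpers. Memory also grows with the square of the number of genes, so very large gene sets may need to be processed in blocks.

```python
import numpy as np
from scipy import stats

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values (q-values) for a 1-D array."""
    pvals = np.asarray(pvals, dtype=float)
    m = pvals.size
    order = np.argsort(pvals)
    ranked = pvals[order] * m / np.arange(1, m + 1)
    # enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    q = np.empty(m)
    q[order] = np.minimum(ranked, 1.0)
    return q

def pairwise_spearman_fdr(expr):
    """expr: genes x samples (at least 3 genes). Returns genes x genes rho and q matrices."""
    # spearmanr treats columns as variables, so transpose to samples x genes
    rho, p = stats.spearmanr(expr.T)
    iu = np.triu_indices(expr.shape[0], k=1)  # each unordered gene pair once
    q_upper = bh_adjust(p[iu])
    q = np.full_like(p, np.nan)  # diagonal (self-pairs) left as NaN
    q[iu] = q_upper
    q[iu[1], iu[0]] = q_upper  # mirror into the lower triangle
    return rho, q

# Usage: 200 hypothetical genes x 50 samples, with one planted monotonic pair
rng = np.random.default_rng(3)
expr = rng.normal(size=(200, 50))
expr[1] = expr[0] ** 3 + rng.normal(0, 0.1, size=50)
rho, q = pairwise_spearman_fdr(expr)
print(rho[0, 1], q[0, 1])
```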
Thanks in advance!
Why not just use the Spearman correlation?
Good suggestion. I believe Spearman works well for linear, dose-response-like, and Michaelis-Menten-like relationships, since these are all monotonic. However, it may perform poorly for biphasic relationships, where the gene is first upregulated and then trends down again.
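Because Spearman only captures monotonic trends, a symmetric rise-then-fall pattern can give a rho close to zero. One alternative not mentioned in the thread is distance correlation (Székely et al., 2007). In the population it is zero only when the variables are independent, so it can detect non-monotonic dependence, although it does not tell you the shape. Below is a minimal NumPy sketch with a permutation p-value on simulated data; the helper names are hypothetical.

```python
import numpy as np
from scipy import stats

def distance_correlation(x, y):
    """Sample distance correlation (V-statistic version)."""
    x = np.asarray(x, dtype=float)[:, None]
    y = np.asarray(y, dtype=float)[:, None]
    a = np.abs(x - x.T)  # pairwise distance matrices
    b = np.abs(y - y.T)
    # double-center each distance matrix
    A = a - a.mean(axis=0) - a.mean(axis=1)[:, None] + a.mean()
    B = b - b.mean(axis=0) - b.mean(axis=1)[:, None] + b.mean()
    dcov2 = (A * B).mean()
    dvar_x = (A * A).mean()
    dvar_y = (B * B).mean()
    # guard against tiny negative values from floating-point error
    return np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar_x * dvar_y))

def dcor_permutation_pvalue(x, y, n_perm=999, seed=0):
    """Permutation p-value: shuffle y to break any dependence on x."""
    rng = np.random.default_rng(seed)
    observed = distance_correlation(x, y)
    y = np.asarray(y)
    null = np.array([distance_correlation(x, rng.permutation(y)) for _ in range(n_perm)])
    return (1 + np.sum(null >= observed)) / (n_perm + 1)

# Usage: hypothetical biphasic response (gene B rises, then falls, as gene A increases)
rng = np.random.default_rng(4)
gene_a = np.linspace(-3, 3, 50)
gene_b = -gene_a ** 2 + rng.normal(0, 0.5, size=50)
rho, p_spear = stats.spearmanr(gene_a, gene_b)
dcor = distance_correlation(gene_a, gene_b)
p_dcor = dcor_permutation_pvalue(gene_a, gene_b)
print(f"Spearman rho = {rho:.2f} (p = {p_spear:.2f}); dCor = {dcor:.2f} (perm p = {p_dcor:.3f})")
```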
I find these traditional correlation tests questionable for single-cell data: with so many data points, even weak correlations driven by outliers can easily be highly significant. A completely different approach with a similar goal could be to use something like NMF (non-negative matrix factorization) and then see which genes end up in the same dimensions. It depends, of course, on your question and what you want to answer.
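Both points can be illustrated numerically. For a fixed weak correlation, the p-value shrinks as n grows. A single extreme point among thousands of otherwise independent values can also produce a sizeable, highly significant Pearson r. The following is a sketch on simulated data (all values are assumptions for illustration), with the rank-based Spearman coefficient shown for comparison.

```python
import numpy as np
from scipy import stats

def pearson_pvalue(r, n):
    """Two-sided p-value for a Pearson r under H0: rho = 0 (t-test with n - 2 df)."""
    t = r * np.sqrt((n - 2) / (1 - r ** 2))
    return 2 * stats.t.sf(abs(t), df=n - 2)

# 1) The same weak correlation (r = 0.1) at increasing sample sizes
pvals_by_n = {n: pearson_pvalue(0.1, n) for n in (50, 500, 5000)}

# 2) 5,000 independent points plus one hypothetical outlier
rng = np.random.default_rng(5)
x = rng.normal(size=5000)
y = rng.normal(size=5000)
x[0], y[0] = 60.0, 60.0
r_p, p_p = stats.pearsonr(x, y)
r_s, p_s = stats.spearmanr(x, y)

print(pvals_by_n)
print(f"Pearson r = {r_p:.2f} (p = {p_p:.1e}); Spearman rho = {r_s:.2f} (p = {p_s:.2f})")
```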
I agree with not using correlation on single-cell data. I am not using single-cell data (if I need to in the future, I will use pseudo-bulk). Meanwhile, would you mind suggesting any key papers on NMF? I have not heard of it before. Thanks!
Just google NMF for single-cell data. It has been around for some time but is not as widely adopted as PCA or other dimensionality reduction techniques.
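For readers unfamiliar with NMF: it approximates a non-negative genes × samples matrix V as the product W H of two non-negative matrices with k factors. Genes can then be grouped by the factor on which they load most strongly. Below is a minimal NumPy sketch using the Lee-Seung multiplicative updates for the squared-error loss. In practice one would more likely use an existing implementation such as scikit-learn's sklearn.decomposition.NMF. The helper names, the choice k = 2, the column normalization and the simulated data are all assumptions.

```python
import numpy as np

def nmf(V, k, n_iter=1000, seed=0, eps=1e-10):
    """Factor non-negative V (genes x samples) as W @ H via multiplicative updates."""
    rng = np.random.default_rng(seed)
    n_genes, n_samples = V.shape
    W = rng.random((n_genes, k)) + eps
    H = rng.random((k, n_samples)) + eps
    for _ in range(n_iter):
        # each update keeps entries non-negative and does not increase ||V - WH||^2
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

def gene_modules(W):
    # normalize each factor's loadings to sum to 1, then assign each gene to its dominant factor
    W_norm = W / (W.sum(axis=0, keepdims=True) + 1e-12)
    return W_norm.argmax(axis=1)

# Usage: 20 hypothetical genes driven by two sample-level programs (10 genes each)
rng = np.random.default_rng(6)
program1 = rng.random(50)
program2 = rng.random(50)
V = np.vstack([np.outer(rng.uniform(1, 2, 10), program1),
               np.outer(rng.uniform(1, 2, 10), program2)])
V += rng.random(V.shape) * 0.05  # small non-negative noise
W, H = nmf(V, k=2)
labels = gene_modules(W)
print(labels)
```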