Hi all,
I'm trying to assess the correlation in gene expression between a specific gene and all other genes, using a public scRNA-seq dataset of human melanoma biopsies (https://singlecell.broadinstitute.org/single_cell/study/SCP109/melanoma-immunotherapy-resistance#study-summary).
For this I run a Spearman correlation test on the TPM values, after filtering the samples to keep only the cells where my gene is expressed. However, I'm only getting low correlations (all under 0.25), and I'm wondering why.
In your opinion, are low correlations necessarily expected (due to the low sensitivity of scRNA-seq)?
Does anyone know of highly correlated housekeeping genes that I could use as positive controls?
Thank you for any input
I think it is hard to determine what is biologically expected without too much speculation. Depends on the gene and the ... However, I think there is one big problem here, where you are shooting yourself in the foot: "after filtering the samples to keep only the cells where my gene is expressed".
Let's assume the regulation is not dose-dependent but binary. Then you have removed exactly the samples that would allow you to detect that case. I think you should never remove samples like that; instead, you could remove lowly expressed genes or genes with low overall variation (e.g. rowsum < #cols and row sd or mad < threshold). You might also determine the distribution of correlations across all genes in your data set (after filtering) to see how much correlation is to be expected.
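To make the gene-level filter and the background distribution a bit more concrete, here is a minimal sketch in plain Python on a made-up toy matrix. The gene names, the values, the sd_threshold cutoff and the helper functions are all hypothetical and only for illustration; I also read the filter as "drop a gene if either criterion fails", which is an assumption. On real data you would rather use an existing implementation such as scipy.stats.spearmanr or R's cor(method = "spearman").

```python
from itertools import combinations
from statistics import mean, median, stdev


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy expression matrix: one row per gene, one column per cell.
# ALL cells are kept, including those where TARGET is not detected.
expr = {
    "TARGET": [0, 0, 0, 5, 8, 12, 0, 9],
    "GENE_A": [1, 0, 2, 6, 9, 14, 1, 10],
    "GENE_B": [7, 9, 8, 1, 0, 2, 10, 1],
    "GENE_C": [3, 3, 3, 3, 3, 3, 3, 3],  # no variation
    "GENE_D": [0, 0, 1, 0, 0, 0, 0, 0],  # barely expressed
    "GENE_E": [4, 1, 6, 2, 7, 3, 5, 2],
}
n_cells = len(expr["TARGET"])
sd_threshold = 1.0  # hypothetical cutoff, tune on real data

# Filter genes (rows), not cells (columns).
kept = {
    gene: values
    for gene, values in expr.items()
    if sum(values) >= n_cells and stdev(values) >= sd_threshold
}

# Correlation of the gene of interest with every remaining gene.
target_cor = {
    gene: spearman(kept["TARGET"], values)
    for gene, values in kept.items()
    if gene != "TARGET"
}

# Background: correlations between all pairs of remaining genes.
background = [spearman(kept[a], kept[b]) for a, b in combinations(kept, 2)]

print(target_cor)
print("median |rho| in background:", median(abs(r) for r in background))
```

The background values give you something to compare your gene's correlations against: a rho of 0.25 means something different if almost all gene pairs sit below 0.1 than if half of them exceed 0.3.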
In network analysis, biweight midcorrelation is also often used (the bicor function in the WGCNA R package).
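For reference, the biweight midcorrelation can be written out in a few lines. The following is only a sketch of the textbook definition in plain Python (deviations from the median, down-weighted by their distance in units of 9 x MAD), not the code of any particular package; the function names and the toy vectors are made up. Note that it simply raises an error when the MAD is zero, which will happen for genes that are undetected in more than half of the cells.

```python
from math import sqrt
from statistics import mean, median


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def _biweight_deviations(values):
    """Median-centred values, down-weighted and scaled to unit length."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # raw MAD, no scaling constant
    if mad == 0:
        raise ValueError("MAD is zero; biweight weights are undefined")
    weighted = []
    for v in values:
        u = (v - med) / (9 * mad)
        # Points further than 9 MADs from the median get weight 0.
        w = (1 - u * u) ** 2 if abs(u) < 1 else 0.0
        weighted.append((v - med) * w)
    norm = sqrt(sum(a * a for a in weighted))
    return [a / norm for a in weighted]


def bicor(x, y):
    """Biweight midcorrelation of two equally long sequences."""
    return sum(a * b for a, b in zip(_biweight_deviations(x),
                                     _biweight_deviations(y)))


# Toy example: a clean increasing trend plus one extreme outlier cell.
x = [1, 2, 3, 4, 5, 6, 7, 100]
y = [2, 4, 6, 8, 10, 12, 14, -50]

print("Pearson:", pearson(x, y))  # dominated by the outlier
print("bicor:  ", bicor(x, y))    # outlier gets weight 0
```

The point of the comparison is that a single extreme cell can flip the sign of the Pearson correlation, whereas the biweight version follows the bulk of the data.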
Many thanks for the advice. If I assess the correlation between two genes without filtering on gene expression, I will get many cells with no detected expression for the two genes. Which test would be appropriate to evaluate the expression correlation while handling those "missing values"?