Question

How to distinguish between normal and malignant epithelial cells based on CNV?

0

14 months ago

fifty_fifty ▴ 70

One of the methods used to find malignant cells in tumor scRNA-seq data is infercnv. This, this, and this paper used infercnv output to identify cancer cells among other epithelial cells.

They separate cancer and non-cancer cells based on 2 thresholds: CNV score and CNV correlation. The CNV score for each cell is computed as the mean of squares of residual expression across the genes. The CNV correlation, however, is computed as the correlation between the CNV profile of each cell and the average CNV profile of all cells from the corresponding tumor, except for those classified by gene expression as non-malignant.
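As a minimal sketch of the CNV score definition above (mean of squared residual expression across genes), written in plain Python for illustration. The function name and the toy values are hypothetical, and the residuals are assumed to be deviations from a neutral baseline of 0.

```python
def cnv_score(residuals):
    """Mean of squared residual expression values for one cell."""
    return sum(r * r for r in residuals) / len(residuals)

# Hypothetical toy data: residual expression per gene for two cells.
flat_cell = [0.0, 0.1, -0.1, 0.0]       # little CNV signal
altered_cell = [0.5, 0.6, -0.4, -0.5]   # strong gains and losses

scores = {
    "flat": cnv_score(flat_cell),
    "altered": cnv_score(altered_cell),
}
```

Because the residuals are squared, gains and losses both raise the score instead of cancelling out.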

I am not sure how to calculate the latter. What is "CNV profile"? And how to compute "the average CNV profile of all cells from the corresponding tumor, except for those classified by gene expression as non-malignant"?

I was directed to this issue on their GitHub, but no one there clarified how to compute the CNV correlation.

scRNA-seq r CNV infercnv CNA • 2.2k views

ADD COMMENT • link updated 14 months ago by LChart 4.6k • written 14 months ago by fifty_fifty ▴ 70

score 1 · Answer 1 · 2023-09-06

1

14 months ago

LChart 4.6k

InferCNV gives a CNV score per individual cell.

In all the cells you sequenced, there will be populations of (1) cancer cells; (2) normal cells from the cancer cell-of-origin [in this case, epithelial]; and (3) normal cells that aren't even of the same type as the tumor-originating population (typically immune cells). There may be, in addition, clusters of epithelial cells known to be normal (because, for instance, they cluster with cells from a known non-cancerous tissue).

To compute "CNV correlation", you first consider only cells that are not known to be normal (cancer or potentially cancerous). Then you compute the mean CNV score across all of these cells. This establishes the baseline "cancer" CNV score. Finally, for each cell, you compute the correlation between the cell CNV score and the "cancer" CNV score you just calculated.
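The procedure above might be sketched as follows, assuming the per-gene reading of "CNV score" (one value per gene per cell) and Pearson correlation. This is plain Python for illustration only; the cell names, values, and the `known_normal` set are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length vectors (undefined if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical toy data: per-gene CNV values for each cell of one tumor.
cells = {
    "c1": [0.5, 0.4, -0.5, -0.4],
    "c2": [0.4, 0.5, -0.4, -0.5],
    "c3": [-0.1, 0.1, 0.1, -0.1],   # candidate cell without a clear CNV pattern
    "n1": [0.1, -0.1, 0.1, -0.1],   # cell already known to be normal
}
known_normal = {"n1"}

# Step 1: keep only cells that are not known to be normal.
candidates = [v for k, v in cells.items() if k not in known_normal]

# Step 2: average those cells gene by gene to get the "cancer" profile.
tumor_profile = [sum(col) / len(col) for col in zip(*candidates)]

# Step 3: correlate every cell (including known normals) with that profile.
cnv_corr = {k: pearson(v, tumor_profile) for k, v in cells.items()}
```

Cells that share the tumor's pattern of gains and losses end up with a correlation near 1, while cells without that pattern stay near 0.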

ADD COMMENT • link 14 months ago by LChart 4.6k

0

Thank you. Just to be sure: by "InferCNV gives a CNV score per individual cell." you mean infercnv_obj@expr.data, which is a cell-by-gene matrix, right? So I'll need to compute a score for each cell as the mean of squares across all genes?

ADD REPLY • link 14 months ago by fifty_fifty ▴ 70

0

Sorry: CNV score per gene per cell, so you have a (n_cell, n_gene) matrix of residual scores (or its transpose). The "mean CNV score" would be the per-gene mean across cells, i.e. the row means when genes are the rows.

ADD REPLY • link 14 months ago by LChart 4.6k

1

Thank you. This is what I've done:

1. I rescaled the values of infercnv@expr.data to between -1 and 1.
2. CNV scores for each cell were calculated as the mean of squares of the rescaled values across the genes.
3. I took the CNV signals of all epithelial cells (hepatocytes) from infercnv@expr.data and rescaled them to between -1 and 1.
4. The baseline "cancer" CNV score vector was computed as rowMeans of the data frame from p.3, which gave me an average CNV signal for each gene in potentially "cancer" cells.
5. I calculated the correlation between each cell's CNV signals (each column of the df from p.1) and the vector from p.4.
6. I plotted each cell's CNV score against its CNV correlation and colored the cells by whether they were reference non-malignant cells (blue) or known hepatocytes (red).
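The rescaling in p.1 and p.3 is not spelled out, so the following is only a sketch assuming a linear min-max rescaling to [-1, 1], in plain Python with hypothetical values. It shows one thing worth checking: where the copy-neutral value lands after rescaling, because the mean of squares measures distance from 0.

```python
def rescale(values, lo=-1.0, hi=1.0):
    """Linearly map values so that min(values) -> lo and max(values) -> hi."""
    vmin, vmax = min(values), max(values)
    return [lo + (v - vmin) * (hi - lo) / (vmax - vmin) for v in values]

# Hypothetical values in which 1.0 is assumed to be the copy-neutral level.
raw = [0.8, 0.9, 1.0, 1.1, 1.4]
scaled = rescale(raw)

# Position of the neutral value after rescaling.
neutral_after = scaled[raw.index(1.0)]
```

With an asymmetric range like this one, the neutral value does not map to 0, so every gene contributes a similar offset to the squared score regardless of its CNV state.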

While the CNV correlation makes sense, the CNV score looks almost the same for all cells. What am I doing wrong?

ADD REPLY • link 14 months ago by fifty_fifty ▴ 70

0

So:

(1) inferCNV should provide both an "expected" expression value and a "residual" value. It's the residual value that's used to actually "call" CNVs (a run of many genes with high residuals would implicate an amplification, and a run with low residuals would implicate a deletion).

(2) The sum of the residuals across genes within cells would be a kind of estimate of total burden -- but you're better off using the calls themselves for this, since there will be lots of noise from the many genes with small residuals.

(3) The cancer CNV vector and the cell correlations should be built from scaled residuals.

(4) I'm assuming you're rescaling to [-1, 1] because other publications do so? Otherwise, what is the justification for rescaling the residuals?
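To illustrate point (2), here is a toy comparison in plain Python. Everything in it is hypothetical: the call vectors stand in for whatever gain/loss calls are available (+1 gain, -1 loss, 0 neutral), and the residual burden uses absolute values as an assumption so that gains and losses do not cancel.

```python
def burden_from_residuals(residuals):
    """Sum of absolute residuals across genes (assumed variant of 'sum of residuals')."""
    return sum(abs(r) for r in residuals)

def burden_from_calls(calls):
    """Fraction of genes inside a called gain (+1) or loss (-1)."""
    return sum(1 for c in calls if c != 0) / len(calls)

# Hypothetical normal cell: many small, noisy residuals and no called CNV.
normal_res = [0.05, -0.05] * 10
normal_calls = [0] * 20

# Hypothetical tumor cell: one short amplified run, otherwise flat.
tumor_res = [0.3] * 3 + [0.0] * 17
tumor_calls = [1] * 3 + [0] * 17

residual_burden = {
    "normal": burden_from_residuals(normal_res),
    "tumor": burden_from_residuals(tumor_res),
}
call_burden = {
    "normal": burden_from_calls(normal_calls),
    "tumor": burden_from_calls(tumor_calls),
}
```

In this toy case the accumulated noise gives the normal cell a larger residual-based burden than the tumor cell, while the call-based burden ranks them the expected way.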

I would also recommend saving the raw and residualized expression values throughout the various steps of denoising, as you may find that one set of values outperforms the others.

ADD REPLY • link 14 months ago by LChart 4.6k