Question

InferCNV results interpretation


16 months ago

fifty_fifty

I'd been looking for a way to distinguish cancer cells from other epithelial cells in tumor scRNA-seq data and found the inferCNV method. It infers large-scale chromosomal copy number changes (gains or losses) from scRNA-seq data by comparing cells against a set of "normal" reference cells. I followed their test run on pediatric glioblastoma data and got the same results. However, I don't understand the final result. The resulting heatmap looks like this: [image: inferCNV heatmap]

I can see that the algorithm subclustered some malignant cell groups based on the expression of genes on certain chromosomes. But how do I identify the cancer cells from this kind of output?

cnv scrna-seq r


score 2 · Answer 1 · 2023-08-31


16 months ago

arno.guille

To identify tumor cells using inferCNV, I computed a per-cell score based on the fraction of genes with copy number alterations (CNA).

For example:

scores <- apply(infercnv_obj@expr.data, 2, function(x) sum(x < 0.95 | x > 1.05) / length(x))

Then, inspect the distribution of this score.

[image: distribution of the per-cell CNA score]

If you observe a clear bimodal distribution, you can easily set a threshold (for example, 0.2) to distinguish between tumor cells and normal cells.
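If you would rather not pick the cut-off purely by eye, one option is to split the scores into two groups with k-means (k = 2) and place the threshold midway between the two cluster centres. This is a swapped-in heuristic, not part of inferCNV or of the answer above. `cna_score` and `split_scores` are hypothetical helper names, the 0.95/1.05 band is the one from the answer, and the toy matrix is simulated.

```r
# Fraction of genes per cell whose inferCNV value lies outside [lower, upper].
# `expr` is a genes x cells matrix such as infercnv_obj@expr.data.
cna_score <- function(expr, lower = 0.95, upper = 1.05) {
  colMeans(expr < lower | expr > upper)
}

# Hypothetical helper: pick a cut-off with 2-means clustering on the scores
# (midpoint of the two cluster centres) and label cells above it "tumor".
split_scores <- function(scores, seed = 1) {
  set.seed(seed)
  km <- kmeans(scores, centers = 2)
  threshold <- mean(km$centers)
  list(threshold = threshold,
       label = ifelse(scores > threshold, "tumor", "normal"))
}

# Toy data (simulated): 50 cells with small deviations from 1, 50 with large ones.
set.seed(42)
expr <- cbind(matrix(rnorm(1000 * 50, mean = 1, sd = 0.02), nrow = 1000),
              matrix(rnorm(1000 * 50, mean = 1, sd = 0.10), nrow = 1000))
scores <- cna_score(expr)
# Inspect the distribution first, e.g. hist(scores, breaks = 30)
res <- split_scores(scores)
table(res$label)
```

The midpoint rule only makes sense when the histogram really is bimodal; with a unimodal score distribution it will still return a threshold, so check the plot first.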

If you have both "true" normal and cancer cells, you can construct a simple model (for example, using logistic regression).
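A minimal sketch of such a model, assuming you already have a per-cell score (as computed above) and a known label for each training cell: a logistic regression of tumor status on the score using base R's `glm`. The helper name `fit_cna_model`, the simulated training data and the 0.5 probability cut-off are all illustrative assumptions, not the answerer's exact setup.

```r
# Hypothetical sketch: logistic regression of known status on the CNA score.
# `scores` are per-cell scores; `is_tumor` is TRUE for cells known to be
# malignant and FALSE for reference normal cells.
fit_cna_model <- function(scores, is_tumor) {
  train <- data.frame(score = scores, tumor = as.integer(is_tumor))
  glm(tumor ~ score, data = train, family = binomial())
}

# Toy training data (simulated, not from any real dataset).
set.seed(7)
known_scores <- c(rnorm(100, mean = 0.08, sd = 0.05),
                  rnorm(100, mean = 0.25, sd = 0.05))
known_status <- rep(c(FALSE, TRUE), each = 100)
model <- fit_cna_model(known_scores, known_status)

# Predicted probability of being malignant for unlabeled cells;
# the 0.5 cut-off is an arbitrary choice.
new_scores <- c(0.05, 0.16, 0.30)
p_tumor <- predict(model, newdata = data.frame(score = new_scores), type = "response")
ifelse(p_tumor > 0.5, "tumor", "normal")
```

If the known groups are perfectly separated by the score, `glm` will warn that fitted probabilities of 0 or 1 occurred; in that case a plain threshold is just as informative.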

I applied this method to multiple scRNA-seq cancer datasets, and it worked pretty well.



Yes, this is what I am looking for. The paper that I am trying to reproduce says "We scored each cell for the extent of CNV signal, defined as the mean of squares of CNV values across the genome. Putative malignant cells were then defined as those with CNV signal above 0.05 and CNV correlation above 0.5." They calculated the score a little differently than you did. What do you think about it?
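A rough sketch of the two quantities in that quote, under stated assumptions. The quote does not say how the CNV values are centred or what the correlation is computed against, so this sketch assumes inferCNV values centred at 1 (hence subtracting 1) and, as a hypothetical choice, correlates each cell with the mean profile of the top 5% of cells by CNV signal. `cnv_signal_and_cor` and `top_frac` are illustrative names; check the paper's methods before relying on this.

```r
# Hypothetical sketch of the two quantities quoted from the paper.
# Assumption: `expr` is a genes x cells matrix of inferCNV values centred at 1
# (like infercnv_obj@expr.data), so 1 is subtracted to centre it at 0.
cnv_signal_and_cor <- function(expr, top_frac = 0.05) {
  cnv <- expr - 1
  # CNV signal: mean of squared CNV values across all genes of each cell.
  signal <- colMeans(cnv^2)
  # Reference profile (assumption): mean profile of the top `top_frac`
  # of cells ranked by CNV signal.
  n_top <- max(2, ceiling(top_frac * ncol(cnv)))
  top <- order(signal, decreasing = TRUE)[seq_len(n_top)]
  ref_profile <- rowMeans(cnv[, top, drop = FALSE])
  # CNV correlation: Pearson correlation of each cell with that profile.
  correlation <- apply(cnv, 2, cor, y = ref_profile)
  data.frame(signal = signal, correlation = correlation)
}

# Usage with the cut-offs quoted above:
# res <- cnv_signal_and_cor(infercnv_obj@expr.data)
# malignant <- res$signal > 0.05 & res$correlation > 0.5
```

The 0.05 cut-off was defined on that paper's own CNV scale; inferCNV output processed with different settings may sit on a different scale, so the thresholds may need recalibrating against your reference cells.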



Hi @arno.guille, I'm curious about this model. I've got true reference normal cells and cancer cells. How would you set it up? As a classification model predicting normal/cancer with the CNV score as the independent variable? Thanks!



Hi arno.guille, I used your method and would like to cite it. Could you point me to any published papers that use it? Thanks!


score 1 · Answer 2 · 2023-08-31

inferCNV isn't meant to "identify malignant cells"; it's meant to identify the chromosomal copy number changes present in malignant cells. To do this, it requires a subset of cells to be designated as "normal" cells to serve as references. The top heatmap in your example corresponds to the normal oligodendrocytes and immune cells in the dataset, which were used as the reference.

The bottom heatmap corresponds to the malignant cells, which clearly contain several clonal populations, judging by the copy number changes that are shared or distinct between the clusters. Chromosomal regions in red (higher expression than in the reference cells) are likely amplified in those malignant cells relative to the normal cells, while regions in blue (lower expression) are likely lost. The "normal" cells don't show these big swings because they're normal diploid cells with minimal structural changes, unlike tumor cells, which frequently have DNA screwed up to such a degree structurally that it's frankly impressive they still manage to proliferate at all.
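To put numbers on the red/blue reading, one option is to average the inferCNV values per chromosome for each cell: averages above 1 suggest a relative gain versus the reference, and averages below 1 suggest a loss. This is a minimal sketch; `chromosome_means` is a hypothetical helper, and taking `chr` from the `chr` column of `infercnv_obj@gene_order` is an assumption to check against your inferCNV version.

```r
# Hypothetical helper: mean inferCNV value per chromosome for each cell.
# `expr` is genes x cells (e.g. infercnv_obj@expr.data); `chr` gives each
# row's chromosome in the same order (assumption: taken from the `chr`
# column of infercnv_obj@gene_order after matching row names).
chromosome_means <- function(expr, chr) {
  sums <- rowsum(expr, group = chr)               # chromosomes x cells
  sums / as.vector(table(chr)[rownames(sums)])    # divide by genes per chromosome
}

# Toy example: the first cell has higher values on chr1 and lower on chr2.
expr <- matrix(c(1.2, 1.2, 0.8, 1.0,
                 1.0, 1.0, 1.0, 1.0), nrow = 4,
               dimnames = list(paste0("g", 1:4), c("tumor_cell", "normal_cell")))
chr <- c("chr1", "chr1", "chr2", "chr2")
m <- chromosome_means(expr, chr)
m
```

Whole-chromosome averages dilute arm-level or focal events, so this is only a coarse summary of what the heatmap shows.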

score 0 · Answer 3 · 2023-08-31

You should look at their extensive wiki, which explains how to interpret the figure (see its "Interpreting the figure" page) and which output file contains what. For example:

HMM_CNV_predictions.*.pred_cnv_regions.dat lists the predicted CNV regions (the red and blue patterns)
HMM_CNV_predictions.*.pred_cnv_genes.dat lists the genes (within the regions above) likely to be affected
infercnv.observations_dendrogram.txt contains the Newick-formatted dendrogram (i.e. the clustering of the cells shown in the figure)
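As a starting point for working with these files in R, here is a minimal sketch that counts predicted CNV regions per cell group and HMM state. The column names `cell_group_name` and `state` are assumptions about the file layout (check `names(regions)` on your own output), and `summarise_cnv_regions` is a hypothetical helper name.

```r
# Hypothetical sketch: count predicted CNV regions per cell group and HMM state
# in an HMM_CNV_predictions.*.pred_cnv_regions.dat file (tab-separated).
# The column names cell_group_name and state are assumptions; check
# names(regions) against your own output first.
summarise_cnv_regions <- function(path) {
  regions <- read.delim(path, stringsAsFactors = FALSE)
  table(group = regions$cell_group_name, state = regions$state)
}

# The dendrogram file is Newick, so a phylogenetics reader such as
# ape::read.tree("infercnv.observations_dendrogram.txt") can load it.
```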