I'd been looking for a way to distinguish cancer cells from other epithelial cells in tumor scRNA-seq data and found the inferCNV method. It detects cells with large-scale chromosomal deletions or amplifications in scRNA-seq data, using "normal" cells as a reference. I followed their test run on pediatric glioblastoma data and got the same results. However, I don't understand the final result. The resulting heatmap looks like this.
I can see that the algorithm subclustered some malignant cell groups based on the expression of genes in certain chromosomes. But how do I find cancer cells from this kind of output?
Yes, this is what I am looking for. The paper that I am trying to reproduce says "We scored each cell for the extent of CNV signal, defined as the mean of squares of CNV values across the genome. Putative malignant cells were then defined as those with CNV signal above 0.05 and CNV correlation above 0.5."
They calculated scores a little bit differently than you did. What do you think about it?
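For anyone trying to reproduce that rule, here is a minimal Python sketch of the two scores. It makes several assumptions that are not in the quoted text: the CNV matrix has cells as rows and genes as columns, the values are centered on 0 (inferCNV output is typically centered on 1, so subtract 1 first), and "CNV correlation" is taken to be the Pearson correlation of each cell with the average profile of the highest-signal cells. The function names and the `top_fraction` parameter are hypothetical, so check the paper's methods for the exact reference profile they used.

```python
import numpy as np


def cnv_signal(cnv):
    """Mean of squared CNV values per cell (rows = cells, columns = genes)."""
    return np.mean(np.square(cnv), axis=1)


def cnv_correlation(cnv, signal, top_fraction=0.1):
    """Pearson correlation of each cell with the mean profile of top-signal cells.

    Assumption: the reference profile is the average of the `top_fraction`
    of cells with the highest CNV signal (at least one cell).
    """
    n_top = max(1, int(round(top_fraction * cnv.shape[0])))
    top_cells = np.argsort(signal)[-n_top:]
    reference = cnv[top_cells].mean(axis=0)

    # Center each cell profile and the reference, then take the cosine
    # of the centered vectors, which equals the Pearson correlation.
    cells_centered = cnv - cnv.mean(axis=1, keepdims=True)
    ref_centered = reference - reference.mean()
    denom = np.linalg.norm(cells_centered, axis=1) * np.linalg.norm(ref_centered)
    with np.errstate(invalid="ignore", divide="ignore"):
        corr = (cells_centered @ ref_centered) / denom
    # Completely flat profiles have undefined correlation; report them as 0.
    return np.nan_to_num(corr)


def call_malignant(cnv, signal_cutoff=0.05, corr_cutoff=0.5, top_fraction=0.1):
    """Apply both cutoffs from the quoted rule; returns (mask, signal, corr)."""
    signal = cnv_signal(cnv)
    corr = cnv_correlation(cnv, signal, top_fraction)
    return (signal > signal_cutoff) & (corr > corr_cutoff), signal, corr
```

Requiring both cutoffs is what makes the rule useful: a noisy normal cell can have a high mean of squares without resembling the tumor's CNV pattern, and the correlation term filters those out.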
Hi @arno.guille
I'm curious about this model. I've got true reference normal cells and cancer cells. How would you set it up? As a classification model, predicting normal/cancer using the CNV score as independent variable? Thanks!
inferCNV isn't meant to "identify malignant cells", it's meant to identify chromosomal copy number changes present in malignant cells. To do this, it requires a subset of cells to be designated as "normal" cells to use as references. The top heatmap in your example corresponds to the normal oligodendrocyte and immune cells in the dataset, which were used as the reference.
The bottom heatmap corresponds to the malignant cells, which clearly have some clonal populations based on shared/distinct copy number changes between the various clusters. Chromosomes with red (high expression) are likely amplified in those malignant cells versus the normal cells, while conversely, those with blue (low expression) are lost. The "normal" cells don't have these big swings because they're normal diploid cells with minimal structural changes, in comparison to tumor cells, which frequently have DNA screwed up to such a degree structurally that it's frankly impressive they still manage to proliferate at all.
You should look at their extensive wiki, in which they explain how to interpret the figure (Interpreting the figure) and which file contains what.
For example:
HMM_CNV_predictions.*.pred_cnv_regions.dat contains the regions containing the CNVs (red and blue patterns)
HMM_CNV_predictions.*.pred_cnv_genes.dat contains the list of genes (in the regions above) likely to be affected
infercnv.observations_dendrogram.txt contains the Newick-formatted dendrogram (i.e. the clustering of the cells in the figure)
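As a quick way to get a feel for the regions file, the sketch below counts how many predicted CNV regions each cell group has. It assumes the file is tab-delimited with a header row and that the grouping column is called `cell_group_name`; that column name is an assumption, so check the header of your own file and pass the right name. The function name is hypothetical and only the Python standard library is used.

```python
import csv
from collections import Counter


def count_regions_per_group(handle, group_column="cell_group_name"):
    """Count predicted CNV regions per cell group in a tab-delimited table.

    `handle` is an open text file (or any iterable of lines) whose first
    line is a header. `group_column` is an assumed column name.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    if group_column not in (reader.fieldnames or []):
        raise KeyError(
            f"column {group_column!r} not found; header is {reader.fieldnames}"
        )
    return Counter(row[group_column] for row in reader)
```

To use it, open your own regions file with `open(path, newline="")` and pass the handle in. Groups with many predicted regions are the ones showing the most red and blue blocks in the heatmap.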
Hi arno.guille, I used your method and would like to cite it. Could you please point me to some published papers that use it? Thanks!