Hi,
I am analyzing scRNA-seq data derived from cancer patient samples. I plan to follow a paper that distinguishes normal from cancer cells by their CNV changes (https://www.sciencedirect.com/science/article/pii/S0092867419306877).
Below is the relevant passage from the supplementary methods:
“We then scored each cell for two CNA-based measures. ‘CNA signal’ reflects the overall extent of CNAs, defined as the mean of the squares of CNA values across the genome.”
My question is: what exactly is the “mean of the squares of CNA values”? I computed it as the SD of all CNA values across the genome in a cell. Is that correct?
Thanks
Below is the full text of the methods they used:
CNAs were estimated by sorting the analyzed genes by their chromosomal location and applying a moving average to the relative expression values, with a sliding window of 100 genes within each chromosome, as we have previously described. Cells classified to each of the non-malignant cell types were used to define a baseline of normal karyotype, such that their average CNA value was subtracted from all cells. We then scored each cell for two CNA-based measures. ‘CNA signal’ reflects the overall extent of CNAs, defined as the mean of the squares of CNA values across the genome. ‘CNA correlation’ refers to the correlation between the CNA profile of each cell and the average CNA profile of all cells from the corresponding tumor, except for those classified by gene expression as non-malignant. Cells were then classified as malignant by CNA analysis if they had CNA signal above 0.02 and CNA correlation above 0.4.
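For reference, here is a rough Python sketch of how I read the steps quoted above. It is only my interpretation, not the authors' code. The function names, the array layout (cells in rows, genes in columns, genes already sorted by chromosomal position), the truncated windows at chromosome ends, and the single-tumor setup are all my own assumptions. Only the window of 100 genes and the thresholds 0.02 and 0.4 come from the text.

```python
import numpy as np

def smooth_within_chromosome(x, window):
    """Moving average along genes for one chromosome (x: cells x genes).
    Windows are truncated at the chromosome ends (an assumption)."""
    w = min(window, x.shape[1])
    kernel = np.ones(w)
    # Number of genes that actually fall inside each window.
    counts = np.convolve(np.ones(x.shape[1]), kernel, mode="same")
    sums = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), 1, x
    )
    return sums / counts

def cna_profiles(expr, chrom, normal_mask, window=100):
    """Smooth relative expression per chromosome, then subtract the
    average profile of the cells flagged as non-malignant."""
    cna = np.empty_like(expr, dtype=float)
    for c in np.unique(chrom):
        idx = np.where(chrom == c)[0]
        cna[:, idx] = smooth_within_chromosome(expr[:, idx].astype(float), window)
    baseline = cna[normal_mask].mean(axis=0)
    return cna - baseline

def score_cells(cna, tumor_mask):
    """CNA signal: mean of squared CNA values per cell.
    CNA correlation: Pearson r against the mean profile of tumor_mask cells."""
    signal = np.mean(cna ** 2, axis=1)
    reference = cna[tumor_mask].mean(axis=0)
    corr = np.array([np.corrcoef(row, reference)[0, 1] for row in cna])
    return signal, corr

def classify_malignant(signal, corr, signal_min=0.02, corr_min=0.4):
    """Thresholds as quoted in the methods text."""
    return (signal > signal_min) & (corr > corr_min)
```

With several tumors, `score_cells` would presumably be run once per tumor, with `tumor_mask` selecting that tumor's cells that were not classified as non-malignant by gene expression.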
It looks like a variance, not a standard deviation. I would say you compute mean((CNA - 2)**2) over all autosomes, so it is the average squared deviation from the baseline copy number.
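To make the difference concrete, here is a small sketch with made-up numbers (the `cna` vector is purely hypothetical). It assumes the values are already baseline-subtracted, as in the quoted methods, so the baseline is 0 rather than 2. The mean of squares equals the variance plus the squared mean, so the two coincide only when a cell's CNA values average to zero, and the SD is the square root of the variance.

```python
import numpy as np

# Hypothetical baseline-subtracted CNA values for one cell.
cna = np.array([0.3, 0.3, 0.3, 0.3, 0.1, 0.1])

mean_of_squares = np.mean(cna ** 2)  # the "CNA signal" as I read the text
variance = np.var(cna)               # mean squared deviation from the cell's own mean
sd = np.std(cna)                     # square root of the variance

# Identity: mean(x^2) = var(x) + mean(x)^2
assert np.isclose(mean_of_squares, variance + cna.mean() ** 2)

print(mean_of_squares, variance, sd)
```

So a cell with a uniform shift across the genome has a large mean of squares but a small SD. That also means the 0.02 threshold from the paper cannot be applied directly to an SD.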
You can also try one of the existing methods for CNV analysis: Detecting copy number alterations based on RNA-seq data