My lab does a lot of single-cell RNA-seq of samples that include tumor cells. I have a pipeline in place that reduces dimensionality, clusters cells, and automatically assigns cell type labels to those clusters (mostly using Seurat and SingleR). However, I do not currently have a way to differentiate tumor cells from normal cells, which would allow me to perform other downstream analyses such as copy number variation (CNV) analysis using inferCNV. Are there any tools in R/Python/Bash etc. that would allow me to distinguish between these two cell types, or is doing so manually using biomarkers the best option? I'd like the process to be as objective and reproducible as possible.
It will depend on your tumor type. Some are easier to differentiate than others.
You don't necessarily need to identify tumor cells to call CNVs. In fact, you could call CNVs to identify tumor cells.
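To make "call CNVs to identify tumor cells" a bit more concrete, here is a minimal pure-Python sketch of the general idea behind expression-based CNV inference. It is a heavily simplified illustration, not the actual algorithm of inferCNV or CONICSmat. It assumes a log-normalized expression matrix whose genes are already sorted by genomic position on a single chromosome (in practice you would smooth each chromosome separately), plus a set of cells you trust as normal (e.g. immune cells) to act as the reference. The function names `moving_average` and `infer_cnv_profiles` are hypothetical.

```python
from statistics import mean


def moving_average(values, window):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        out.append(mean(values[lo:hi]))
    return out


def infer_cnv_profiles(expr, reference_cells, window=101):
    """Return a smoothed, reference-centered expression profile per cell.

    expr: dict mapping cell id -> list of log-normalized expression values,
          one per gene, genes ordered by genomic position (assumption).
    reference_cells: ids of cells assumed to be normal.
    window: number of neighbouring genes to average over (hypothetical default).
    """
    n_genes = len(next(iter(expr.values())))
    # Per-gene baseline taken from the reference (assumed normal) cells.
    baseline = [mean(expr[c][g] for c in reference_cells) for g in range(n_genes)]
    profiles = {}
    for cell, values in expr.items():
        # Positive values suggest relative gain, negative values relative loss.
        centered = [v - b for v, b in zip(values, baseline)]
        # Averaging over neighbouring genes damps gene-level noise, so only
        # broad, multi-gene changes survive (hence the coarse resolution).
        profiles[cell] = moving_average(centered, window)
    return profiles


# Toy example: two normal cells and one cell with a "gain" over the first 5 genes.
expr = {
    "n1": [1.0] * 10,
    "n2": [1.0] * 10,
    "t1": [2.0] * 5 + [1.0] * 5,
}
profiles = infer_cnv_profiles(expr, ["n1", "n2"], window=3)
print(profiles["t1"])
```

The reference cells end up with flat (all-zero) profiles by construction, while the toy tumor cell shows an elevated segment over the first genes that tapers off at the boundary because of the smoothing.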
I think I'll be giving CONICSmat a go as a method of identifying CNVs / tumor cells. Thanks for the input.
If you are interested, there are a few other alternatives discussed here: Detecting copy number alterations based on RNA-seq data
That thread of yours is where I found CONICSmat in the first place :) With regard to using CNVs to identify tumor cells, are there any best-practice documents or guides floating around? I come from a pure stats background, so I'm less familiar with some of the (to me) more complicated biology concepts.
Usually, only tumor cells should have copy number abnormalities. In panel B of the figure from Patel et al., you can see that the topmost cluster has a flat copy number profile and contains the normal cells.
This is generally true, but does depend somewhat on the tumor type. Certain leukemias have "progenitor" or "poised" populations that may still harbor significant genetic variation despite not being truly malignant. This is where your biological expertise is going to have to come into play.
Thank you both. I had been using the Patel paper as a reference but it looks like I'm going to have to do a much deeper dive research-wise before I start analyzing anything. I don't want to be lacking in domain knowledge.
It really helps if you know what to look for. If you have any clinical karyotype data, it can make your life a lot easier. scRNA CNV calling is coarse: you aren't going to pick up many focal changes (< 1 Mb). If you have a clinical collaborator who provided you the samples, bug them to give you any information they might have available. If your cancer of interest has very recurrent copy number alterations, that can also help, but there are always variations. Speaking from experience, prior information makes the process much, much easier.
I'll see what I can do, but I believe at the moment we only have scRNA data, maybe some paired bulk RNAseq data. With those resources, do you think trying to estimate CNVs is worth the time or would it be too noisy?
Oh, it can totally be valuable. I'm just not sure it's the best tool to differentiate malignant and normal cells, but again, that's highly cancer-type dependent.
It's also not terribly difficult to do, so I'd say the upside is strong; I'm just trying to make sure you're aware of some of the caveats.
The data we're analyzing is from PDAC, so I'll be doing some pancreas-specific research. Are there any other extant computational methods you'd recommend for differentiating between malignant and normal?
I'm not familiar with that cancer type, so I'm afraid you're on your own there. The suggestions in my answer might be helpful, but I don't know enough about the data/cancer to say which is your best bet.
Just an update for future readers: I've had decent success replicating CNV analyses with CONICSmat on publicly available PDAC scRNA-seq data. Obviously, processing, filtering, normalization, etc. are going to differ between labs, but I've been able to see the strongest amplifications and deletions fairly clearly after my analysis.