I have recently started working with scRNA-seq data and am following the tutorials by the creators of Seurat. In the final section, titled "Assigning cell type identity to clusters", the authors write:
"Fortunately in the case of this dataset, we can use canonical markers to easily match the unbiased clustering to known cell types:"
After this, they define a set of genes with their corresponding cell types and use it to annotate their dataset. I wanted to apply the same approach to a different dataset, an scRNA-seq breast cancer dataset. I derived the final list of clusters but got stuck trying to annotate them manually. I have downloaded a list of markers (a table of genes and their cell types) from the Cell Marker database. I also have an expression matrix with genes as rows and cells as columns. How do I manually assign each cell to a particular cell type?
Manual annotation is difficult because it requires a clear idea of which cell types are present in your samples and which marker genes define them. If you're struggling with it, you could try automatic annotation instead (e.g. SingleR).
The problem with SingleR is that I am not getting the cell types I expect from my data. For instance, I have breast cancer expression measurements, so I am expecting basal and luminal cell types. But SingleR is skipping those entirely for some reason.
I think you can supply your own reference dataset to SingleR to infer cell types. So in your case, you could use a public scRNA-seq dataset from breast cancer patients in which these subtypes were identified. Otherwise, you could use marker genes, if they are well described for your cell types.
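To make the marker-gene route concrete, here is a minimal sketch in plain Python, not Seurat's or SingleR's own method. It assumes you already have cluster labels per cell, averages the expression of each cell type's markers within each cluster, and gives the cluster the best-scoring type. The expression values are made-up toy numbers, the function names `score_clusters` and `annotate` are hypothetical, and the marker lists are only illustrative (keratins 5/14 for basal, 8/18 for luminal). In practice you would substitute your own matrix and the markers from your database.

```python
from statistics import mean

# Toy expression matrix (made-up values): gene -> one value per cell.
expression = {
    "KRT5":  [5.0, 4.0, 0.0, 1.0],
    "KRT14": [4.0, 6.0, 1.0, 0.0],
    "KRT8":  [0.0, 1.0, 5.0, 6.0],
    "KRT18": [1.0, 0.0, 4.0, 5.0],
}
# Cluster label of each cell, in the same order as the columns above.
clusters = [0, 0, 1, 1]

# Illustrative marker table: cell type -> marker genes.
markers = {
    "basal": ["KRT5", "KRT14"],
    "luminal": ["KRT8", "KRT18", "GATA3"],  # GATA3 is not in the toy matrix
}


def score_clusters(expression, clusters, markers):
    """Mean marker expression per (cluster, cell type)."""
    scores = {}
    for cluster in sorted(set(clusters)):
        cells = [i for i, c in enumerate(clusters) if c == cluster]
        scores[cluster] = {}
        for cell_type, genes in markers.items():
            # Ignore markers that were not measured in this dataset.
            present = [g for g in genes if g in expression]
            if present:
                scores[cluster][cell_type] = mean(
                    expression[g][i] for g in present for i in cells
                )
    return scores


def annotate(scores):
    """Pick the highest-scoring cell type per cluster ('unknown' if none scored)."""
    return {
        cluster: max(s, key=s.get) if s else "unknown"
        for cluster, s in scores.items()
    }


scores = score_clusters(expression, clusters, markers)
labels = annotate(scores)
print(labels)
```

Scoring whole clusters rather than single cells is a deliberate simplification: per-cell values in scRNA-seq are sparse and noisy, so averaging over a cluster tends to be more stable. A plain mean also ignores differences in baseline expression between genes, so treat the result as a starting point to check against known biology.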
There can be several reasons for that:
SingleR will only work well if your reference data set encompasses the cell types that you have in your target data set. You could try to find an annotated single-cell reference data set from the literature that deals with very similar cell types.
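A quick sanity check along these lines is to compare the cell types you expect with the labels present in the reference, before running any annotation. The sketch below is plain Python with hypothetical label names (they are not taken from any actual SingleR reference), and `missing_labels` is a helper made up for illustration. Exact name matching is an assumption here, since real references may name the same population differently.

```python
def missing_labels(expected, reference_labels):
    """Return the expected cell types that never occur among the reference labels."""
    available = {label.lower() for label in reference_labels}
    return sorted(t for t in expected if t.lower() not in available)


# Hypothetical labels from a general-purpose reference.
reference_labels = ["T cells", "B cells", "Monocytes", "Epithelial cells", "Fibroblasts"]
# Cell types expected in a breast cancer sample.
expected = ["Basal", "Luminal", "T cells"]

missing = missing_labels(expected, reference_labels)
print(missing)
```

If a type shows up as missing, a reference-based method cannot assign it, because it can only return labels that exist in the reference. That would explain basal and luminal cells being absorbed into a broader label.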