5 weeks ago
zhai
I am trying to reinforce my manual annotation of scRNA-seq data through reference mapping and label transfer from a well-annotated dataset. There are many atlases for human data, but I am working on mouse samples. The only source of mouse references I know is https://cellxgene.cziscience.com/collections , but I cannot find a satisfactory one that matches my own dataset, which is mostly immune cells from autoimmune models. Does anybody know of other good resources for well-annotated reference atlases?
Thank you very much!! Would you mind giving me more hints on how to find the annotation information for the datasets on ImmGen?
The ImmGen site is a bit of a mess. It is probably easier to cross-reference the labels directly from GEO for more readable annotations. The dataset we put together was from microarrays; ImmGen has newer datasets now that could probably be used.
ImmGen is very detailed. They have a particular focus on lymphoid cells and have (FACS-)enriched many subsets (e.g. of T cells) from different organs and then profiled them by microarray or RNA-seq. I would collect a meaningful subset of the dataset that matches your needs and then use the bulk data to create signatures to score your cells. You can use UCell, which needs only signatures, or SingleR, which needs markers and the reference data as a count matrix. I prefer SingleR when possible (it gives better results), whereas UCell is more generic as it only needs some genes to work with.
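To make the "bulk data to signatures to scores" idea concrete, here is a minimal toy sketch in plain Python. It is not UCell or SingleR and does not use their APIs; the expression values, the `min_lfc` cutoff, and the function names `build_signature` and `score_cell` are all made up for illustration. It only shows the general logic: pick genes that are clearly higher in one sorted population than in the others, then score each cell by how highly those genes rank within that cell.

```python
from statistics import mean

# Hypothetical bulk reference: log-expression per sorted population (toy numbers).
bulk = {
    "T_cell":   {"Cd3e": 9.0, "Cd19": 1.0, "Lyz2": 1.5,  "Actb": 12.0},
    "B_cell":   {"Cd3e": 1.0, "Cd19": 9.5, "Lyz2": 1.0,  "Actb": 12.1},
    "Monocyte": {"Cd3e": 0.5, "Cd19": 0.8, "Lyz2": 10.0, "Actb": 11.9},
}

def build_signature(bulk, target, min_lfc=2.0):
    """Genes whose log-expression in `target` exceeds every other
    population by at least `min_lfc` (an assumed, arbitrary cutoff)."""
    others = [pop for pop in bulk if pop != target]
    return sorted(
        gene for gene, value in bulk[target].items()
        if all(value - bulk[other][gene] >= min_lfc for other in others)
    )

def score_cell(cell, signature):
    """Mean within-cell rank of the signature genes, scaled to [0, 1].
    1 means the signature genes are the top-ranked genes in this cell.
    Ties are broken arbitrarily, which a real tool would handle properly."""
    ordered = sorted(cell, key=cell.get)           # ascending expression
    rank = {gene: i for i, gene in enumerate(ordered)}
    top = max(len(ordered) - 1, 1)
    hits = [rank[gene] / top for gene in signature if gene in rank]
    return mean(hits) if hits else 0.0

# Hypothetical single-cell counts (toy numbers).
cells = {
    "cell_1": {"Cd3e": 5, "Cd19": 0, "Lyz2": 0, "Actb": 20},
    "cell_2": {"Cd3e": 0, "Cd19": 0, "Lyz2": 7, "Actb": 18},
}

t_sig = build_signature(bulk, "T_cell")
for name, counts in cells.items():
    print(name, round(score_cell(counts, t_sig), 2))
```

In this toy setup the housekeeping gene drops out of every signature because it is not higher in any one population, which is the point of building signatures from a reference subset that resembles your own data.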
Jared, thank you! You sound like you have vast experience. Could you share your personal experience with these options when you have time? I wonder which one to try first.
Depends entirely on your needs and tissue. The Monaco set in celldex is expansive, easy to interpret, and covers a lot of bases for human immune cells.
The Linnarsson lab datasets are the best I've found for mouse brain. All just really depends.
Exactly. The same goes for gene signatures in general. You get vastly different marker genes if you compare, for example, a monocyte to other blood cells, or a monocyte to its immediate progenitors and some macrophage populations. Hence, choose a reference and subsets of cells that somewhat resemble your setup. It also depends on how you define markers. For example, Seurat takes the sledgehammer approach and defines markers as genes overexpressed in a cluster versus all other cells, regardless of cluster membership or size of the dataset. This more or less guarantees that you always find some markers, but they're anything but specific. On the other hand, doing all pairwise comparisons between clusters and then keeping genes that are overexpressed in all of them (or in a certain percentage of comparisons) will give more focused markers, but can fail if a population is not transcriptionally unique or sits in a tight developmental continuum. As Jared says, "All just really depends."

Keep in mind, a marker signature does not need to identify a population in a bullet-proof, blind fashion. For example, say you have a dataset with lots of diverse immune cells but also stroma (such as endothelium, fibroblasts etc.) and you want to find T cells. Then you can first filter on leukocytes (for example CD45+ clusters that do not express canonical myeloid markers such as the transcription factor PU.1, Spi1 gene), and THEN you can check lymphoid markers or apply your lymphoid / T cell signatures. Cell assignment does not happen in a vacuum. You can and should apply more biological knowledge than just a handful of markers.
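The stepwise filtering described above could look roughly like this. It is a toy sketch in plain Python: the cluster means, the `ON` threshold, and the `annotate` helper are invented for illustration, and Ptprc (the gene encoding CD45), Spi1, and Cd3e are just stand-ins for whatever gating markers suit your tissue.

```python
# Hypothetical mean log-expression per cluster (toy numbers).
cluster_means = {
    "c0": {"Ptprc": 3.1, "Spi1": 0.1, "Cd3e": 2.8, "Pecam1": 0.0},
    "c1": {"Ptprc": 2.9, "Spi1": 2.5, "Cd3e": 0.1, "Pecam1": 0.2},
    "c2": {"Ptprc": 0.1, "Spi1": 0.0, "Cd3e": 0.0, "Pecam1": 3.0},
    "c3": {"Ptprc": 3.0, "Spi1": 0.2, "Cd3e": 0.1, "Pecam1": 0.0},
}

ON = 1.0  # assumed cutoff for calling a gene "expressed" in a cluster

def is_on(profile, gene):
    """True if the cluster's mean expression of `gene` passes the cutoff."""
    return profile.get(gene, 0.0) >= ON

def annotate(profile):
    """Gate step by step instead of asking one signature to do everything."""
    if not is_on(profile, "Ptprc"):   # step 1: keep leukocytes (CD45+)
        return "non-immune"
    if is_on(profile, "Spi1"):        # step 2: set aside PU.1+ clusters
        return "myeloid"
    if is_on(profile, "Cd3e"):        # step 3: only now ask about T cells
        return "T cell"
    return "other lymphoid"

labels = {cluster: annotate(profile) for cluster, profile in cluster_means.items()}
print(labels)
```

Because the T cell check only runs on clusters that already passed the earlier gates, the final marker or signature does not have to separate T cells from stroma or myeloid cells by itself.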
Thank you, ATpoint and Jared. Yes, I am now looking for datasets of mouse immune cells in blood or lymph nodes. Ideally, we want to find immune cells in the islets of mice that have type 1 diabetes, but that might be too specific. There are datasets out there, but the annotation information is not easy to find.
Then ImmGen is the way to go; they have lots of data to compile your markers from at the resolution you want. I would say you don't need a matched reference for exactly the condition you have. It is unlikely that a perturbation changes all markers so much that a cell becomes entirely unidentifiable.