I'm trying to find the marker genes that distinguish the cell types in a tissue (to use for predicting cell types), but I can't find a single scRNA-seq dataset that covers all the cell types present. I have several cell-type-annotated scRNA-seq datasets, each containing a subset of the cell types I expect to see in the dataset I want to predict on. There are significant batch effects between these datasets, so simply combining them isn't an option. Calculating the markers in each dataset separately (I am using scanpy's rank_genes_groups function) would only give me marker genes that tell apart the cell types within that individual dataset. How would I go about getting the marker gene set that best separates my full set of cell types? I don't think the batch correction methods I've seen will work, because each scRNA-seq dataset has a distinct set of cell types with little to no overlap, so an MNN approach would likely fail.
Thanks!
Your post appears to be asking two different things: 1.) how to find marker genes, and 2.) how to map or integrate other scRNA-seq datasets.
1.)
Cluster your cells, then test for marker genes that distinguish each cluster from the others.
2.)
Seurat's integration workflow can address batch effects between datasets: https://satijalab.org/seurat/articles/integration_introduction.html
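To make point 1 concrete, here is a minimal sketch of what a one-vs-rest marker ranking computes once cells have cluster labels. It is not scanpy's implementation: the function name rank_markers, the toy gene names, and the choice of a Welch t-statistic are assumptions made for illustration. In scanpy the equivalent workflow would normally go through sc.pp.neighbors, sc.tl.leiden and sc.tl.rank_genes_groups on an AnnData object.

```python
import numpy as np

def rank_markers(expr, labels, gene_names, top_n=2):
    """Rank genes per cluster with a one-vs-rest Welch t-statistic.

    expr:       (cells x genes) array of log-normalized expression
    labels:     one cluster label per cell
    gene_names: one name per gene (column of expr)
    """
    expr = np.asarray(expr, dtype=float)
    labels = np.asarray(labels)
    gene_names = np.asarray(gene_names)
    markers = {}
    for cluster in np.unique(labels):
        in_group = labels == cluster
        a, b = expr[in_group], expr[~in_group]
        # Per-gene difference in means divided by the unpooled standard error
        se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
        t = (a.mean(axis=0) - b.mean(axis=0)) / np.maximum(se, 1e-12)
        # Highest t first: genes most up-regulated in this cluster vs. the rest
        top = np.argsort(t)[::-1][:top_n]
        markers[str(cluster)] = gene_names[top].tolist()
    return markers

# Hypothetical toy data: 60 cells, 4 genes, 2 clusters
rng = np.random.default_rng(0)
genes = ["geneA", "geneB", "geneC", "geneD"]
labels = np.array(["c0"] * 30 + ["c1"] * 30)
expr = rng.normal(0.0, 0.3, size=(60, 4))
expr[:30, 0] += 3.0  # geneA is high only in c0
expr[30:, 2] += 3.0  # geneC is high only in c1

markers = rank_markers(expr, labels, genes, top_n=1)
print(markers)
```

On this toy data the top marker is geneA for c0 and geneC for c1. Note that the ranking is always relative to the other cells in the same matrix, which is why markers computed within one dataset only separate the cell types present in that dataset.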