I've used Seurat extensively for my analysis, but I'm looking to switch everything to Python for website-hosting purposes (the dataset is far too large for RShiny). In Seurat, I can take my single features/barcodes/matrix fileset and split the dataset via metadata tags, e.g.:
# add barcode metadata - extract the sample ID from each barcode by strsplit on '-' and taking the second element of the resulting list (here the identifier is appended after '-' to the barcode)
datasets <- sapply(strsplit(rownames(sc_obj@meta.data), split = '-'), "[[", 2)
# add barcode metadata - supply dataset ID as additional metadata
sc_obj <- AddMetaData(object = sc_obj, metadata = data.frame(datasets = datasets, row.names = rownames(sc_obj@meta.data)))
## it's important that the datasets are listed in the order they come out of the cellranger output
sc_obj@meta.data$datasets <- dplyr::recode(sc_obj@meta.data$datasets,
                                           "1" = "Wildtype",
                                           "2" = "Mutant",
                                           "3" = "Wildtype2",
                                           "4" = "Mutant2")
Once I have the metadata tag, I can do quality control and then run NormalizeData and FindVariableFeatures on each dataset individually before passing the list of objects to standard integration.
sc_objQ.list <- SplitObject(sc_objQ, split.by = 'datasets')
for (i in seq_along(sc_objQ.list)) {
  sc_objQ.list[[i]] <- NormalizeData(sc_objQ.list[[i]], verbose = FALSE)
  sc_objQ.list[[i]] <- FindVariableFeatures(sc_objQ.list[[i]], selection.method = "vst",
                                            nfeatures = 2000, verbose = FALSE)
}
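My best guess for the scanpy side of this step is to split on the obs column and process each subset, roughly as below (again untested; I'm also not sure flavor = "seurat_v3" is the right match for Seurat's vst, and I believe it needs the scikit-misc package installed):

import scanpy as sc

# split the AnnData on the metadata column, like Seurat's SplitObject
adatas = {
    name: adata[adata.obs["datasets"] == name].copy()
    for name in adata.obs["datasets"].unique()
}

# per-dataset variable-gene selection and normalization
for ad in adatas.values():
    # run on raw counts; flavor="seurat_v3" is (I think) the vst analogue
    sc.pp.highly_variable_genes(ad, n_top_genes=2000, flavor="seurat_v3")
    sc.pp.normalize_total(ad, target_sum=1e4)
    sc.pp.log1p(ad)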
Every scanpy tutorial I've seen assumes you start from individual objects that you then integrate together, i.e. two non-aggregated datasets.
Is there a way to achieve the same result with scanpy as I have with Seurat? I'm used to R and not familiar enough with Python/scanpy to be confident my guesses above are the right way to do the metadata tagging and splitting.
Any help or direction towards a resource would be very helpful, thank you!