I've used Seurat extensively for my analysis, but I'm looking to switch everything to Python for website-hosting purposes (the dataset is far too large for RShiny). In Seurat, I can take my single features/barcodes/matrix fileset and split the dataset via metadata tags, e.g.:
# add barcode metadata - extract the sample ID from each barcode by strsplit on '-' and taking the second element of the resulting list (here the identifier is appended after '-' to the barcode)
datasets <- sapply(strsplit(rownames(sc_obj@meta.data), split = '-'), "[[", 2)
# add barcode metadata - supply dataset ID as additional metadata
sc_obj <- AddMetaData(object = sc_obj, metadata = data.frame(datasets = datasets, row.names = rownames(sc_obj@meta.data)))
## it's important that the datasets are listed in the order they come out of the cellranger output
sc_obj@meta.data$datasets <- dplyr::recode(sc_obj@meta.data$datasets,
                                           "1" = "Wildtype",
                                           "2" = "Mutant",
                                           "3" = "Wildtype2",
                                           "4" = "Mutant2")
Once I have the metadata tag, I can do quality control and then run NormalizeData and FindVariableFeatures on each dataset individually before passing the list of objects to standard integration.
sc_objQ.list <- SplitObject(sc_objQ, split.by = 'datasets')
for (i in seq_along(sc_objQ.list)) {
  sc_objQ.list[[i]] <- NormalizeData(sc_objQ.list[[i]], verbose = FALSE)
  sc_objQ.list[[i]] <- FindVariableFeatures(sc_objQ.list[[i]], selection.method = "vst",
                                            nfeatures = 2000, verbose = FALSE)
}
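My best guess for the scanpy side of this step is to split on the obs column and process each subset, roughly as below (again untested; I'm also not sure flavor = "seurat_v3" is the right match for Seurat's vst, and I believe it needs the scikit-misc package installed):

import scanpy as sc

# split the AnnData on the metadata column, like Seurat's SplitObject
adatas = {
    name: adata[adata.obs["datasets"] == name].copy()
    for name in adata.obs["datasets"].unique()
}

# per-dataset variable-gene selection and normalization
for ad in adatas.values():
    # run on raw counts; flavor="seurat_v3" is (I think) the vst analogue
    sc.pp.highly_variable_genes(ad, n_top_genes=2000, flavor="seurat_v3")
    sc.pp.normalize_total(ad, target_sum=1e4)
    sc.pp.log1p(ad)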
Every scanpy tutorial I've seen assumes you start from individual objects that you then integrate together, i.e. two non-aggregated datasets.
Is there a way to achieve the same result with scanpy as I have with Seurat? I'm used to R and not familiar enough with Python/scanpy to be confident my guesses above are the right way to do the metadata tagging and splitting.
Any help or direction towards a resource would be very helpful, thank you!