Hello, I have a Python AnnData object containing gene counts for a number of barcodes. Because of some technical details of the single-cell technology it originates from, each cell receives two barcodes instead of one. Therefore, to recover the gene counts for a cell, I need to add up the counts from its two barcodes.
I am using the following code:
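# Convert to LIL format so that individual rows can be modified
adata.X = adata.X.tolil()
for first_index, second_index in zip(first_list_indices, second_list_indices):
    # Add the counts of the second barcode onto the first barcode of the pair
    adata.X[first_index, :] += adata.X[second_index, :]
# Convert back to CSR format
adata.X = adata.X.tocsr()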
Here, first_list_indices and second_list_indices are lists of the row indices of the barcodes that need to be added up, paired by position. In a subsequent step, I remove the barcodes corresponding to second_list_indices:
# Boolean mask: keep every barcode except the second one of each pair
to_keep = [True] * len(adata.obs.index)
for second_ind in second_list_indices:
    to_keep[second_ind] = False
# Remove second barcodes
adata = adata[to_keep, :].copy()
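To make the intended result concrete, here is a minimal toy sketch of the same two steps on a plain SciPy sparse matrix rather than a real AnnData object. The counts and the index pairs are made up for illustration only, and X stands in for adata.X.

```python
import numpy as np
from scipy import sparse

# Toy counts: 4 barcodes (rows) x 3 genes (columns); values are made up.
X = sparse.csr_matrix(np.array([
    [1, 0, 2],
    [0, 3, 0],
    [4, 0, 0],
    [0, 1, 1],
]))

# Hypothetical pairing: barcode 0 pairs with 2, barcode 1 pairs with 3.
first_list_indices = [0, 1]
second_list_indices = [2, 3]

# Same row-by-row addition as above, done in LIL format.
X = X.tolil()
for first_index, second_index in zip(first_list_indices, second_list_indices):
    X[first_index, :] += X[second_index, :]
X = X.tocsr()

# Boolean mask that drops the second barcode of each pair.
to_keep = np.ones(X.shape[0], dtype=bool)
to_keep[second_list_indices] = False
merged = X[to_keep, :]

print(merged.toarray())
```

On this toy input the two remaining rows should be [5, 0, 2] and [0, 4, 1], i.e. one row per cell holding the summed counts of its two barcodes. This is the result I want for the full matrix, just with less memory.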
However, my computer runs out of memory when I run this code on a large matrix. I am sure there is a better and more efficient way to do this, and I would really appreciate it if someone could help me optimise the code so that it uses less memory.
Thanks so much!!