Question

Super long time when running RSoptSC package


14 months ago

alwayshope ▴ 40

Dear guys,

Has anyone else run into very long run times (several days) with commands in RSoptSC? ClusterCells has been running for several days and still has not finished, even though I am already using the doParallel package to try to run the command in parallel.

Thank you very much for your guidance!

RSoptSC single-cell • 807 views

14 months ago by alwayshope ▴ 40


Show as much of your code as you can. With time-intensive functions, always run the function on a subset of the data first, both to estimate how long things take and to make sure your dataset fits the function's expectations. You don't want to find out a week from now that your dataset is missing a column the function needs, and that you have to re-run the entire thing because you forgot a 2-minute pre-processing step.
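
A minimal sketch of that kind of dry run, under stated assumptions: `time_on_subsets` is a hypothetical helper (not part of RSoptSC), the toy matrix stands in for the real genes x cells input, and `crossprod()` is only a placeholder for the expensive step. Swap in the real call (for example, a wrapper around the clustering step) once the small runs behave.

```r
library(Matrix)

# Hypothetical helper: time `fun` on random subsets of cells (columns)
# of increasing size, to see how run time grows before a full run.
time_on_subsets <- function(mat, fun, sizes, seed = 1) {
  set.seed(seed)
  elapsed <- vapply(sizes, function(n) {
    cols <- sample(ncol(mat), n)           # pick n cells at random
    sub <- mat[, cols, drop = FALSE]       # keep all genes, subset cells
    system.time(fun(sub))[["elapsed"]]     # wall-clock seconds
  }, numeric(1))
  data.frame(n_cells = sizes, seconds = elapsed)
}

# Toy stand-in for the real input: 2,000 genes x 1,000 cells, 5% non-zero.
toy <- rsparsematrix(2000, 1000, density = 0.05)

# Cheap sanity check that the input has the class the pipeline expects.
stopifnot(inherits(toy, "dgCMatrix"))

# Placeholder for the slow step: a cell-by-cell cross-product.
timings <- time_on_subsets(toy, function(m) crossprod(m),
                           sizes = c(100, 200, 400))
print(timings)
```

If the time roughly quadruples (or worse) each time the number of cells doubles, a run on 15,000 cells may be impractical without changing parameters or the method.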

14 months ago by Ram 44k


Thanks a lot!

The input has the same data structure as in the tutorial, a dgCMatrix, and is about 20,000 genes x 15,000 cells. SelectData() (which produces filtered_data) can run for about 10 h, SimilarityM() and RepresentationMap() can take 1-2 h to run, and ClusterCells() has run for more than 3 days without finishing.

library(RSoptSC)

# Log-transform the expression matrix
logdata <- log10(input_matrix_sc + 1)

# Filtering parameters passed to SelectData()
gene_expression_threshold <- 0.03
n_features <- 3000
filtered_data <- SelectData(logdata, gene_expression_threshold, n_features)

S <- SimilarityM(lambda = 0.05, 
                 data = filtered_data$M_variable,
                 dims = 3,
                 pre_embed_method = 'tsne',
                 perplexity = 20, 
                 pca_center = TRUE, 
                 pca_scale = TRUE)


low_dim_mapping <- RepresentationMap(similarity_matrix = S$W,
                                     flat_embedding_method = 'tsne',
                                     join_components = TRUE,
                                     perplexity = 35,
                                     theta = 0.5,
                                     normalize = FALSE,
                                     pca = TRUE,
                                     pca_center = TRUE,
                                     pca_scale = TRUE,
                                     dims = 2,
                                     initial_dims = 2)

clusters <- ClusterCells(similarityMatrix = S$W, n_comp = 15, .options='p')
H <- clusters$H
labels <- clusters$labels
n_clusters <- length(unique(clusters$labels))

updated 14 months ago by Ram 44k • written 14 months ago by alwayshope ▴ 40


You should email the authors and point them to this post. That might help you get a solution faster.