I'm running cNMF on some counts data and I'm not sure I'm running it properly. I asked about this on the project's GitHub but haven't had a response. I am generally following the guide in the GitHub repository, but I've run into a specific issue.
This is the code for setting up the run:
import os
import numpy as np
from cnmf import cNMF

numiter = 100  # Number of NMF replicates
numhvgenes=2000 ## Number of over-dispersed genes to use for running the actual factorizations
## Results will be saved to [output_directory]/[run_name]
output_directory = 'counts/T2_W5/cNMF_out'
if not os.path.exists(output_directory):
    os.mkdir(output_directory)
run_name = 'T2_W5_cNMF'
## Specify the Ks to use as a space separated list in this case "5 6 7 8 9 10 11 12"
K = ' '.join([str(i) for i in range(5,13)])
seed = 12345
## Path to the filtered counts dataset we output previously
countfn = 'counts/T2_W5/counts.h5ad'
# Initialize the cnmf object that will be used to run analyses
cnmf_obj = cNMF(output_dir=output_directory, name=run_name)
## Prepare the data, i.e. subset to 2000 high-variance genes, and variance normalize
cnmf_obj.prepare(counts_fn=countfn, components=np.arange(5,13), n_iter=numiter, seed=seed, num_highvar_genes=numhvgenes)
cnmf_obj.factorize(worker_i=0, total_workers=16)
cnmf_obj.combine(skip_missing_files=True)
My first question: when I run cnmf_obj.factorize(worker_i=0, total_workers=16), am I specifying the arguments correctly to use 16 cores? The output looks like this:
[Worker 0]. Starting task 0.
[Worker 0]. Starting task 16.
[Worker 0]. Starting task 32.
[Worker 0]. Starting task 48.
[Worker 0]. Starting task 64.
[Worker 0]. Starting task 80.
[Worker 0]. Starting task 96.
...
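For reference, the task numbers in this log (0, 16, 32, ...) are consistent with a round-robin split, in which the worker with index i takes every task whose index modulo total_workers equals i. The sketch below only illustrates that pattern and is not cNMF's own code: tasks_for_worker is a hypothetical helper, and the total of 800 tasks is an assumption (8 values of K times 100 replicates, taken from the setup above).

```python
def tasks_for_worker(n_tasks, worker_i, total_workers):
    """Hypothetical helper: task indices one worker takes under a round-robin split."""
    return [t for t in range(n_tasks) if t % total_workers == worker_i]

# Assumption: 8 values of K (5..12) x 100 replicates = 800 tasks in total.
n_tasks = 8 * 100

worker0 = tasks_for_worker(n_tasks, worker_i=0, total_workers=16)
print(worker0[:7])   # [0, 16, 32, 48, 64, 80, 96]
print(len(worker0))  # 50
```

Under that assumption, a single call with worker_i=0 would cover 50 of the 800 tasks rather than all of them.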
My second question: when I combine the results with cnmf_obj.combine(skip_missing_files=True), I get notifications about missing spectra. Is this normal/expected behaviour? Why are there missing factorisations?
Combining factorizations for k=5.
Missing file: counts/T2_W5/cNMF_out/T2_W5_cNMF/cnmf_tmp/T2_W5_cNMF.spectra.k_5.iter_1.df.npz. Skipping.
Missing file: counts/T2_W5/cNMF_out/T2_W5_cNMF/cnmf_tmp/T2_W5_cNMF.spectra.k_5.iter_2.df.npz. Skipping.
Missing file: counts/T2_W5/cNMF_out/T2_W5_cNMF/cnmf_tmp/T2_W5_cNMF.spectra.k_5.iter_3.df.npz. Skipping.
Missing file: counts/T2_W5/cNMF_out/T2_W5_cNMF/cnmf_tmp/T2_W5_cNMF.spectra.k_5.iter_4.df.npz. Skipping.
Missing file: counts/T2_W5/cNMF_out/T2_W5_cNMF/cnmf_tmp/T2_W5_cNMF.spectra.k_5.iter_5.df.npz. Skipping.
Missing file: counts/T2_W5/cNMF_out/T2_W5_cNMF/cnmf_tmp/T2_W5_cNMF.spectra.k_5.iter_6.df.npz. Skipping.
...
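One way to see how many factorizations were actually written is to count the spectra files in the cnmf_tmp directory. This is a minimal sketch that assumes the file naming shown in the messages above ([run_name].spectra.k_[K].iter_[N].df.npz); present_iterations is a hypothetical helper written for this post, not part of cNMF.

```python
import os
import re

def present_iterations(tmp_dir, run_name, k):
    """Hypothetical helper: sorted iteration numbers that have a spectra file for this k."""
    # File name pattern assumed from the "Missing file" messages in the log.
    pattern = re.compile(
        r"^" + re.escape(run_name) + r"\.spectra\.k_" + str(k) + r"\.iter_(\d+)\.df\.npz$"
    )
    found = []
    if os.path.isdir(tmp_dir):
        for fname in os.listdir(tmp_dir):
            match = pattern.match(fname)
            if match:
                found.append(int(match.group(1)))
    return sorted(found)

# Paths taken from the setup code and log in this post.
tmp_dir = os.path.join('counts/T2_W5/cNMF_out', 'T2_W5_cNMF', 'cnmf_tmp')
for k in range(5, 13):
    iters = present_iterations(tmp_dir, 'T2_W5_cNMF', k)
    print(f"k={k}: {len(iters)} spectra files found")
```

Comparing these counts with numiter (100 here) shows, for each K, how many replicates are missing before combine is called.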