I've been trying to use the divide-and-conquer Metacell algorithm in Python to classify cells. I've successfully installed the necessary library and attempted to follow the pipeline outlined here.
However, I'm struggling to understand the purpose and significance of each step in the process. For instance, the first step, 'Exclude', assesses whether any genes in my AnnData object correlate with lateral genes, which, from my understanding, is undesirable.
What should I do with this information? The guidelines aren't clear, and I'm also unsure what qualifies as a lateral gene.
Moreover, I'm confused about the differences between the 'Direct' step and the 'Divide and Conquer' step.
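The difference between "Direct" and "Divide and Conquer" can be pictured with a toy model: Direct computes groups over all cells at once, while divide-and-conquer first shuffles the cells into random piles and computes groups inside each pile, so the cost of each step is bounded by the pile size. This is a conceptual sketch in plain Python under that assumption; each "cell" is a single number, group_cells is a hypothetical stand-in for the real metacell computation, and none of these names come from the metacells package.

```python
import random


def group_cells(cells: list[float], target_size: int) -> list[list[float]]:
    """Hypothetical stand-in for computing metacells: sort cells, cut into fixed-size groups."""
    ordered = sorted(cells)
    return [ordered[i:i + target_size] for i in range(0, len(ordered), target_size)]


def direct(cells: list[float], target_size: int) -> list[list[float]]:
    # Group all cells in a single pass over the whole dataset.
    return group_cells(cells, target_size)


def divide_and_conquer(
    cells: list[float], target_size: int, pile_size: int, seed: int = 0
) -> list[list[float]]:
    # Shuffle, split into random piles of at most pile_size cells,
    # and group each pile independently of the others.
    rng = random.Random(seed)
    shuffled = list(cells)
    rng.shuffle(shuffled)
    groups: list[list[float]] = []
    for start in range(0, len(shuffled), pile_size):
        groups.extend(group_cells(shuffled[start:start + pile_size], target_size))
    return groups


cells = [float(i) for i in range(100)]  # each "cell" is a single toy value
print(len(direct(cells, 5)), len(divide_and_conquer(cells, 5, pile_size=25)))  # 20 20
```

Because the piles are random, similar cells can land in different piles, so in this toy a single pass produces groups that span wider value ranges than the direct run does.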
If anyone has experience with this method and can explain how to use it in a straightforward manner, I would greatly appreciate your guidance.
"I'm also unsure what qualifies as a lateral gene."
The README you reference has an entire section on this under the heading 'lateral_gene mask':
I note that mc.pl.relate_genes() is used in two notebooks in this repository from the same group, which contains all the code for reproducing the analysis from the manuscript "Time-Aligned Hourglass Gastrulation Models in Rabbit and Mouse" (done with the metacells package):
2-metacells/mm_metacells.ipynb
2-metacells/oc_metacells.ipynb
Plus, there's a Vignettes repo that "give examples for using the metacells."
Exploring those example notebooks & Vignettes would probably help with a lot of the specific things you mention and your broader question about using this package.
I've followed this vignette, which is the only pipeline that does not rely on prior knowledge (as in my case). However, I don't see that they performed the metagroup stage, which is a crucial step if you're familiar with the algorithm. They only repeated the metacell stage several times and stopped there.
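The metagroup stage can be pictured as a second pass over a toy model: after a first phase on random piles, similar preliminary groups are collected into metagroups, the cells of each metagroup become a new pile, and groups are recomputed within those piles. This is a hypothetical plain-Python sketch of that two-phase idea, based on a reading of the Metacell-2 description rather than the package's implementation; ranking groups by their mean value is a simplification standing in for the real similarity-based clustering, and every name here is made up for illustration.

```python
import random
from statistics import mean


def group_cells(cells: list[float], target_size: int) -> list[list[float]]:
    """Hypothetical stand-in for computing metacells: sort cells, cut into fixed-size groups."""
    ordered = sorted(cells)
    return [ordered[i:i + target_size] for i in range(0, len(ordered), target_size)]


def random_piles_phase(
    cells: list[float], target_size: int, pile_size: int, rng: random.Random
) -> list[list[float]]:
    # Phase 1: shuffle into random piles and group each pile independently.
    shuffled = list(cells)
    rng.shuffle(shuffled)
    groups: list[list[float]] = []
    for start in range(0, len(shuffled), pile_size):
        groups.extend(group_cells(shuffled[start:start + pile_size], target_size))
    return groups


def metagroup_phase(
    groups: list[list[float]], target_size: int, groups_per_metagroup: int
) -> list[list[float]]:
    # Phase 2 (simplified): rank preliminary groups by mean value and cut the
    # ranking into metagroups of groups with similar means.
    ranked = sorted(groups, key=mean)
    metagroups = [
        ranked[i:i + groups_per_metagroup]
        for i in range(0, len(ranked), groups_per_metagroup)
    ]
    final: list[list[float]] = []
    for metagroup in metagroups:
        # The cells of one metagroup form a new pile, so groups with similar
        # means now share a pile, and that pile is grouped again.
        pile = [cell for group in metagroup for cell in group]
        final.extend(group_cells(pile, target_size))
    return final


rng = random.Random(0)
cells = [float(i) for i in range(100)]  # each "cell" is a single toy value
preliminary = random_piles_phase(cells, 5, 25, rng)
final = metagroup_phase(preliminary, 5, groups_per_metagroup=5)
print(len(preliminary), len(final))  # 20 20
```

The design point the sketch tries to show is that the second phase changes which cells share a pile (by metagroup instead of at random), not the per-pile grouping step itself.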
Yes, they are very clear about this. The page about the Vignettes says:
Hopefully, the repo with "all the code for reproducing the analysis from the manuscript 'Time-Aligned Hourglass Gastrulation Models in Rabbit and Mouse'" is more informative for your needs.
Otherwise, at the bottom it clearly says:
Unfortunately, the repo does not provide any useful information. The Metacell algorithm in the vignette is missing the metagroup stage (and they skipped various other things to keep it simple), and the repo only points me to another GitHub page with more Metacell functions, which I don't understand how to use. The explanation of the algorithm in the Metacell 2 paper is clear, but the implementation pipeline is not clear at all. Anyway, thank you for helping!