I am learning single-cell sequencing data analysis with Seurat. I was told elsewhere that calling set.seed makes a project reproducible, so I call set.seed(42) before running Seurat.
But I recently found that if I use a different number in set.seed, I get a different number of clusters for the same object at the same resolution.
For example, with set.seed(42) I get 27 clusters, but with set.seed(0) or set.seed(2020) I get only 26.
I am quite confused by this. Any advice would be appreciated!
Is this behaviour reproducible? To the best of my knowledge the clustering algorithm should not have a random element (others may correct me), so the number of clusters should stay the same. The UMAP may look different, yes, but that has nothing to do with the number of clusters.
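One way to check whether a result is reproducible is to rerun exactly the same steps twice from the same seed and compare the outputs. The sketch below assumes a hypothetical helper, `same_result_with_seed`, and uses base R's `kmeans()` on the built-in `iris` data purely as a stand-in for a real Seurat pipeline (k-means picks random starting centres, so it depends on the RNG); swap in your own steps.

```r
# Hypothetical helper: run a pipeline function twice from the same seed
# and report whether the two results are identical.
same_result_with_seed <- function(pipeline, seed) {
  set.seed(seed)
  first <- pipeline()
  set.seed(seed)
  second <- pipeline()
  identical(first, second)
}

# Stand-in pipeline (assumption, not Seurat): k-means with random
# starting centres on the four numeric iris columns.
toy_pipeline <- function() {
  kmeans(iris[, 1:4], centers = 3)$cluster
}

same_result_with_seed(toy_pipeline, 42)  # TRUE: same seed, same cluster labels
```

If the same seed gives identical results but different seeds change the cluster count, some step in the pipeline is consuming random numbers.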
UMAP is a stochastic algorithm: it makes use of randomness both to speed up approximation steps and to aid in solving hard optimization problems. This means that different runs of UMAP can produce different results.
By setting the seed to a specific number (e.g. set.seed(42)), you fix the starting state of R's pseudorandom number generator, so every run draws the same sequence of "random" numbers instead of a new one.
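A minimal base-R illustration of what `set.seed` does: resetting to the same seed reproduces the same draws, while a different seed gives a different sequence. `draw_with_seed` is a hypothetical helper written only for this example.

```r
# Hypothetical helper: reset the global RNG, then draw five numbers.
draw_with_seed <- function(seed) {
  set.seed(seed)  # reset R's global RNG state
  runif(5)        # five uniform(0, 1) draws
}

a  <- draw_with_seed(42)
b  <- draw_with_seed(42)
c2 <- draw_with_seed(2020)

identical(a, b)   # TRUE: same seed, same sequence
identical(a, c2)  # FALSE: different seed, different sequence
```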
Keep in mind that setting a seed affects not only Seurat but anything else that draws pseudorandom numbers.
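Because R uses one global random stream, any extra call that draws random numbers between `set.seed()` and a later step shifts what that later step receives. A small sketch with base R, assuming the default Mersenne-Twister generator (the `runif(1)` call stands in for any unrelated function that happens to use randomness):

```r
# Run 1: seed, then the "later step" draws three numbers.
set.seed(42)
x1 <- runif(3)

# Run 2: same seed, but an unrelated call consumes one number first.
set.seed(42)
invisible(runif(1))
x2 <- runif(3)

identical(x1, x2)            # FALSE: the later step saw a shifted stream
identical(x2[1:2], x1[2:3])  # TRUE: the stream moved forward by one draw
```

So the same seed only guarantees the same results if every step before the one you care about also runs identically.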