I'd like to recommend a new tool called mLLMCelltype that can greatly simplify the cluster annotation process for single-cell RNA-seq data.
mLLMCelltype is a cell type annotation framework based on large language models (LLMs). It combines the output of multiple LLMs (such as Claude 3.7, GPT-4o, and Gemini 2.5 Pro) to provide cell type annotations without requiring you to manually analyze differentially expressed genes (DEGs) for each cluster.
Why mLLMCelltype Solves Your Problems
Automated Annotation Process: You don't need to manually analyze DEGs for each cluster; mLLMCelltype handles this process automatically.
Multi-model Consensus Mechanism: By leveraging the collective intelligence of multiple LLMs, it reduces biases and hallucinations from single models, improving annotation accuracy.
Transparent Uncertainty Quantification: Provides quantitative metrics (Consensus Proportion and Shannon Entropy) to help identify ambiguous cell populations that may require expert review.
No Reference Dataset Required: Works without pre-training or reference data, directly annotating based on differentially expressed genes.
Complete Reasoning Chains: Documents the full deliberation process for transparent decision-making.
Seamless Integration with Seurat: Works directly with your existing Seurat workflows.
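To make the two uncertainty metrics mentioned above more concrete, here is a minimal base-R sketch of how a consensus proportion and a Shannon entropy could be computed from the votes of several models for a single cluster. The helper name `vote_uncertainty` and the example votes are hypothetical, and the sketch uses the textbook definitions; mLLMCelltype's own implementation may differ in details such as the logarithm base.

```r
# Hypothetical helper (not part of mLLMCelltype): summarise how strongly
# a set of model votes for one cluster agree, using textbook definitions.
vote_uncertainty <- function(votes) {
  counts <- table(votes)                     # votes per distinct label
  p <- as.numeric(counts) / length(votes)    # fraction of votes per label
  list(
    consensus_label      = names(counts)[which.max(counts)],
    consensus_proportion = max(p),           # share of the most common label
    shannon_entropy      = -sum(p * log2(p)) # in bits; 0 means full agreement
  )
}

# Illustrative, made-up votes from three models.
agree <- vote_uncertainty(c("CD4 T cell", "CD4 T cell", "CD4 T cell"))
split <- vote_uncertainty(c("CD4 T cell", "CD8 T cell", "NK cell"))
```

With unanimous votes the consensus proportion is 1 and the entropy is 0; with a three-way split the proportion drops to 1/3 and the entropy rises to log2(3) bits. Clusters that look like the second case are the ones worth checking by hand.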
Usage Example
library(mLLMCelltype)
library(Seurat)
library(dplyr)
library(ggplot2)  # needed for ggtitle() in the plot at the end

# Find positive marker genes for every cluster in the Seurat object
pbmc_markers <- FindAllMarkers(pbmc,
                               only.pos = TRUE,
                               min.pct = 0.25,
                               logfc.threshold = 0.25)

# Create the cache directory passed to the annotation call below
cache_dir <- "./mllmcelltype_cache"
dir.create(cache_dir, showWarnings = FALSE, recursive = TRUE)

# Run the multi-model consensus annotation; API keys are read from
# environment variables rather than hard-coded
consensus_results <- interactive_consensus_annotation(
  input = pbmc_markers,
  tissue_name = "human PBMC",
  models = c(
    "claude-3-7-sonnet-20250219",
    "gpt-4o",
    "gemini-2.5-pro"
  ),
  api_keys = list(
    anthropic = Sys.getenv("ANTHROPIC_API_KEY"),
    openai = Sys.getenv("OPENAI_API_KEY"),
    gemini = Sys.getenv("GOOGLE_API_KEY")
  ),
  top_gene_count = 10,
  controversy_threshold = 1.0,
  entropy_threshold = 1.0,
  cache_dir = cache_dir
)

# Map each cluster ID to its consensus cell type label
cluster_to_celltype_map <- consensus_results$final_annotations
cell_types <- as.character(Idents(pbmc))
for (cluster_id in names(cluster_to_celltype_map)) {
  cell_types[cell_types == cluster_id] <- cluster_to_celltype_map[[cluster_id]]
}

# Store the labels in the Seurat metadata and plot them
pbmc$mLLM_cell_type <- cell_types
DimPlot(pbmc, group.by = "mLLM_cell_type", label = TRUE) +
  ggtitle("mLLMCelltype Consensus Annotations")
Regarding Your UMAP Integration Questions
Once the clusters are annotated, you can use the mLLMCelltype labels for deeper analysis:
You can use DimPlot(pbmc, group.by = "mLLM_cell_type", split.by = "condition") to view cell type distributions across different conditions.
Use table(pbmc$mLLM_cell_type, pbmc$sample) or table(pbmc$mLLM_cell_type, pbmc$condition) to quantify the number of each cell type across different samples or conditions.
mLLMCelltype's uncertainty quantification features can help you identify cell populations that might differ between batches or conditions.
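Raw counts from table() can be hard to compare when conditions contain different numbers of cells, so converting them to within-condition proportions is often more informative. Below is a small sketch using base R's prop.table(). The `meta` data frame is a made-up stand-in for pbmc@meta.data; its column names mirror the ones used above and its values are purely illustrative.

```r
# Toy stand-in for pbmc@meta.data; all values are invented for illustration.
meta <- data.frame(
  mLLM_cell_type = c("B cell", "B cell", "T cell", "T cell", "T cell", "NK cell"),
  condition      = c("ctrl",   "stim",   "ctrl",   "ctrl",   "stim",   "stim")
)

# Cell counts per cell type (rows) and condition (columns).
counts <- table(meta$mLLM_cell_type, meta$condition)

# Proportion of each cell type within each condition (each column sums to 1).
props <- prop.table(counts, margin = 2)
print(round(props, 2))
```

With a real Seurat object, the same two calls should work on pbmc$mLLM_cell_type and pbmc$condition, assuming a `condition` column exists in the metadata.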
Resources
I hope this tool helps solve your single-cell annotation challenges!
You need to go through these tutorials, which will help you a lot.
Single-cell best practices
OSCA
I'd really recommend finding a local scRNA-seq expert to talk to at your institution if available. These questions are really beyond the scope of this site and will require lengthy and detailed answers.
Are you using Seurat?
Thanks, I'm using Seurat and have also tried to learn from the Bioconductor book. Could you help explain this to me? I am just very confused and have not made any progress at all.
Regards,
I asked that question because it's relevant to your post. I cannot guide you on such a broad topic. Use the links bk11 has provided you to learn more.