Hi everyone!
I am currently dealing with pseudotime trajectory analysis and trying to find out genes that vary along pseudotime axis. I have two objects, one of the counts ~20k cells in state_1 and ~2k cells in state_2, another counts ~2k cells in state_1 and ~1k cells in state_2, the first one possibly being biased by the number of cells.
My code is the following:
test_df <- graph_test(cds, neighbor_graph = "principal_graph", k=500, alternative="two.sided", cores=18)
test_df$gene_name <- row.names(test_df)
test_df <- test_df[test_df$q_value<0.05,]
gene_module_df <- find_gene_modules(cds[test_df$gene_name,])
ggplot(gene_module_df, aes(x=dim_1, y=dim_2, color=module))+
geom_point(size=0.5)+
ggtitle("UMAP for genes")
agg <- aggregate_gene_expression(cds, gene_module_df)
pheatmap(agg,
annotation_col = cds@colData[,c("pseudotime", "sample")],
cluster_cols = FALSE,
breaks=c(-2,0,2),
scale="row",
cluster_rows = T,
show_colnames = FALSE,
show_rownames = T,
use_raster=FALSE)
The number of genes that pass threshold of q_value<0.05 depends on the number of nearest neighbors (k) that I choose. My idea is that k should depend on the cells number to explain differences driven by pseudotime, not by fluctuations. However, it is still difficult to understand which k would be the most reasonable.
One another question is whether I am right to use alternative="two.sided", as I want to see not only genes that are upregulated along the axis?
Any advice would be appreciated! Thank you in advance!