I'm reading the SingleR book and noticed that SingleR returns de.genes in its output, which are the differentially expressed genes for each label. But in Chapter 4, to diagnose the cell type assignments, the author uses the findMarkers function to get "empirical" marker genes. What's the difference between the genes in de.genes and the genes from findMarkers?
Thank you for the comment. I just went through Chapters 2 and 3 again, and indeed I learned a lot. Following what the author did in Chapter 4, what is the reason for taking the intersection between the genes from the pairwise Wilcoxon rank sum test and the genes from the findMarkers function, which uses t-tests to compare each label against all other labels? To me, it looks like simply intersecting the gene sets from two methods. I'm confused about why he did that. Is it reasonable?
Aaron's always got a reason for doing what he does. In this case, it's to show only the reference markers that are also differentially expressed in the test dataset, so that the diagnostic plot is easier to read. This gives a smaller, cleaner heatmap that leaves out genes that don't change significantly between the cell type of interest and the other cells in the test data. It is particularly helpful when using the "classic" mode, as it tends to return lots of marker genes.
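The filtering idea in the answer above can be sketched in plain Python to make the logic concrete. This is a minimal illustration, not SingleR or scran code: the function name shared_markers, the FDR threshold, the top-N cutoff, and the toy gene names are all hypothetical. It assumes the per-label reference markers (as in de.genes) arrive as a list of gene names and the test-data statistics (as from findMarkers) arrive as a gene-to-FDR mapping. The book's own code may rank and cut genes differently.

```python
def shared_markers(ref_markers, test_fdr, fdr_threshold=0.05, top_n=20):
    """Return reference markers that are also significant in the test data.

    ref_markers: gene names chosen from the reference for one label
        (may contain duplicates, e.g. pooled across pairwise comparisons).
    test_fdr: dict mapping gene name -> FDR for the same label in the test
        data; genes absent from the dict are treated as not significant.
    """
    # Intersection: keep reference markers that pass the test-data threshold.
    hits = [g for g in set(ref_markers) if test_fdr.get(g, 1.0) <= fdr_threshold]
    # Most significant in the test data first; ties broken by name so the
    # output order is deterministic.
    hits.sort(key=lambda g: (test_fdr[g], g))
    return hits[:top_n]


if __name__ == "__main__":
    # Toy, hypothetical inputs for illustration only.
    ref = ["GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_B"]
    fdr = {"GENE_A": 0.20, "GENE_B": 0.001, "GENE_D": 0.01, "GENE_E": 1e-6}
    print(shared_markers(ref, fdr))  # ['GENE_B', 'GENE_D']
```

GENE_A is dropped because it is not significant in the test data, GENE_C because the test data has no statistic for it, and GENE_E because it was never a reference marker. Only the genes that survive would go into the heatmap.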
What does SingleR do if some of the marker genes detected in the reference dataset do not exist in the test dataset? Does it just drop those genes and use only the genes present in the test dataset to calculate Spearman's correlation coefficient?
Yes, SingleR limits to genes common to both the reference and test datasets prior to calculating the correlations.
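To make the answer concrete, here is a minimal pure-Python sketch of "restrict to shared genes, then compute Spearman's correlation". It is not how SingleR is implemented internally, and it ignores the rest of SingleR's scoring (aggregating correlations across the reference cells of each label, fine-tuning). The function names, the dict-based expression profiles, and the gene names are hypothetical. Spearman's coefficient is computed as Pearson's correlation of the ranks, with tied values given their average rank.

```python
def rank_with_ties(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with values[order[i]].
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; needs >= 2 points and non-constant x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman_on_common_genes(ref_profile, test_profile, markers):
    """Spearman correlation over marker genes present in both profiles.

    ref_profile, test_profile: dicts mapping gene name -> expression value.
    markers: gene names selected from the reference (duplicates ignored).
    Markers missing from either profile are dropped before ranking.
    """
    common = [g for g in dict.fromkeys(markers)
              if g in ref_profile and g in test_profile]
    ref_vals = [ref_profile[g] for g in common]
    test_vals = [test_profile[g] for g in common]
    return common, pearson(rank_with_ties(ref_vals), rank_with_ties(test_vals))


if __name__ == "__main__":
    # Toy, hypothetical profiles: GENE_C is absent from the test data.
    ref = {"GENE_A": 5.0, "GENE_B": 1.0, "GENE_C": 3.0, "GENE_D": 8.0}
    test = {"GENE_A": 2.0, "GENE_B": 0.5, "GENE_D": 4.0}
    common, rho = spearman_on_common_genes(ref, test, list(ref))
    print(common)  # ['GENE_A', 'GENE_B', 'GENE_D']
    print(rho)     # 1.0
```

Because GENE_C is missing from the test profile, it is dropped before ranking, and the correlation is computed on the three shared genes only.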
Thank you so much!