Question

What does de.genes mean in the SingleR output? What is the difference between the genes in de.genes and the genes from findMarkers?

3.6 years ago

FantasticAI ▴ 60

I'm reading the SingleR book and noticed that SingleR returns de.genes in its output, which holds the differentially expressed genes for each label. But in Chapter 4, to run diagnostics on the cell type assignment, the author uses the findMarkers function to get "empirical" marker genes. What is the difference between the genes in de.genes and the genes from findMarkers?

SingleR scRNAseq • 2.3k views

ADD COMMENT • link 3.6 years ago by FantasticAI ▴ 60

score 3 · Accepted Answer · 2021-05-16

3.6 years ago

jared.andrews07 ★ 18k

I'd recommend closely reading chapters 2 and 3 as well. In short, the classic mode of SingleR defines marker genes by their pairwise fold change between labels, and the number returned depends on how many labels are included in the reference dataset. See chapter 3 for other variance-aware ways to define marker genes.
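To make the "pairwise fold change" idea concrete, here is a minimal sketch in plain Python. It is not SingleR's code. It assumes log-expression values, uses the difference of per-label medians as the fold change, and keeps the top genes with a positive difference for every ordered pair of labels. The default count follows the formula given in the SingleR documentation for the classic method, 500 * (2/3)^log2(N) for N unique labels; the rounding step and the names `classic_de_n` and `pairwise_markers` are my own assumptions for illustration.

```python
import math
from statistics import median


def classic_de_n(n_labels):
    # Documented default for the classic method: 500 * (2/3)^log2(N).
    # Rounding to an integer is an assumption made for this sketch.
    return round(500 * (2 / 3) ** math.log2(n_labels))


def pairwise_markers(expr, labels, top_n):
    """expr maps gene -> log-expression per reference sample;
    labels gives the label of each sample, in the same order."""
    uniq = sorted(set(labels))
    # Median log-expression of every gene within every label.
    med = {
        gene: {
            lab: median(v for v, l in zip(vals, labels) if l == lab)
            for lab in uniq
        }
        for gene, vals in expr.items()
    }
    markers = {}
    for a in uniq:
        markers[a] = {}
        for b in uniq:
            if a == b:
                continue
            # Positive difference means the gene is higher in a than in b.
            diff = {g: med[g][a] - med[g][b] for g in expr}
            up = [g for g in diff if diff[g] > 0]
            up.sort(key=lambda g: -diff[g])
            markers[a][b] = up[:top_n]
    return markers


expr = {
    "CD3E": [5, 6, 0, 1],
    "CD19": [0, 1, 6, 7],
    "ACTB": [8, 8, 8, 8],
}
labels = ["T", "T", "B", "B"]
result = pairwise_markers(expr, labels, top_n=classic_de_n(2))
```

The nested result (label, then the label it is compared against, then a gene list) mirrors how de.genes is organised: one marker list per pair of labels rather than one list per label.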

The findMarkers function uses a t-test by default and reports, for each label, a single ranked list of genes that are up relative to the other labels, rather than a separate list for each pair of labels. This is why those lists will be quite different.

ADD COMMENT • link 3.6 years ago by jared.andrews07 ★ 18k

Thank you for the comment. I just went through chapters 2 and 3 again and indeed learned a lot. As in chapter 4, what is the reason for taking the intersection between the genes from the pairwise Wilcoxon rank sum test and the genes from the findMarkers function, which uses a t-test to compare against all other labels? To me, it looks like intersecting the gene sets from two different methods. I'm confused about why he did that. Is it reasonable?

ADD REPLY • link 3.6 years ago by FantasticAI ▴ 60

Aaron's always got a reason for doing what he does. In this case, it's to show only the reference markers that are also differentially expressed in the test dataset, so that the diagnostic plot is easier to read. This makes for a smaller and nicer-looking heatmap that ignores genes that don't change significantly between the cell type of interest and other cells in the test data. This is particularly helpful when using the "classic" mode, as it tends to return lots of marker genes.
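The filtering described here can be sketched in a few lines of plain Python. This is only an illustration of the idea, not the book's code: it assumes you already have the reference markers for one label (grouped by the label they were compared against) and a list of genes ranked by how strongly they are up in the test dataset. The function name `top_shared_markers` and the cutoff `k` are hypothetical.

```python
def top_shared_markers(ref_markers_by_pair, empirical_ranking, k=20):
    """Keep the reference markers that are also ranked in the test data,
    ordered by their empirical rank, and return the best k."""
    # Flatten the per-comparison marker lists, dropping duplicates.
    ref_markers = []
    seen = set()
    for genes in ref_markers_by_pair.values():
        for gene in genes:
            if gene not in seen:
                seen.add(gene)
                ref_markers.append(gene)

    # Position of each gene in the test-data ranking (0 = strongest).
    rank = {gene: i for i, gene in enumerate(empirical_ranking)}

    shared = [gene for gene in ref_markers if gene in rank]
    shared.sort(key=lambda gene: rank[gene])
    return shared[:k]


# Toy input: markers for one label versus two other labels.
ref_for_beta = {
    "alpha": ["INS", "IAPP", "GAPDH"],
    "delta": ["INS", "MAFA"],
}
empirical = ["IAPP", "INS", "TTR", "MAFA"]
to_plot = top_shared_markers(ref_for_beta, empirical, k=2)
```

Only genes that are both reference markers and empirically ranked in the test data survive, so the heatmap is limited to a short list that should actually show a difference.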

ADD REPLY • link 3.6 years ago by jared.andrews07 ★ 18k

I wonder what SingleR does if some of the marker genes detected from the reference set do not exist in the test dataset. Will SingleR just drop those genes and use only the genes that exist in the test dataset to calculate Spearman's correlation coefficient?

ADD REPLY • link 3.6 years ago by FantasticAI ▴ 60

Yes, SingleR restricts the analysis to genes common to both the reference and test datasets before calculating the correlations.
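As a rough illustration of what "restrict to common genes, then correlate" means, here is a plain Python sketch. It is not how the package is implemented, and the exact point at which the package applies the restriction is not modelled here. The helper names (`average_ranks`, `pearson`, `spearman_on_common_genes`) and the toy values are made up for the example.

```python
import math


def average_ranks(values):
    """Rank values from 1, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_on_common_genes(test_cell, ref_profile, markers):
    # Drop any marker that is missing from either dataset.
    common = [g for g in markers if g in test_cell and g in ref_profile]
    x = [test_cell[g] for g in common]
    y = [ref_profile[g] for g in common]
    # Spearman's rho is Pearson's correlation computed on the ranks.
    return common, pearson(average_ranks(x), average_ranks(y))


test_cell = {"A": 1.0, "B": 2.0, "C": 3.0}                # gene "D" is absent
ref_profile = {"A": 10.0, "B": 20.0, "C": 30.0, "D": 5.0}
common, rho = spearman_on_common_genes(test_cell, ref_profile, ["A", "B", "C", "D"])
```

In this toy case gene "D" is dropped because the test cell does not have it, and the correlation is computed on the three remaining genes.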