Question

What is the difference between FindVariableFeatures and FindAllMarkers?


22 months ago

Assa Yeroslaviz ★ 1.9k

I am not sure I understand the difference between these two functions, or more precisely, the idea behind each procedure.

When running FindVariableFeatures, I try to identify genes that show high cell-to-cell variation. Later in the analysis I also run FindAllMarkers, which finds marker genes for each cluster by testing for differential expression between the clusters.

I had assumed that the candidate marker genes for each cluster would overlap with the genes identified as variable features in the first step, since that should be the idea behind it: genes with high variation between cell groups should also be specific to a certain cluster and therefore be found in both lists.

But in my data, when comparing the two lists, some markers identified for certain clusters are not in the list of highly variable genes (HVGs). I found this out when running DoHeatmap on the top features from my list of marker genes. It throws an error telling me that some of my top marker genes were not found in the scale.data slot of my Seurat object. This slot is calculated on the genes identified as HVGs (2000 by default, if not stated otherwise).

What am I missing here? Why do these two lists not overlap completely? How can I have significant marker genes for a specific cluster that were not identified as highly variable genes?

Thanks for clarifying this.

Assa

Seurat single-cell FindVariableFeatures FindAllMarkers RNA-Seq • 2.1k views

updated 22 months ago by rpolicastro 13k • written 22 months ago by Assa Yeroslaviz ★ 1.9k

score 0 · Answer 1 · 2023-06-01

There isn't an expectation for the lists to match up completely. You could, for instance, have a gene with low absolute variance across the dataset due to low expression, but whose expression is specific to a particular cell type. This wouldn't show up in the most variable features, but it is nonetheless still a marker gene for that population. For the inverse case, you could have variable genes that are uninformative about cell type identity. For instance, mitochondrial and ribosomal genes tend to be variable across the whole dataset, but often the variability won't be associated with any one cell type.
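
To make the two cases concrete, here is a small toy sketch in plain Python (not Seurat code). Everything in it is made up for illustration: the gene names `marker_gene` and `noisy_gene`, the cluster labels, and the expression values are hypothetical, and it ranks genes by raw variance rather than by the standardized variance that FindVariableFeatures uses by default. It only shows that variance across all cells and a difference between clusters are two separate quantities.

```python
from statistics import mean, pvariance

# Hypothetical toy dataset: 1000 cells, 50 of which belong to a rare cluster.
n_cells = 1000
n_rare = 50
labels = ["rare"] * n_rare + ["other"] * (n_cells - n_rare)

# Lowly expressed gene that is only detected in the rare cluster.
marker_gene = [1.0 if lab == "rare" else 0.0 for lab in labels]

# Highly variable gene whose on/off pattern ignores cluster membership
# (a stand-in for e.g. mitochondrial or ribosomal variability).
noisy_gene = [4.0 if i % 2 == 0 else 0.0 for i in range(n_cells)]


def cluster_mean_difference(expr, labels, cluster):
    """Mean expression inside `cluster` minus mean expression in all other cells."""
    inside = [x for x, lab in zip(expr, labels) if lab == cluster]
    outside = [x for x, lab in zip(expr, labels) if lab != cluster]
    return mean(inside) - mean(outside)


genes = {"marker_gene": marker_gene, "noisy_gene": noisy_gene}

# Variance over all cells (what a variance-based gene ranking looks at).
variances = {name: pvariance(expr) for name, expr in genes.items()}

# Difference between the rare cluster and the rest (what a marker test looks at).
differences = {
    name: cluster_mean_difference(expr, labels, "rare")
    for name, expr in genes.items()
}

for name in genes:
    print(f"{name}: variance={variances[name]:.4f}, "
          f"rare-vs-rest difference={differences[name]:.4f}")
```

In this toy data, `noisy_gene` has by far the larger variance but no difference between the rare cluster and the rest, while `marker_gene` has a small variance yet cleanly separates the rare cluster. A variance-based ranking would favor the first gene and a cluster-versus-rest comparison the second, which is why the two lists need not overlap.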