The FindMarkers function from Seurat reports different fold change values then the FoldChange function. What is FindMarkers doing that changes the fold change values? I have not been able to replicate the output of FindMarkers using any other means. I have tested this using the pbmc_small dataset from Seurat.
"1. Fold Changes Calculated by \"FindMarkers\" using data slot:"
-3.168049 -1.963117 -1.799813 -4.060496 -2.559521 -1.564393
"2. Fold Changes Calculated by \"FindMarkers\" using counts slot:"
-6.163575 -3.392279 -5.444437 -3.022295 -3.163746 -2.458198
"3. Fold Changes Calculated by \"FoldChange\":"
-6.843398 -3.533748 -2.335603 -6.333289 -2.783531 -2.939866
"4. Fold Changes Calculated from \"AverageExpression\" using data slot:"
-6.533607 -3.514828 -2.269075 -5.548581 -3.033346 -3.287716
"5. Fold Changes Calculated from \"AverageExpression\" using counts slot:"
-6.843398 -3.533748 -2.335603 -6.333289 -2.783531 -2.939866
"6. Direct Calculation using data slot:"
-6.533607 -3.514828 -2.269075 -5.548581 -3.033346 -3.287716
"7. Direct Calculation using counts slot:"
-6.843398 -3.533748 -2.335603 -6.333289 -2.783531 -2.939866
As you can see AverageExpression and FoldChange are consistent, as is my direct calculation from the Seurat object. However log fold changes calculated by FindMarkers are not consistent with any of these regardless of whether fold change is calculated from the raw counts in the "counts" slot or the normalized counts in the "data" slot. Here is the code I used to generate the above output:
library(Seurat)
library(dplyr)
# using pbmc small dataset
# log-normalize the dataset
norm_data <- NormalizeData(pbmc_small,normalization.method = 'LogNormalize')
# 1. Compare A vs. B using FindMarkers using the data slot
fm_result_counts <- FindMarkers(
norm_data,
slot = 'data',
ident.1 = 'A',
ident.2 = 'B',
group.by = 'letter.idents',
test.use = 'wilcox',
min.pct=0,
min.cells.feature = 0,
min.cells.group = 0
)
top5_markers_counts <- head(fm_result_counts)
top5_find_markers_counts_fc <- top5_markers_counts$avg_log2FC
# 2. FindMarkers using counts slot
fm_result <- FindMarkers(
norm_data,
slot = 'counts',
ident.1 = 'A',
ident.2 = 'B',
group.by = 'letter.idents',
test.use = 'wilcox',
min.pct=0,
min.cells.feature = 0,
min.cells.group = 0
)
top5_markers <- head(fm_result)
top5_find_markers_fc <- top5_markers$avg_log2FC
# 3. Fold Change calculated by FoldChange
fc_result <- FoldChange(
norm_data,
ident.1 = 'A',
ident.2 = 'B',
group.by = 'letter.idents',
)
genes <- rownames(top5_markers)
top5_markers_fc <- fc_result[genes,]
# 4. Calculating Fold Change using AverageExpression
ae <- AverageExpression(
norm_data,
slot='data',
group.by = 'letter.idents'
)
ae_data <- ae$RNA[genes,] %>% as.data.frame()
top5_ae_fc <- log2(ae_data$A/ae_data$B)
# 5. Calculating Fold Change using AverageExpression applied to the counts slot
ae_counts <- AverageExpression(
norm_data,
slot='counts',
group.by = 'letter.idents'
)
ae_data_counts <- ae_counts$RNA[genes,] %>% as.data.frame()
top5_ae_counts_fc <- log2(ae_data_counts$A/ae_data_counts$B)
# 6. Direct calculation from object using data slot
A_obj <- subset(norm_data,letter.idents=='A')
B_obj <- subset(norm_data,letter.idents=='B')
data_fc <- log2(rowMeans(expm1(A_obj@assays$RNA@data[genes,]))) -
log2(rowMeans(expm1(B_obj@assays$RNA@data[genes,])))
# 7. Direct calculation from object using counts slot
counts_fc <- log2(rowMeans(A_obj@assays$RNA@counts[genes,])) -
log2(rowMeans(B_obj@assays$RNA@counts[genes,]))
print('1. Fold Changes Calculated by "FindMarkers" using data slot:')
print(top5_find_markers_fc)
print('2. Fold Changes Calculated by "FindMarkers" using counts slot:')
print(top5_find_markers_counts_fc)
print('3. Fold Changes Calculated by "FoldChange":')
print(top5_markers_fc$log2_fold_change)
print('4. Fold Changes Calculated from "AverageExpression" using data slot:')
print(top5_ae_fc)
print('5. Fold Changes Calculated from "AverageExpression" using counts slot:')
print(top5_ae_counts_fc)
print('6. Direct Calculation using data slot:')
print(as.vector(data_fc))
print('7. Direct Calculation using counts slot:')
print(as.vector(counts_fc))