Question

How to interpret ANCOM-BC results when diff is true but passed_ss is false?

7 months ago

benkosta • 0

Hi everyone! I'm working with ANCOM-BC to analyze differences in microbial taxa between two groups, and I’m trying to understand the results. For some taxa, the diff column is marked as "true," indicating a difference between groups. However, the passed_ss column is "false" for these same taxa.

What does it mean when a taxon shows diff as true but doesn’t pass the subsampling significance (ss) filter? Does this imply that the detected difference may not be reliable? Any insights into interpreting this combination of results would be appreciated!

Thanks in advance!

differential-abundance • 2.2k views

Updated 6 months ago by andres.firrincieli 3.9k • written 7 months ago by benkosta

score 1 · Answer 1 · 2024-11-08

7 months ago

andres.firrincieli 3.9k

From the manual (always check the manual first):

passed_ss is TRUE if the taxon passed the sensitivity analysis, i.e., adding different pseudo-counts to 0s would not change the results.

The sensitivity analysis is explained in the ANCOM-BC2 tutorial:

Sensitivity analysis for the pseudo-count addition: Like other differential abundance analysis methods, ANCOM-BC2 applies a log transformation to the observed counts. However, the presence of zero counts poses a challenge, and researchers often consider adding a pseudo-count before the log transformation. However, it has been shown that the choice of pseudo-count can impact the results and lead to an inflated false positive rate (Costea et al. 2014; Paulson, Bravo, and Pop 2014). To address this issue, we conduct a sensitivity analysis to assess the impact of different pseudo-counts on zero counts for each taxon. This analysis involves adding a series of pseudo-counts (ranging from 0.01 to 0.5 in increments of 0.01) to the zero counts of each taxon. Linear regression models are then performed on the bias-corrected log abundance table using the different pseudo-counts. The sensitivity score for each taxon is calculated as the proportion of times that the p-value exceeds the specified significance level (alpha). If all p-values consistently show significance or nonsignificance across different pseudo-counts and are consistent with the results obtained without adding pseudo-counts to zero counts (using the complete data), then the taxon is considered not sensitive to the pseudo-count addition.
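To make the quoted procedure more concrete, here is a toy, dependency-free Python sketch for a single taxon in a two-group design. It is not the ANCOMBC implementation. It skips the bias correction that ANCOM-BC2 applies, uses a pooled two-sample t statistic (equivalent to regressing log abundance on a 0/1 group indicator) with a normal approximation for the p-value, and all function names are hypothetical. It does follow the quoted rules: the score is the share of pseudo-count runs with p > alpha, and the taxon passes only if every run agrees with the complete-data result.

```python
import math
from statistics import mean, variance

ALPHA = 0.05
# Pseudo-counts 0.01, 0.02, ..., 0.50, as in the tutorial text.
PSEUDO_COUNTS = [round(0.01 * k, 2) for k in range(1, 51)]


def two_group_pvalue(x, y):
    """Two-sided p-value for a difference in group means.

    Pooled-variance t statistic (same as the slope test when regressing on a
    0/1 group indicator). Simplification: the p-value uses a normal
    approximation instead of the t distribution.
    """
    nx, ny = len(x), len(y)
    ss_within = (nx - 1) * variance(x) + (ny - 1) * variance(y)
    pooled_var = ss_within / (nx + ny - 2)
    se = math.sqrt(pooled_var * (1 / nx + 1 / ny))
    t = (mean(x) - mean(y)) / se
    return math.erfc(abs(t) / math.sqrt(2))


def log_with_pseudo(counts, pseudo):
    # Replace only the zero counts by the pseudo-count, then take logs.
    return [math.log(c if c > 0 else pseudo) for c in counts]


def log_complete(counts):
    # "Complete data": drop the zeros instead of imputing them.
    return [math.log(c) for c in counts if c > 0]


def sensitivity_check(group_a, group_b, alpha=ALPHA):
    """Return (sensitivity score, passed) for one taxon's raw counts."""
    pvals = [
        two_group_pvalue(log_with_pseudo(group_a, p), log_with_pseudo(group_b, p))
        for p in PSEUDO_COUNTS
    ]
    # Sensitivity score: share of pseudo-count runs with p > alpha.
    score = sum(p > alpha for p in pvals) / len(pvals)
    complete_sig = two_group_pvalue(log_complete(group_a), log_complete(group_b)) <= alpha
    # Not sensitive only if every run agrees with the complete-data call.
    passed = all((p <= alpha) == complete_sig for p in pvals)
    return score, passed


print(sensitivity_check([100, 120, 90, 110, 105], [10, 12, 9, 11, 10]))  # (0.0, True)
print(sensitivity_check([0, 0, 0, 2, 3, 2, 3], [2, 3, 2, 3, 2, 3, 2]))    # passed is False
```

In the second toy taxon, small pseudo-counts push the three zeros far below the other log values and the test comes out significant. Larger pseudo-counts and the complete data do not, so the taxon fails the check.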

I’m relatively new to biostatistics and face some difficulties interpreting my results. I’ve read through the manual and the tutorial on the sensitivity analysis for pseudo-count addition, but I’m still a bit unclear on some of the details.

From what I understand, the sensitivity analysis tests how adding different pseudo-counts (values between 0.01 and 0.5) to zero counts in the data affects the results. The idea is to see if the p-values change significantly when different pseudo-counts are used. However, I’m not sure what to make of this in terms of reliability:

If the p-values fluctuate across the different pseudo-counts, does that mean the taxon’s results could be unreliable? Does this suggest that the results might be prone to false positives, where a taxon is incorrectly identified as differentially abundant? I would really appreciate a clearer explanation of how this sensitivity analysis works and what it means for the validity of my findings.

Reply • 7 months ago by benkosta

If the p-values fluctuate across the different pseudo-counts, does that mean the taxon’s results could be unreliable? Does this suggest that the results might be prone to false positives, where a taxon is incorrectly identified as differentially abundant?

You got it. A taxon with diff = TRUE but passed_ss = FALSE is likely a false positive.

I would really appreciate a clearer explanation of how this sensitivity analysis works and what it means for the validity of my findings.

In layman's terms, passed_ss = TRUE is used for taxa whose result stays the same (always significant, or always non-significant) no matter whether you perform pseudo-count addition (i.e., you keep the zero counts in the differential analysis by replacing them with a pseudo-count) or use the complete data (i.e., you exclude the zero counts directly). Edit: ANCOM-BC2 does this because taxa with lots of zeros (e.g., rare taxa) can appear significant, potentially inflating the false positive rate.

You can find a more complete explanation in the ANCOM-BC2 manuscript, in the section "Strategies implemented in ANCOM-BC2 to handle zeros".
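The rule of thumb in this answer can be restated as a small decision function for one taxon and one covariate. This is a hypothetical helper, not part of the ANCOMBC package; it only encodes the reading given above.

```python
def interpret(diff: bool, passed_ss: bool) -> str:
    """Plain-language reading of one taxon's diff / passed_ss pair."""
    if not diff:
        # Not flagged as significant, so passed_ss does not change the conclusion.
        return "not differentially abundant"
    if passed_ss:
        return "differentially abundant and robust to the pseudo-count choice"
    # Significant, but the call flips for some pseudo-counts.
    return "significant but sensitive to the pseudo-count choice: likely a false positive"


for diff, ss in [(True, True), (True, False), (False, True)]:
    print(diff, ss, "->", interpret(diff, ss))
```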

Reply • 7 months ago by andres.firrincieli 3.9k

Thank you for the previous answer, it was really helpful!

I have another question regarding ANCOM-BC2. In my analysis results, I came across the variable "passed_ss intercept." Can someone explain what it means when a taxon "passed_ss intercept"? I've noticed that this value can be marked as true or false in different samples. What does this variation mean, and why might it differ across samples?

Any insights would be much appreciated!

Thanks in advance!

Reply • 6 months ago by benkosta

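Columns with the term intercept can be ignored in ANCOM-BC: they refer to the intercept (the reference level of your model), not to a difference between your groups. Have a look at the GitHub page: https://github.com/FrederickHuangLin/ANCOMBC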
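If you export the ANCOM-BC2 result table to CSV and post-process it outside R, the advice in this thread (skip the intercept columns, keep taxa with diff TRUE and passed_ss TRUE) can be sketched in Python. The column naming (a taxon column plus paired diff_&lt;covariate&gt; and passed_ss_&lt;covariate&gt; columns, with "Intercept" in the intercept column names) and the CSV export step are assumptions here; check them against your own output and the package documentation.

```python
import csv
import io


def robust_hits(csv_text):
    """Map each non-intercept covariate to taxa with diff TRUE and passed_ss TRUE.

    Assumes a table written from R (e.g. with write.csv) that has a "taxon"
    column and paired diff_<covariate> / passed_ss_<covariate> columns.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    covariates = [
        name[len("diff_"):]
        for name in reader.fieldnames
        if name.startswith("diff_") and "Intercept" not in name  # skip intercept
    ]
    hits = {cov: [] for cov in covariates}
    for row in reader:
        for cov in covariates:
            if row[f"diff_{cov}"] == "TRUE" and row.get(f"passed_ss_{cov}") == "TRUE":
                hits[cov].append(row["taxon"])
    return hits


example = """taxon,lfc_(Intercept),diff_(Intercept),passed_ss_(Intercept),lfc_groupB,diff_groupB,passed_ss_groupB
T1,1.2,TRUE,FALSE,2.1,TRUE,TRUE
T2,0.3,TRUE,TRUE,1.5,TRUE,FALSE
T3,0.8,FALSE,TRUE,0.1,FALSE,TRUE
"""
print(robust_hits(example))  # {'groupB': ['T1']}
```

Here T2 is significant for groupB but fails the sensitivity analysis, so it is left out. The intercept flags on T1 and T2 are ignored entirely.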

Reply • 6 months ago by andres.firrincieli 3.9k