Question

Uneven Biological Replicates for 16S rRNA metabarcoding analysis: Is it justifiable for me to pursue this data?

3.6 years ago

barmonicin ▴ 20

Hello all, I hope you are doing well. I am a biology student studying bird gut microbiota through 16S rRNA analysis. I collected three biological samples per time point (we have five time points), and I want to see how the birds' gut microbiota changes across those five time points. However, one of the time points has only two biological replicates, because the other birds did not have gut contents for me to extract. I had extra birds at each time point in case anything happened to the original biological replicates, but for this particular time point the guts of all the extra birds and of one of the replicates were empty, so I'm stuck with only two workable biological replicates for this time point.

My concern is that most microbiome studies use at least biological triplicates, and I've read in many statistical reviews that more biological replicates are necessary to reliably infer the average or variation in a microbial population. Is there a way for me to statistically justify this discrepancy in biological replicates, or any feasible way to analyze samples with uneven numbers of biological replicates? Thank you so much, and apologies if I'm using statistical terms incorrectly, as I am not well versed in the field. Thanks again and regards.

Analysis statistics replicates microbiome 16S • 1.3k views

score 4 · Accepted Answer · 2021-11-12

The good news is that this is a common problem, and if you have no way of obtaining additional data, I don't think you'd be faulted for just analyzing what you have. Review may be difficult if you're looking to publish formally, but you should still analyze it and explain what happened.

Simple methods like t-tests can still be applied to compare groups of n=3 and n=2. However, there are more appropriate ways of handling missing data and time-series analysis in microbiome studies than pairwise comparisons. The bad news is that these methods are generally advanced and require some programming knowledge to apply. An example would be this publication, in which they discuss the problem you're having with missing data. The software they use can be found here.
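To make the simple option concrete, here is a minimal sketch of a Welch (unequal-variance) t-test between an n=3 group and an n=2 group, using only the Python standard library. The function name `welch_t` and the diversity values are made up for illustration; they are not from any real dataset or from the publication mentioned above. The sketch returns only the t statistic and the Welch-Satterthwaite degrees of freedom. For a p-value you could call `scipy.stats.ttest_ind(a, b, equal_var=False)` instead, assuming SciPy is installed.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Works with unequal group sizes; each group needs at least 2 values
    so that a sample variance can be computed.
    """
    na, nb = len(a), len(b)
    # Squared standard error of each group mean (sample variance uses n - 1)
    se2_a = variance(a) / na
    se2_b = variance(b) / nb
    t = (mean(a) - mean(b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df


# Hypothetical per-bird diversity values (made up for illustration)
timepoint_a = [3.1, 3.4, 3.0]  # three biological replicates
timepoint_b = [2.2, 2.6]       # only two workable replicates

t_stat, dof = welch_t(timepoint_a, timepoint_b)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

Note that with groups this small the degrees of freedom are very low, so only large differences will reach significance. That is a limit of the sample size rather than of the uneven design.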

In general, microbiome data (and 16S in particular) are very noisy, and it often takes very large sample sizes to make biologically relevant inferences. It is great that you have collected a time series though, as it will improve your statistical power considerably. Typically, there is high variability between individuals' microbiomes, but collecting data over time lets you model the variability within each individual's microbiome as well as between them. Probably the best example of a high-powered microbiome time-series study is the HMP2 publication, but even with a very large sample size, they weren't able to say much biologically speaking. So don't be disappointed if the analysis doesn't find much (my first 16S study didn't); it is still valuable to analyze and share it, since it contributes to our global database of 16S data.