Uneven Biological Replicates for 16S rRNA metabarcoding analysis: Is it justifiable for me to pursue this data?
3.0 years ago
barmonicin ▴ 20

Hello all, I hope you are doing well. I am a biology student studying bird gut microbiota through 16S rRNA analysis. I collected three biological samples per time point (we have five time points), and I want to see how the birds' gut microbiota changes across those five time points. However, one of the time points has only two biological replicates, because the other birds did not have gut contents for me to extract. I keep extra birds at each time point in case anything happens to the original biological replicates, but for this particular time point the guts of all the extra birds and of one of the replicates were empty, so I'm stuck with only two workable biological replicates. This concerns me, since most microbiome studies use at least biological triplicates, and I've read in many statistical reviews that more biological replicates are necessary to reliably infer the average or variation in a microbial population. Is there a way to statistically justify this discrepancy in biological replicates, or any feasible way to analyze samples with uneven numbers of biological replicates? Thank you so much, and apologies if I'm using statistical terms incorrectly, as I am not well versed in the field. Thanks again and regards.

Analysis statistics replicates microbiome 16S • 1.0k views
3.0 years ago
Steven Lakin ★ 1.8k

The good news is that this is a common problem, and if you have no way of obtaining additional data, I don't think you'd be faulted for just analyzing what you have. Review may be difficult if you're looking to publish formally, but you should still analyze it and explain what happened.

Simple methods like t-tests can still be applied to compare groups of n=3 and n=2. However, there are more appropriate ways of handling missing data and time-series analysis in microbiome studies than doing pairwise comparisons. The bad news is that these methods are generally high-level and require some programming knowledge to apply. An example would be this publication, in which they discuss the problem you're having with missing data. The software they use can be found here.
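To make the n=3 versus n=2 comparison concrete, here is a minimal sketch using only the Python standard library. It computes Welch's t statistic (the unequal-variance form of the t-test) with its approximate degrees of freedom, and, as an addition of mine rather than something from the publication above, an exact permutation p-value. The function names `welch_t` and `exact_permutation_p` and the diversity values are hypothetical, made up purely for illustration.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


def exact_permutation_p(a, b):
    """Two-sided exact permutation p-value for the difference in means."""
    pooled = list(a) + list(b)
    observed = abs(mean(a) - mean(b))
    hits = total = 0
    # Enumerate every way of relabelling len(a) of the pooled samples as group a.
    for idx in combinations(range(len(pooled)), len(a)):
        grp_a = [pooled[i] for i in idx]
        grp_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        total += 1
        if abs(mean(grp_a) - mean(grp_b)) >= observed - 1e-12:
            hits += 1
    return hits / total


# Hypothetical alpha-diversity values (not real data).
timepoint_a = [3.1, 3.4, 3.3]  # n = 3
timepoint_b = [2.2, 2.5]       # n = 2

t_stat, df = welch_t(timepoint_a, timepoint_b)
p_perm = exact_permutation_p(timepoint_a, timepoint_b)
print(f"Welch t = {t_stat:.2f}, df = {df:.2f}, permutation p = {p_perm:.2f}")
```

With three and two samples there are only 10 possible relabellings, so the exact permutation p-value can never fall below 0.1, however well separated the groups are. That is a useful reminder of how little a single n=3 versus n=2 comparison can show on its own.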

In general, microbiome data (and 16S in particular) are very noisy, and it often takes very large sample sizes to make biologically relevant inferences. It is great that you have collected a time series, though, as it will improve your statistical power considerably. Typically, there is high variability between individuals' microbiomes, but collecting data over time lets you model the variability within each individual's microbiome as well as between individuals. Probably the best example of a high-powered microbiome time-series study is the HMP2 publication, but even with a very large sample size, they weren't able to say much biologically speaking. So don't be disappointed if the analysis doesn't find much (my first 16S study didn't); it is still valuable to analyze and share it, since it contributes to our global database of 16S data.
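As a rough illustration of the within- versus between-individual idea, the sketch below summarises a small set of repeated measurements. It assumes the same individuals were sampled at several time points, which may not hold if sampling is terminal. The data and the function name `within_between` are hypothetical, and this is a crude descriptive summary rather than a proper mixed model.

```python
from statistics import mean, variance

# Hypothetical diversity values per individual over time (not real data).
# Individuals are allowed to have different numbers of time points.
series = {
    "bird_1": [3.0, 3.2, 3.1],
    "bird_2": [2.0, 2.3],
    "bird_3": [4.1, 3.9, 4.0],
}


def within_between(series):
    """Return (pooled within-individual variance, variance of individual means)."""
    # Pool the within-individual variances, weighting each by its degrees of freedom.
    ss_within = sum(variance(v) * (len(v) - 1) for v in series.values())
    df_within = sum(len(v) - 1 for v in series.values())
    within = ss_within / df_within
    # Spread of the per-individual means, as a rough proxy for between-individual variation.
    between = variance([mean(v) for v in series.values()])
    return within, between


within, between = within_between(series)
print(f"within-individual variance:   {within:.3f}")
print(f"variance of individual means: {between:.3f}")
```

In this made-up example the individuals differ from one another far more than any one individual varies over time, which is the pattern described above. Dedicated longitudinal methods estimate these components more carefully and handle unbalanced designs directly.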


Thank you, Steven, for your response. This is probably the most comforting response I've ever gotten. To make my situation worse, though, I was told just this past weekend that all the replicates of our earliest time point may not be usable for analysis, as they were taken from a different part of the bird's gut. Apparently, the time point that supposedly had only two replicates did not have intestinal contents either, so they decided to extract from somewhere around the crop-proventriculus-gizzard area instead (a different member of the research staff collected the sample as part of our ongoing project, as I had previously been assigned to another task for the project). I believe that the earliest time point is arguably the most crucial one, especially since I need a baseline for how the microbiome changes from the early days post-hatching to the birds' adulthood. With the earliest time point data not available, I honestly don't know how to analyze this at all. The next time point comes around four weeks after the earliest one.

I'm still looking for ways to provide a feasible analysis, even trying to look at the available data from a different angle to perform a different analysis. This part of the study will be credited as part of my thesis, and the lab supervisor gave me the liberty to do whatever I need with the data to comply with my academic requirements, but I feel that the only purpose I know of for collecting these samples across time points is to track the changes in the microbiome. The good part is that all the other remaining samples have at least three to five biological replicates available, so I can probably narrow it down to three biological replicates for all samples evenly, or keep the replicates uneven once I have figured out how to analyze them with all the informative references you shared (thanks a lot).

I've been exploring this blindly because I don't have active or in-depth training in bioinformatics and statistics, and have never done this kind of analysis, so thank you so much for taking the time to provide your expert input and comforting remarks. I sincerely and deeply appreciate it!
