Question

What should I use to correlate/compare microbiome compositional data with other data types?

0

11 months ago

Faith ▴ 50

Hello everyone :)

I'm trying to find a statistical approach or method to accomplish the following:

I have a set of 16S rRNA data taken from the same species but 3 different organisms across 3 years (once each year), along with other physiological metrics and metabolic data.
The organisms each inhabit a different environment with different environmental factors (one of the 3 places is considered to have normal factors, with the least anthropogenic effects).

With that said, I'm trying to accomplish two things:

1. Identify which variables or data types (physiology, metabolism, immunity, etc.) correlate with the microbiome composition within individual years.
2. Correlate the microbiome changes on a year vs year or year vs years basis with the changes in other variables or data types (physiology, metabolism, immunity, etc.).

What method or statistical approach can I use to compare or correlate the changes in microbiome composition with other data types, and how do I select the variables with the most probable influence on the change?

Final question would be, can I use the organism which lives in an environment with the least human interference in its habitat as a control?

R python microbiome compositional-biology • 850 views

ADD COMMENT • link updated 11 months ago by dthorbur ★ 3.0k • written 11 months ago by Faith ▴ 50

score 0 · Answer 1 · 2024-08-19

0

11 months ago

dthorbur ★ 3.0k

You might get better responses on a dedicated stats forum like Cross Validated.

It's quite difficult to answer your questions since you don't mention your research question(s) and I'm unsure of the data structure. Do you have replicates for any of the 3 samples within a year? There are plenty of relevant statistical tests and metrics you can use: diversity indices, ordination analyses, linear regressions, PERMANOVAs, generalised linear mixed effects models, etc.
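As a rough illustration of the first item in that list, here is a minimal sketch of one alpha-diversity index (Shannon) and one between-sample dissimilarity (Bray-Curtis) in plain Python. The function names and the toy counts are hypothetical, not taken from your data, and in practice you would usually normalise or rarefy counts first and use an established package rather than hand-rolled functions.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' (natural log) for one sample's taxon counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("sample has no reads")
    # Taxa with zero counts contribute nothing to the sum.
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two samples over the same taxa."""
    if len(a) != len(b):
        raise ValueError("samples must cover the same taxa")
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(a) + sum(b)
    return numerator / denominator if denominator else 0.0

# Hypothetical counts for three taxa in two samples (illustration only).
sample_a = [10, 10, 10]   # perfectly even community
sample_b = [30, 0, 0]     # dominated by a single taxon

print("Shannon A:", shannon_index(sample_a))
print("Shannon B:", shannon_index(sample_b))
print("Bray-Curtis A vs B:", bray_curtis(sample_a, sample_b))
```

The per-sample index can then be regressed against host variables, while the pairwise dissimilarities feed into ordination and PERMANOVA.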

taken from the same species but 3 different organisms

What do you mean? Is this a gut microbiome study?

If you have 3 data points per variable per year, as your question suggests, your statistics will be underpowered and you won't get anything robust for Q1. Q2 is better, but it would still be difficult to draw any conclusions with only 9 data points. However, if they are all different organisms and you have no replicates, then any effect you might observe will be nested within organism.

Try finding similar studies in the literature to get an idea of how to analyse this kind of data, tools and resources available, data requirements, and what a relevant control might be.

ADD COMMENT • link 11 months ago by dthorbur ★ 3.0k

1

The actual experiment is an NSF-funded project on 3 islands with 3 different visitation levels that affect diets. We have roughly 200 16S rRNA samples, and the physiological data (blood counts, immune metrics, energy metrics, metabolome profiles, etc.) are also gathered from multiple hosts (data collected from multiple samples).

The samples are collected from marked hosts once every year (I think in different seasons, I'm not sure if they aligned them to the same season or not).

I'm a master's student. What I'm trying to answer is: do changes in diet contribute to changes in microbiome composition, and which of those data types or variables is most likely to be the dominant factor driving that change? Also, what changes happened in the microbiome composition from year to year, and how different are the years from one another? Other questions would arise within those 2 main questions. How would a more senior bioinformatician approach this?

EDIT: I would also like to add that we have 3 different labels for our islands: 1. high visitor rate (high effect on diet), 2. medium visitor rate (moderate effect), 3. low visitor rate (low effect on diet). We have data from each island (3 sets of data, one per island), and this data is acquired 3 times, once per year, for a total of 9 sets of data.

Do I compare within each year first, see what differs across islands, and then compare those differences to the other years? In other words, do I do the analysis within a year by seeing how the microbiome changed from island to island and how that change correlates with other data types, and then compare those results with the results I get from other years?

ADD REPLY • link 11 months ago by Faith ▴ 50

0

Much better level of detail. Thanks.

I would start with within year as you suggested if you have enough data and start with ordination analyses to see if you can see segregation by island in the first ordination axes (e.g., PCoA, NMDS, CCA). You can even use all samples and change shape for year to see if the pattern is consistent. If you do see a pattern, then use a PERMANOVA to statistically validate that observation. Then you can start playing around with regressions to try and find which environmental variables are implicated for each island or year. If you get some results, then move into the time series, but these analyses get trickier and likely need something like linear mixed effects models. You should try to remove a priori assumptions when exploring data, so don't go in trying to spot a pattern you read in the literature. If you see the pattern that is supported in the literature then that's great as it's easier to explain.
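To make the PERMANOVA step concrete, here is a minimal sketch of a one-way PERMANOVA (pseudo-F plus a label-permutation p-value) in plain Python. Everything in it is an assumption for illustration: the function names, the toy distance matrix built from made-up 1-D positions, and the island labels. It permutes labels freely, so it does not account for the same marked hosts being resampled across years; for real data you would more likely use an established implementation such as `adonis2` in the R package vegan.

```python
import random
from collections import defaultdict

def pseudo_f(dist, groups):
    """One-way PERMANOVA pseudo-F from a square distance matrix."""
    n = len(groups)
    members = defaultdict(list)
    for idx, g in enumerate(groups):
        members[g].append(idx)
    a = len(members)  # number of groups
    # Total sum of squares: squared distances over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2
                   for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: same, but per group and divided by n_g.
    ss_within = 0.0
    for idxs in members.values():
        ss_within += sum(dist[i][j] ** 2
                         for k, i in enumerate(idxs)
                         for j in idxs[k + 1:]) / len(idxs)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(dist, groups, n_perm=999, seed=0):
    """Return (observed pseudo-F, permutation p-value)."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    labels = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break any link between labels and distances
        if pseudo_f(dist, labels) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)

# Toy data: 6 samples placed on a line, 3 per hypothetical island.
positions = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in positions] for p in positions]
islands = ["low", "low", "low", "high", "high", "high"]

f_stat, p_value = permanova(dist, islands)
print("pseudo-F:", f_stat)
print("p-value:", p_value)
```

Note that even with perfectly separated groups, 3 samples per island leaves only 20 distinct label splits, so the p-value cannot get much below 0.1. That is the underpowered situation described earlier; with around 200 samples it is far less of a concern.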

Regardless of how you proceed, your supervisor should be your first stop for questions and guidance. If not, a PhD student or postdoc in the lab should be around to help.

ADD REPLY • link 11 months ago by dthorbur ★ 3.0k