Hello everyone :)
I'm trying to find a statistical approach or method to accomplish the following:
- I have a set of 16S rRNA data taken from the same species but 3 different organisms over 3 years (once each year), along with other physiological metrics and metabolic data.
- The organisms each inhabit a different environment with different environmental factors (one of the 3 places is considered normal, with the least anthropogenic effects).
With that said, I'm trying to accomplish two things:
- Identify which variables or data types (physiology, metabolic, immunity, etc.) correlate with the microbiome composition in individual years.
- Correlate the microbiome changes on a year-vs-year or year-vs-years basis with the changes in other variables or data types (physiology, metabolic, immunity, etc.).
What method or statistical approach can I use to compare or correlate the changes in microbiome composition with other data types, and how do I select the variables most likely to influence the change?
My final question: can I use the organism that lives in the environment with the least human interference as a control?
The actual experiment is an NSF-funded project on 3 islands with 3 different visitation levels that affect diets. We have roughly 200 16S rRNA samples, and the physiological data (blood count, immune metrics, energy metrics, metabolome profiles, etc.) are also gathered from multiple hosts (data collected from multiple samples).
The samples are collected from marked hosts once every year (I think in different seasons; I'm not sure whether they were aligned to the same season).
I'm a master's student. What I'm trying to answer is: do changes in diet contribute to changes in microbiome composition, which of those data types or variables is most likely to be the dominant factor behind that change, what changed in the microbiome composition from year to year, how different are the years, and so on. Other questions would arise within those 2 main questions. How would a more senior bioinformatician approach this?
EDIT: I would also like to add that we have 3 different labels for our islands: 1. High visitor rate (high effect on diet), 2. Medium (moderate effect), 3. Low visitor rate (low effect on diet). We have data from each island (3 sets of data, one per island), and this data is acquired 3 times, once per year, for a total of 9 sets of data.
Do I compare within the year itself first, see what changes happened across islands, and then compare those changes to the other years? In other words, do I do the analysis within a year by seeing how the microbiome changed from island to island and how that change correlates with other data types, then compare those results with the results I get from other years?
Much better level of detail. Thanks.
I would start with within year as you suggested if you have enough data and start with ordination analyses to see if you can see segregation by island in the first ordination axes (e.g., PCoA, NMDS, CCA). You can even use all samples and change shape for year to see if the pattern is consistent. If you do see a pattern, then use a PERMANOVA to statistically validate that observation. Then you can start playing around with regressions to try and find which environmental variables are implicated for each island or year. If you get some results, then move into the time series, but these analyses get trickier and likely need something like linear mixed effects models. You should try to remove a priori assumptions when exploring data, so don't go in trying to spot a pattern you read in the literature. If you see the pattern that is supported in the literature then that's great as it's easier to explain.
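To make the PERMANOVA step concrete, here is a minimal sketch in plain Python of a one-way PERMANOVA on Bray-Curtis distances for a single year, with island as the grouping factor. The counts, the island labels, and the function names (bray_curtis, distance_matrix, pseudo_f, permanova) are all made up for illustration and are not your data or any package's API. In practice you would more likely use an established implementation such as adonis2 in the R package vegan.

```python
import random
from itertools import combinations


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def distance_matrix(samples):
    """Symmetric matrix of pairwise Bray-Curtis dissimilarities."""
    n = len(samples)
    d = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        d[i][j] = d[j][i] = bray_curtis(samples[i], samples[j])
    return d


def pseudo_f(d, groups):
    """Pseudo-F statistic for a one-way design, computed from distances."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by n.
    ss_total = sum(d[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: same idea, restricted to each group.
    ss_within = 0.0
    for g in labels:
        idx = [i for i, lab in enumerate(groups) if lab == g]
        ss_within += sum(d[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(d, groups, permutations=999, seed=0):
    """Return the observed pseudo-F and a permutation p-value."""
    rng = random.Random(seed)
    observed = pseudo_f(d, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # break any link between labels and samples
        if pseudo_f(d, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Hypothetical toy counts (4 taxa, 4 hosts per island) for ONE year.
counts = [
    [50, 30, 10, 10], [48, 32, 12, 8], [52, 28, 9, 11], [49, 31, 11, 9],
    [30, 30, 20, 20], [28, 32, 22, 18], [31, 29, 19, 21], [29, 31, 21, 19],
    [10, 10, 40, 40], [12, 8, 38, 42], [9, 11, 41, 39], [11, 9, 39, 41],
]
islands = ["high"] * 4 + ["medium"] * 4 + ["low"] * 4

f_stat, p_value = permanova(distance_matrix(counts), islands)
print(f"pseudo-F = {f_stat:.2f}, p = {p_value:.3f}")
```

This shuffles labels freely, which is only reasonable within a single year. Once you pool years, the same marked hosts appear more than once, so the permutations would need to be restricted (for example within host), or you move to the mixed-model approach mentioned above.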
Regardless of how you proceed, your supervisor should be your first stop for questions and guidance. If not, a PhD student or postdoc in the lab should be around to help.