Question

How do I transform my proportional (relative abundance) microbiome data before statistical analysis?

1

4.6 years ago

deep771992chanda ▴ 40

Hi, I have analyzed metagenomic (WGS) data with the MetaPhlAn pipeline, which reports the relative abundance of each taxon (as a percentage, out of 100). I have two groups of samples: control and test. I want to compute the mean, standard error (SE), and sample size (N) for each group. My data are not normally distributed, so I want to log-transform them. I used the following function to transform my dataset:

mk_logit <- function(x) log(x)  # natural log, despite the name (not a logit)

However, my dataset is zero-inflated, so every zero is log-transformed to -Inf. When I then calculate the mean and SD on the transformed values, most of the results are NaN or Inf, so I cannot get usable summary statistics. Can anyone suggest a way around this problem?
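To make the problem concrete, here is a minimal reproduction. The values are made up for illustration and are not taken from the actual dataset:

```r
# Made-up relative abundances (percent) for one taxon; not the real dataset.
x <- c(0, 1.5, 0, 20)

log_x <- log(x)   # every zero becomes -Inf
log_x

mean(log_x)       # dominated by the -Inf entries
sd(log_x)         # not a usable number either
```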

Thanks

R • 3.7k views

3

You have what's called compositional data. Compositional data need specific treatment, as detailed in the book "The Statistical Analysis of Compositional Data" by John Aitchison. In short, to be able to use standard methods, you need to preprocess the data with the additive log-ratio transformation. Instead of the standard logarithm, you can use a generalized logarithm function such as the inverse hyperbolic sine (asinh in R) to deal with 0s, since asinh(0) is 0 rather than -Inf. You may want to read the paper "Microbiome Datasets Are Compositional: And This Is Not Optional".
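As a rough sketch of the two ideas above, the snippet below applies asinh to a toy abundance table, computes the mean, SE, and N per group for one taxon, and then shows a simple additive log-ratio. Everything named here is an assumption for illustration: the toy matrix `abund`, the `group` labels, the helpers `summarise_taxon` and `alr`, the choice of the first taxon as reference, and the pseudocount of 1e-3 used to replace zeros. It is not the poster's data or a recommended default.

```r
# Hypothetical toy data: rows are samples, columns are taxa, values are
# relative abundances in percent (each row sums to 100).
abund <- rbind(
  c(60, 30,  0, 10),
  c(50,  0, 25, 25),
  c(70, 20, 10,  0),
  c(40, 40,  0, 20)
)
colnames(abund) <- c("taxonA", "taxonB", "taxonC", "taxonD")
group <- c("control", "control", "test", "test")

# Option 1: inverse hyperbolic sine. asinh(0) is 0, so zeros stay finite.
abund_asinh <- asinh(abund)

# Mean, standard error, and N per group for a single taxon.
summarise_taxon <- function(values, group) {
  n  <- tapply(values, group, length)
  m  <- tapply(values, group, mean)
  se <- tapply(values, group, sd) / sqrt(n)
  data.frame(group = names(n), N = as.vector(n),
             mean = as.vector(m), SE = as.vector(se))
}
taxonC_summary <- summarise_taxon(abund_asinh[, "taxonC"], group)
taxonC_summary

# Option 2: additive log-ratio (alr) against a reference taxon.
# Zeros must be replaced first; this pseudocount is an arbitrary assumption.
alr <- function(x, ref, pseudocount = 1e-3) {
  x <- x + pseudocount
  x <- x / rowSums(x)                      # re-close so each row sums to 1
  log(x[, -ref, drop = FALSE] / x[, ref])  # each row divided by its reference
}
abund_alr <- alr(abund, ref = 1)
abund_alr
```

Note that asinh only keeps the zeros finite; it does not by itself account for the compositional constraint, which is what the log-ratio step is for. The alr result also depends on the reference taxon and on how zeros are replaced, so those choices deserve a sensitivity check on real data.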

4.6 years ago by Jean-Karim Heriche 27k

0

Thanks a lot, Jean-Karim Heriche, for your response. I will take a look at the article.

4.6 years ago by deep771992chanda ▴ 40