Question

How should I normalize a dataset for non-normally distributed, zero-inflated, and unequal library size?


3.3 years ago

ssko ▴ 20

Hi everyone!

I have taxonomic abundance count data obtained from shotgun metagenome analysis. My data are not normally distributed and are zero-inflated, and the samples in my dataset have unequal library sizes. In this case, how should I normalize my dataset for downstream analysis? Once I normalize the data, can I use the same normalized data for all downstream analyses? In other words, does each analysis need a different normalization method?

I am open to any tips, suggestions, and information.

Thank you all!

Tags: unequal, zero-inflated, downstream-analysis, library, metagenome, normalization • 2.2k views


You don't need normalization. You need to accurately model your data. That's it.

Reply by German.M.Demidov ★ 3.0k, 3.3 years ago
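One common reading of "model your data instead of normalizing it" is to fit a count model in which library size enters as an offset, so unequal sequencing depth is accounted for inside the model rather than by rescaling the counts beforehand. The sketch below is a hypothetical, minimal illustration for a single taxon: a Poisson log-linear model, log E[y_i] = log(N_i) + beta_g, with one intercept per group. For this particular design the maximum-likelihood estimate has a closed form, exp(beta_g) = (sum of counts in g) / (sum of library sizes in g). It deliberately ignores overdispersion and zero inflation, which real metagenomic counts usually show; in practice you would move to a negative binomial or zero-inflated model (for example, statsmodels' `GLM` accepts an `offset` argument). The function name and the data are illustrative assumptions, not anything from the thread.

```python
import math

def poisson_offset_rates(counts, lib_sizes, groups):
    """Hypothetical helper: per-group MLE rates for the Poisson model
    log E[y_i] = log(N_i) + beta_g, with log(N_i) as a fixed offset.
    Setting the score to zero gives exp(beta_g) = sum(y in g) / sum(N in g)."""
    totals = {}
    for y, n, g in zip(counts, lib_sizes, groups):
        c, s = totals.get(g, (0, 0))
        totals[g] = (c + y, s + n)  # accumulate counts and library sizes per group
    return {g: c / s for g, (c, s) in totals.items()}

# Illustrative counts of one taxon in six samples with unequal library sizes.
counts = [0, 3, 5, 12, 9, 20]
lib_sizes = [1000, 1500, 2000, 1200, 1000, 2500]
groups = ["A", "A", "A", "B", "B", "B"]

rates = poisson_offset_rates(counts, lib_sizes, groups)
# Log fold change of group B vs group A (undefined if group A has zero total counts).
lfc = math.log(rates["B"] / rates["A"])
print(rates, lfc)
```

The raw counts are never divided by library size here; depth enters only through the offset, which is the point of the "model, don't normalize" advice.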


Could you explain in more detail how I can model my data?

Reply by ssko ▴ 20, 3.3 years ago


Sadly, that can only be answered by a semester-long course in regression modelling :( or, better, several years of statistics, of course.

Reply by German.M.Demidov ★ 3.0k, 3.3 years ago


Time to open the stats notes :D Thanks anyway!

Reply by ssko ▴ 20, 3.3 years ago

score 5 · Accepted Answer · 2022-02-11

You should probably treat your data as compositional. See, for example: "Microbiome Datasets Are Compositional: And This Is Not Optional".
The standard reference on the topic is the book "Statistical analysis of compositional data" by John Aitchison (and the original article); a more recent view can be found in "Aitchison's Compositional Data Analysis 40 Years On: A Reappraisal".
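A standard tool from Aitchison's framework is the centered log-ratio (CLR) transform: take the log of each part and subtract the mean log of that sample, so only ratios between taxa matter and the total read count (library size) drops out. Euclidean distance between CLR vectors is the Aitchison distance. The sketch below is a minimal, hypothetical illustration, assuming one sample per list. Because log(0) is undefined, it adds a pseudocount; the default of 0.5 is an arbitrary assumption, and with zeros present the pseudocount breaks exact scale invariance, which is one reason zero-inflated data need extra care.

```python
import math

def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's counts.
    The pseudocount (assumed default 0.5) avoids log(0); with
    pseudocount=0 any zero count raises ValueError."""
    logs = [math.log(c + pseudocount) for c in counts]
    mean_log = sum(logs) / len(logs)  # log of the geometric mean
    return [x - mean_log for x in logs]

def aitchison_distance(a, b, pseudocount=0.5):
    """Euclidean distance between the CLR vectors of two samples."""
    ca, cb = clr(a, pseudocount), clr(b, pseudocount)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))

# Same proportions at 10x different sequencing depth (no zeros, no pseudocount):
d_same = aitchison_distance([10, 20, 30, 40], [100, 200, 300, 400], pseudocount=0)

# A zero-containing sample needs the pseudocount; CLR values always sum to ~0.
z = clr([0, 5, 12, 83])
print(d_same, z)
```

`d_same` comes out as zero up to floating-point error, showing that CLR ignores library size for zero-free data; the transformed values can then be used with ordinary Euclidean methods such as PCA.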