How should I normalize a dataset that is non-normally distributed, zero-inflated, and has unequal library sizes?
2.9 years ago
ssko ▴ 20

Hi everyone!

I have taxonomic abundance count data obtained from shotgun metagenome analysis. My data are zero-inflated and not normally distributed, and the samples have unequal library sizes. In this case, how can I normalize my dataset for downstream analysis? Once I normalize the data, can I use it for all downstream analyses, or does each analysis need a different normalization method?

I am open to any tips, suggestions, and information.

Thank you all!

unequal zero-inflated downstream-analysis library metagenome normalization • 1.8k views

You don't need normalization. You need to accurately model your data. That's it.


Could you explain more about how I can model my data?


Sadly, only a semester course in regression modelling can answer that :( or, better, several years of stats, of course.


Time to open the stats notes :D Thanks anyway!

2.9 years ago

You should probably be handling your data as compositional. See, for example, "Microbiome Datasets Are Compositional: And This Is Not Optional".
The standard reference on the topic is the book "The Statistical Analysis of Compositional Data" by John Aitchison (and the original article); a more recent view can be found in "Aitchison's Compositional Data Analysis 40 Years On: A Reappraisal".
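
To make the compositional idea a little more concrete, here is a minimal sketch of the centered log-ratio (CLR) transform, one of the log-ratio transforms that come out of Aitchison's framework. It is only an illustration, not a recommendation for every downstream analysis: the function name `clr`, the example counts, and the pseudocount of 0.5 used to get around zeros are all assumptions of mine, and how to handle zeros properly is a topic of its own.

```python
import math


def clr(counts, pseudocount=0.5):
    """Centered log-ratio transform of one sample's taxon counts.

    `pseudocount` is an arbitrary illustrative choice added to every count
    so that zeros can be log-transformed; other zero-replacement strategies
    exist and may suit your data better.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive so that log() is defined")
    logs = [math.log(c + pseudocount) for c in counts]
    # Subtracting the mean log is the same as dividing by the geometric mean.
    mean_log = sum(logs) / len(logs)
    return [x - mean_log for x in logs]


# Hypothetical samples: same taxa, very different library sizes.
sample_a = [10, 0, 90, 400]
sample_b = [100, 0, 900, 4000]

clr_a = clr(sample_a)
clr_b = clr(sample_b)

print([round(v, 3) for v in clr_a])
print([round(v, 3) for v in clr_b])
```

Each transformed sample is centered on zero, so values are read relative to the geometric mean of that sample rather than as absolute abundances. Note that with a pseudocount the two samples above do not come out identical even though their nonzero proportions match, which is one reason zero handling deserves care.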
