Question

choose between normalization techniques for OTU counts

6.7 years ago

a1788 ▴ 10

Hello

After noticing some spurious results in my data set, I would like to ask this community for help with choosing the right normalization approach for my data.

I have two groups, patients and healthy controls, in which microbiota OTUs have been measured from biopsies. Quality filtering was performed with the SDM software in the LotuS pipeline, using the default parameters adapted to the 454 sequencing platform. High-quality and mid-quality sequences were mapped to count the occurrence of OTUs in each sample, and clustering was done with UPARSE. The OTU sequences were then taxonomically assigned using the Greengenes database [34] (3.8, August 2013) and the RDP II database [35] (release version 11).

Now I want to correlate this data with host mRNA expression, preferably using Spearman's rank correlation.
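For reference, a minimal sketch of Spearman's rank correlation between one OTU and one gene across samples, written with the standard library only. The function names and the toy vectors are illustrative assumptions, not part of the original analysis. Ties (for example the many zeros typical of OTU tables) receive the mean of their ranks.

```python
import math
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        return float("nan")  # a constant vector has no defined correlation
    return cov / math.sqrt(vx * vy)


# Hypothetical toy data: normalized abundance of one OTU and expression of one gene.
otu = [0.0, 0.0, 5.0, 3.0, 8.0]
gene = [1.2, 0.9, 4.1, 2.5, 6.3]
rho = spearman(otu, gene)
```

Because the ranks are taken across samples, the correlation depends on how each sample was scaled, which is why the choice of normalization matters here.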

The default procedure in my lab is to normalize for sequencing depth by calculating ratios, but I think that ratios are not the ideal way to test my hypothesis, so I'm looking into more useful alternatives. I also have quite a number of columns that are either all zero or have very low variance, so just calculating ratios might inflate noise disproportionately.
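One common way to deal with all-zero and near-empty OTU columns is to filter them before normalizing. The sketch below assumes a table stored as a dict from OTU id to per-sample counts; the function name `filter_otus` and the `min_prevalence` threshold are hypothetical choices for illustration, not a recommendation from this thread.

```python
def filter_otus(table, min_prevalence=0.1):
    """Drop OTUs that are all zero or present in too few samples.

    table: dict mapping OTU id -> list of raw counts, one entry per sample.
    min_prevalence: minimum fraction of samples with a non-zero count
                    (hypothetical default; tune it for your data).
    """
    kept = {}
    for otu, counts in table.items():
        n_present = sum(1 for c in counts if c > 0)
        if n_present == 0:
            continue  # all-zero OTU carries no information
        if n_present / len(counts) < min_prevalence:
            continue  # too sparse to rank meaningfully
        kept[otu] = counts
    return kept


# Hypothetical toy table with four samples.
table = {
    "otu1": [0, 0, 0, 0],
    "otu2": [0, 0, 0, 3],
    "otu3": [5, 2, 0, 7],
}
filtered = filter_otus(table, min_prevalence=0.5)
```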

Of all the options out there, I think that DESeq2, TMM, cumulative sum scaling (CSS), or just subsampling by number of reads (multiplying all of the entries by (#reads in smallest sample)/(#reads in this sample)) would be best.
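To make the last option concrete, here is a minimal sketch of the multiplication described above, assuming counts are stored as a dict from sample id to a list of OTU counts (the function name and the data layout are illustrative assumptions). Note that this is proportional scaling to the smallest library; rarefaction in the strict sense draws reads at random without replacement, so the two are related but not identical.

```python
def scale_to_smallest(counts):
    """Multiply each sample by (reads in smallest sample) / (reads in this sample).

    counts: dict mapping sample id -> list of raw OTU counts.
    Samples with zero reads are returned unchanged.
    """
    depths = {s: sum(c) for s, c in counts.items()}
    target = min(d for d in depths.values() if d > 0)
    scaled = {}
    for s, c in counts.items():
        if depths[s] == 0:
            scaled[s] = list(c)
        else:
            scaled[s] = [v * target / depths[s] for v in c]
    return scaled


# Hypothetical toy data: sample A has 100 reads, sample B has 50.
counts = {"A": [10, 30, 60], "B": [5, 5, 40]}
scaled = scale_to_smallest(counts)
```

After scaling, every non-empty sample sums to the depth of the smallest one, so the result carries the same information as simple proportions.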

The thing is that we have a very low number of observations (around 30 per group), given the difficulty of obtaining these samples, so I'm a bit hesitant about DESeq2.

Any input regarding this question would be highly appreciated, thanks in advance!

OTU sequencing • 4.1k views

ADD COMMENT • link updated 5.7 years ago by erwan.scaon ▴ 950 • written 6.7 years ago by a1788 ▴ 10

score 1 · Answer 1 · 2018-03-05

Hi a1788,

I absolutely agree with your opinion on traditional ratio scaling (or rarefaction, your last suggestion). I personally use library size scaling to the maximum library size, followed by a Box-Cox transformation. Note that I don't consider this the best approach, but I suggest reading this paper by Paul McMurdie and Susan Holmes for a great overview. DESeq2 or TMM is certainly better than fraction or rarefaction scaling.
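A rough sketch of what scaling to the maximum library size followed by a Box-Cox transformation could look like. The answer does not say which lambda or zero handling is used, so the `lam` and `pseudocount` values below are assumptions chosen purely for illustration, as are the function names.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform: log(x) if lam == 0, else (x**lam - 1) / lam. Needs x > 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


def scale_to_largest(counts):
    """Scale every sample up to the depth of the largest library."""
    depths = {s: sum(c) for s, c in counts.items()}
    target = max(depths.values())
    return {
        s: [v * target / depths[s] for v in c] if depths[s] > 0 else list(c)
        for s, c in counts.items()
    }


def normalize(counts, lam=0.5, pseudocount=1.0):
    """Scale to the largest library, add a pseudocount, then apply Box-Cox.

    lam and pseudocount are assumed values, not taken from the answer above.
    """
    scaled = scale_to_largest(counts)
    return {s: [box_cox(v + pseudocount, lam) for v in c] for s, c in scaled.items()}


# Hypothetical toy data: sample A has 100 reads, sample B has 50.
counts = {"A": [10, 30, 60], "B": [5, 5, 40]}
scaled = scale_to_largest(counts)
transformed = normalize(counts)
```

The pseudocount is needed because Box-Cox is only defined for positive values and OTU tables contain many zeros.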

Cheers

score 1 · Answer 2 · 2019-03-21

5.7 years ago

erwan.scaon ▴ 950

Hi,

You should have a look at GMPR (geometric mean of pairwise ratios).
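As a rough illustration of the idea behind GMPR as I understand it: for each pair of samples, take the median of the count ratios over the OTUs that are non-zero in both, then take the geometric mean of those medians as the sample's size factor. This is a simplified sketch, not the published implementation; the function name, the data layout, and the `min_shared` parameter are assumptions for illustration.

```python
import math
from statistics import median


def gmpr_size_factors(counts, min_shared=1):
    """Sketch of GMPR-style size factors.

    counts: dict mapping sample id -> list of raw OTU counts (same OTU order).
    min_shared: minimum number of OTUs non-zero in both samples for a pair
                to be used (hypothetical parameter name and default).
    """
    samples = list(counts)
    factors = {}
    for i in samples:
        log_medians = []
        for j in samples:
            # Ratios over OTUs observed in both samples (the pair i == j gives 1).
            ratios = [a / b for a, b in zip(counts[i], counts[j]) if a > 0 and b > 0]
            if len(ratios) >= min_shared:
                log_medians.append(math.log(median(ratios)))
        if log_medians:
            # Geometric mean of the pairwise median ratios.
            factors[i] = math.exp(sum(log_medians) / len(log_medians))
        else:
            factors[i] = float("nan")
    return factors


# Hypothetical toy data: sample B is sequenced about twice as deeply as A.
counts = {"A": [10, 0, 20, 40], "B": [20, 6, 40, 80]}
factors = gmpr_size_factors(counts)
normalized = {s: [v / factors[s] for v in c] for s, c in counts.items()}
```

Dividing each sample by its size factor gives the normalized counts. Because only OTUs shared by a pair enter each ratio, the approach is meant to cope with zero-inflated tables.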

ADD COMMENT • link 5.7 years ago by erwan.scaon ▴ 950