I am using DESeq2 on metagenomic abundance data that I generated by mapping reads back to the assembly they produced and then computing, per contig:
abundance = 1000000 * trimmed mean depth / ((contig length - 75*2) * total sample depth)

where the trimmed mean is the mean depth computed after discarding the upper and lower 5% of values, to control for extremes (and 75*2 trims one read length, 75 bp, from each contig end).
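To make the formula concrete, here is a minimal Python sketch of that calculation as I have described it; the function names, the grouping of terms, and the 75 bp read length are my own reading of the formula, not anything DESeq2-specific:

```python
import numpy as np

def trimmed_mean(per_base_depth, trim=0.05):
    # Mean coverage after discarding the lowest and highest 5% of values.
    d = np.sort(np.asarray(per_base_depth, dtype=float))
    k = int(len(d) * trim)
    return d[k:len(d) - k].mean() if len(d) > 2 * k else d.mean()

def abundance(tm_depth, contig_length, total_sample_depth, read_len=75):
    # 1e6 * trimmed mean depth / ((contig length - 2 * read length) * total sample depth)
    return 1e6 * tm_depth / ((contig_length - 2 * read_len) * total_sample_depth)
```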
DESeq2 asks for un-normalized counts; as I understand it, it estimates per-sample size factors (median-of-ratios, rather than simple totals) to control for differing sequencing depth across samples. Is that right? If so, should I drop my own normalization by total sample depth? Or would I be better off passing the trimmed mean depth straight into DESeq2?
One more detail: because this is metagenomic data, the abundances end up very small, so I have scaled them all by 1000 before rounding to integers and passing them to DESeq2.
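In other words, the scale-and-round step I am describing is just the following (the abundance values here are made up for illustration):

```python
import numpy as np

# Hypothetical per-contig abundances (rows = contigs, columns = samples).
abund = np.array([[0.012, 0.001],
                  [0.004, 0.097]])

# Scale by 1000, then round to the integer counts DESeq2 expects.
counts = np.rint(abund * 1000).astype(int)
```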
Would it be safe to assume that if one pre-normalizes, DESeq2's own normalization should result in little to no change?
Given all that, would you therefore suggest just inputting the trimmed mean depth, rounded to an integer, as pseudo-count input data?