Question

How to interpret "pseudo count" in the context of gene expression data handling?

5

5.1 years ago

n,n ▴ 390

I am reading a paper that has a passage describing the pre-processing of gene expression data before conducting the experiments. The passage states "After conversion to a base-2 logarithm with a pseudo count of 0.125, batch normalization using ComBat was applied".

What exactly is a pseudo count? My initial understanding was that you add 0.125 to every value in the gene expression matrix and then take the logarithm, so that you never take the logarithm of 0 (which is undefined). This is only my intuition, though, so I would like to know whether it is correct and whether there are other reasons why pseudo counts are used.

RNA-Seq normalization • 7.3k views

updated 5.1 years ago by ATpoint 88k • written 5.1 years ago by n,n ▴ 390

score 6 · Accepted Answer · 2020-05-02

5.1 years ago

dsull ★ 7.6k

Your understanding is correct.

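I personally like log2(x+1): with a pseudocount of 1, a zero count maps to log2(1) = 0 and every non-negative value stays non-negative after the transform, so you don't have to deal with negative numbers. With a pseudocount of 0.125, zeros map to log2(0.125) = -3 instead.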
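
To make the difference concrete, here is a minimal sketch in Python with NumPy. The helper `log2_with_pseudocount` and the toy matrix `toy_expr` are made up for illustration and are not taken from the paper; the paper's actual pipeline (including the ComBat step) is not reproduced here.

```python
import numpy as np


def log2_with_pseudocount(expr, pseudocount):
    """Return log2(expr + pseudocount), computed elementwise.

    Hypothetical helper for illustration only.
    """
    if pseudocount <= 0:
        raise ValueError("pseudocount must be positive")
    expr = np.asarray(expr, dtype=float)
    return np.log2(expr + pseudocount)


# Toy expression matrix (assumed values): rows are genes, columns are samples.
toy_expr = np.array([[0.0, 1.0, 7.0],
                     [3.0, 0.0, 15.0]])

# Pseudocount of 0.125, as in the quoted passage: zeros become log2(0.125) = -3.
log_small = log2_with_pseudocount(toy_expr, 0.125)

# Pseudocount of 1: zeros become log2(1) = 0, so no value is negative.
log_one = log2_with_pseudocount(toy_expr, 1.0)

print(log_small)
print(log_one)
```

Either way the zeros no longer break the logarithm; the choice of pseudocount only changes where zero and low values land on the log scale.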