Question

treating out-of-range values

0

5.7 years ago

biobiu ▴ 150

Hey, I have some metabolite data. However, for several samples and metabolites the values are out of range. The nice thing is that I can distinguish between values below and above the range.

How would you treat these values? Some options I was thinking about:

1) Assign these values as NA. I don't like this option, because if one group of samples has all its values above the range and another group has all its values below the range, we can't claim that we have no information here.

2) Assign the highest in-range value to values above the range (>OOR) and the lowest in-range value to values below the range (<OOR). It is a kind of conservative solution, but the inaccuracy feels wrong.

3) Count the number of >OOR values in each group and perform a categorical test (such as Fisher's exact test or a chi-square test). The main drawbacks are that we lose the in-range data and, of course, that we turn a numerical problem into a categorical one.
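To make options 2 and 3 concrete, here is a minimal Python sketch. It assumes the out-of-range readings are exported as the string flags "<OOR" and ">OOR" (hypothetical names, adapt them to your format), and the readings themselves are made up for illustration. The helper functions are illustrative, not part of any library.

```python
from math import comb

LOW, HIGH = "<OOR", ">OOR"  # hypothetical flags for out-of-range readings


def clamp_oor(values):
    """Option 2: replace flags with the lowest/highest observed in-range value."""
    in_range = [v for v in values if v not in (LOW, HIGH)]
    lo, hi = min(in_range), max(in_range)
    return [lo if v == LOW else hi if v == HIGH else v for v in values]


def fisher_exact_two_sided(a, b, c, d):
    """Option 3: two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    n1, n2, k = a + b, c + d, a + c
    total = comb(n1 + n2, k)

    def prob(x):
        # Hypergeometric probability of x "above range" samples in group 1
        return comb(n1, x) * comb(n2, k - x) / total

    p_obs = prob(a)
    support = range(max(0, k - n2), min(k, n1) + 1)
    # Sum the probabilities of all tables at least as extreme as the observed one
    return sum(prob(x) for x in support if prob(x) <= p_obs * (1 + 1e-9))


# Made-up readings for one metabolite in two groups
group_a = [4.1, HIGH, HIGH, 5.0, HIGH]
group_b = [LOW, 3.2, 2.9, LOW, 3.8]

# Option 2: the in-range limits are taken across both groups together
clamped = clamp_oor(group_a + group_b)
clamped_a, clamped_b = clamped[:len(group_a)], clamped[len(group_a):]

# Option 3: count >OOR vs. everything else in each group
high_a, high_b = group_a.count(HIGH), group_b.count(HIGH)
p_value = fisher_exact_two_sided(
    high_a, len(group_a) - high_a, high_b, len(group_b) - high_b
)
```

If SciPy is available, `scipy.stats.fisher_exact` can replace the hand-written test. Note that this version of option 3 lumps the <OOR values together with the in-range ones, which is exactly the loss of information mentioned above.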

Please share with me your thoughts/ideas...

brainstorm • 939 views

1

I would go with option 2 as long as the underlying distribution of values is symmetric. For example, replacing a given fraction of a sample at the high and low ends with the most extreme remaining values is what is done when calculating the Winsorized mean.
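For illustration, a minimal sketch of a Winsorized mean in plain Python. The function name, the `fraction` parameter and the sample values are assumptions made for this example:

```python
def winsorized_mean(values, fraction=0.1):
    """Mean after replacing the lowest and highest `fraction` of the values
    with the nearest remaining value on each side."""
    if not 0 <= fraction < 0.5:
        raise ValueError("fraction must be in [0, 0.5)")
    data = sorted(values)
    n = len(data)
    k = int(n * fraction)  # number of values replaced at each end
    if k > 0:
        data[:k] = [data[k]] * k            # pull the low tail up
        data[-k:] = [data[n - k - 1]] * k   # pull the high tail down
    return sum(data) / n


# Made-up sample with one extreme value at the high end
sample = [1, 5, 7, 8, 9, 10, 10, 12, 13, 95]
plain = sum(sample) / len(sample)
result = winsorized_mean(sample, fraction=0.1)
```

Here one value is replaced at each end (1 becomes 5, 95 becomes 13), so the extreme reading no longer dominates the mean. If SciPy is available, `scipy.stats.mstats.winsorize` performs the replacement step.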

5.7 years ago by Andrzej Zielezinski 11k