Outlier detection with Cook's distance

8.4 years ago

ahnje770 ▴ 20

I have made a list of differentially expressed genes with DESeq2. However, when I look at the normalised counts (rlog-transformed) of the top genes, I find that they are often affected by outliers, or that there doesn't seem to be a consistent change in expression between the 2 groups. My design is 4 samples vs 4 samples.

For example, take the gene "KLHL9".

The normalised count values for the FIRST group are: 7.3, 10.0, 10.6, 10.7

The normalised count values for the SECOND group are: 10.0, 12.1, 12.2, 11.4

So the first sample (7.3) seems to be an outlier, yet this gene was not removed. In fact, that sample had a raw count of 0. The Cook's distance values for the same samples are (in the same order):

FIRST group: 4.926845e+08, 3.566783e+10, 2.959753e-04, 5.331939e-03

SECOND group: 2.691049e+10, 1.206045e-03, 4.211501e-04, 1.445035e+10

My understanding is that the higher the Cook's distance, the more likely the sample is an outlier. However, Cook's distance does not seem to be picking up the outlier here, and the Cook's distance values don't really correspond to the normalised count values.
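To make the comparison concrete, here is a minimal sketch of how the Cook's distances listed above could be checked against a cutoff. It assumes the cutoff described in the DESeq2 documentation as the default, the 0.99 quantile of an F(p, m - p) distribution, with p = 2 fitted coefficients (intercept plus group) and m = 8 samples; whether that matches the settings actually used in this analysis is an assumption. The sample labels and the helper function name are hypothetical, and the closed-form quantile only holds for a numerator df of 2.

```python
# Sketch only: compares the Cook's distances from the post against an
# assumed cutoff, the 0.99 quantile of F(p, m - p) with p = 2 and m = 8.

def f_quantile_df1_2(q, d2):
    """Quantile of an F(2, d2) distribution.

    For numerator df = 2 the CDF is 1 - (1 + 2x/d2)^(-d2/2), which can be
    inverted in closed form. Not valid for any other numerator df.
    """
    return (d2 / 2.0) * ((1.0 - q) ** (-2.0 / d2) - 1.0)

p = 2  # assumed number of fitted coefficients (intercept + group)
m = 8  # number of samples (4 vs 4)
cutoff = f_quantile_df1_2(0.99, m - p)

# Cook's distances for KLHL9 as listed in the post (labels are hypothetical).
cooks = {
    "first_1": 4.926845e+08,
    "first_2": 3.566783e+10,
    "first_3": 2.959753e-04,
    "first_4": 5.331939e-03,
    "second_1": 2.691049e+10,
    "second_2": 1.206045e-03,
    "second_3": 4.211501e-04,
    "second_4": 1.445035e+10,
}

# Samples whose Cook's distance exceeds the assumed cutoff.
flagged = [name for name, d in cooks.items() if d > cutoff]

print(f"cutoff = {cutoff:.2f}")
print("above cutoff:", flagged)
```

In R, the same cutoff would be `qf(0.99, p, m - p)`. Under these assumptions the sample with the 7.3 value is above the cutoff, but so are three samples whose normalised values look unremarkable, which is part of what confuses me.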

Can anyone help me understand this a bit better?

rna-seq DESeq2 • 2.5k views
