Outlier Detection with Cook's Distance
Posted 8.4 years ago by ahnje770

I have made a list of differentially expressed genes with DESeq2. However, when I look at the normalised counts (rlog-transformed) of the top genes, they are often affected by outliers, or there doesn't seem to be a consistent expression change between the two groups. My design is 4 samples vs 4 samples.

For example, for the gene KLHL9:

The normalised count values for the FIRST group are: 7.3, 10.0, 10.6, 10.7

The normalised count values for the SECOND group are: 10.0, 12.1, 12.2, 11.4

So the first sample (7.3) seems to be an outlier; in fact, its raw count was 0. However, this gene was not removed. The Cook's distance values for the same samples are (in the same order):

FIRST group: 4.926845e+08, 3.566783e+10, 2.959753e-04, 5.331939e-03

SECOND group: 2.691049e+10, 1.206045e-03, 4.211501e-04, 1.445035e+10
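To see where these distances sit, here is a sketch that compares them with the default cutoff the DESeq2 `results()` documentation gives for `cooksCutoff`: the 0.99 quantile of the F(p, m - p) distribution, where p is the number of fitted coefficients and m is the number of samples. It assumes p = 2 (intercept plus group) and m = 8, and it uses the closed-form quantile of F(2, df2), so the hypothetical helpers `f2_quantile` and `flag_outliers` are only valid for that two-coefficient case.

```python
# Sketch: compare Cook's distances with the default cutoff described in the
# DESeq2 results() documentation (0.99 quantile of F(p, m - p)).
# Assumption: p = 2 coefficients (intercept + group); the closed-form
# quantile below is only valid for an F distribution with d1 = 2.

def f2_quantile(level: float, df2: int) -> float:
    """Quantile of the F(2, df2) distribution.

    For d1 = 2 the survival function is P(X > x) = (1 + 2x/df2) ** (-df2/2),
    which can be inverted in closed form.
    """
    return df2 / 2 * ((1 - level) ** (-2 / df2) - 1)


def flag_outliers(cooks, n_samples, n_coefs=2, level=0.99):
    """Return (cutoff, flags); flags[i] is True if cooks[i] exceeds the cutoff."""
    if n_coefs != 2:
        raise ValueError("closed-form quantile only implemented for p = 2")
    cutoff = f2_quantile(level, n_samples - n_coefs)
    return cutoff, [d > cutoff for d in cooks]


# Cook's distances for KLHL9 as reported above (FIRST group, then SECOND group).
cooks_klhl9 = [4.926845e+08, 3.566783e+10, 2.959753e-04, 5.331939e-03,
               2.691049e+10, 1.206045e-03, 4.211501e-04, 1.445035e+10]

cutoff, flags = flag_outliers(cooks_klhl9, n_samples=8)
print(f"cutoff = {cutoff:.2f}")
print(flags)
```

With these inputs the cutoff comes out at about 10.92, so under this rule the four very large values (positions 1, 2, 5 and 8 in the order listed) exceed it and the other four do not.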

My understanding is that the higher the Cook's distance, the more likely the sample is an outlier. However, Cook's distance does not seem to pick up the outlier here, and the Cook's values don't correspond to the normalised count values.
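For intuition about what a "higher Cook's distance" measures, here is a minimal sketch of the textbook Cook's distance for an ordinary linear model y ~ group, applied to the rlog values above. This is a swapped-in simplification: DESeq2 computes its Cook's distances from its negative binomial GLM fit on the raw counts, so these numbers are not expected to match the ones it reports. The function name `cooks_distance_two_groups` is hypothetical.

```python
# Sketch: classic (ordinary least squares) Cook's distance for a two-group
# design, D_i = e_i^2 / (p * s^2) * h_ii / (1 - h_ii)^2.
# Assumption: a linear model y ~ group fitted to rlog values. This is NOT
# DESeq2's calculation, which uses its negative binomial GLM on raw counts.

def cooks_distance_two_groups(group_a, group_b):
    """Return Cook's distances for all samples (group_a first, then group_b)."""
    p = 2  # number of coefficients: intercept + group effect
    m = len(group_a) + len(group_b)  # number of samples

    residuals, leverages = [], []
    for group in (group_a, group_b):
        mean = sum(group) / len(group)  # fitted value is the group mean
        for y in group:
            residuals.append(y - mean)
            leverages.append(1 / len(group))  # hat value in a group-means model

    s2 = sum(e * e for e in residuals) / (m - p)  # residual variance estimate
    return [
        e * e / (p * s2) * h / (1 - h) ** 2
        for e, h in zip(residuals, leverages)
    ]


first = [7.3, 10.0, 10.6, 10.7]
second = [10.0, 12.1, 12.2, 11.4]
d = cooks_distance_two_groups(first, second)
for value, dist in zip(first + second, d):
    print(f"{value:5.1f}  D = {dist:.3f}")
```

In this simple model every sample in a group has the same leverage (1/4), so the distance is driven by the residual from the group mean; the 7.3 sample, 2.35 below its group mean, gets the largest value (about 0.69).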

Can anyone help me understand this a bit better?

Tags: rna-seq, DESeq2