Question

removing outliers from RNA-seq data

4

9.7 years ago

jfertaj ▴ 110

Hi all,

I have a data.frame from an RNA-seq experiment, and I would like to remove some outlier samples. The data set is large: 350 samples and 32291 genes. The values are log2 RPKM (I applied the log2 transformation because I am planning a WGCNA analysis and the authors recommend it).

I am using the PcaHubert function from the rrcov package to find outliers. Here is the code I am using:

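df <- read.table("/path/to/file/rpkm.txt")
dim(df)  # 32291   352
df <- df[, -c(1, 2)]  # the first 2 columns hold accessory data

library(rrcov)
# PcaHubert treats rows as observations, so transpose to samples x genes
pcaHub <- PcaHubert(t(df))
# flag is FALSE for observations marked as outliers; negate it rather than
# comparing against the string 'FALSE'
outliers <- which(!pcaHub@flag)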

The outliers would be the samples flagged FALSE after the robust PCA. Do you think it is appropriate to remove outliers with this method?

Any comments would be greatly appreciated

Thanks

outliers WGCNA RNA-Seq R • 9.2k views

ADD COMMENT • link updated 22 months ago by Ram 44k • written 9.7 years ago by jfertaj ▴ 110

score 3 · Answer 1 · 2015-06-26

3

9.5 years ago

Deepak Tanwar ★ 4.2k

If you are going to use the WGCNA package for network analysis, then you will have the option to remove outlier samples as part of that workflow. Follow the WGCNA tutorials.
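For reference, the sample-outlier step in the WGCNA tutorials boils down to average-linkage hierarchical clustering of the samples, followed by cutting the tree at a height chosen by eye. The sketch below mimics that idea with base R only, on simulated data: the tutorial itself uses WGCNA's `cutreeStatic`, for which `cutree` is a stand-in here, and the names `expr`, `dat_expr` and the cut height of 40 are assumptions made for this toy example.

```r
set.seed(1)
# Simulated stand-in for the real data: 200 genes x 20 samples (log2 scale)
expr <- matrix(rnorm(200 * 20), nrow = 200,
               dimnames = list(paste0("gene", 1:200), paste0("S", 1:20)))
expr[, "S20"] <- expr[, "S20"] + 5   # make one sample an obvious outlier

dat_expr <- t(expr)                  # samples in rows, genes in columns
sample_tree <- hclust(dist(dat_expr), method = "average")
# plot(sample_tree)                  # inspect the dendrogram to pick a cut height

cut_height <- 40                     # assumption: chosen by eye for this toy data
clust <- cutree(sample_tree, h = cut_height)

# Keep the largest cluster and treat everything else as outlier samples
main_cluster <- as.integer(names(which.max(table(clust))))
keep <- clust == main_cluster
dat_clean <- dat_expr[keep, , drop = FALSE]

rownames(dat_expr)[!keep]            # samples that would be removed
```

On real data the cut height has to be read off the dendrogram for that data set; there is no universal value.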

ADD COMMENT • link 9.5 years ago by Deepak Tanwar ★ 4.2k

score 2 · Answer 2 · 2015-06-26

2

9.5 years ago

Manvendra Singh ★ 2.2k

Yes, I think PCA is also a good choice to remove outliers.

You can also hierarchically cluster the samples using Spearman's correlation of gene expression. Outliers are then easy to spot in the dendrogram and remove.
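A minimal sketch of that clustering idea, using base R on simulated data. The matrix `expr`, the use of 1 minus the Spearman correlation as the distance, average linkage, and the cut height of 0.5 are all assumptions for illustration; on real data the threshold should come from looking at the dendrogram.

```r
set.seed(42)
# Simulated data: 200 genes x 20 samples sharing a common gene-level profile
gene_mean <- rnorm(200, mean = 5, sd = 2)
expr <- gene_mean + matrix(rnorm(200 * 20, sd = 0.5), nrow = 200)
dimnames(expr) <- list(paste0("gene", 1:200), paste0("S", 1:20))
# One sample with an unrelated expression profile acts as the outlier
expr[, "S20"] <- rnorm(200, mean = 5, sd = 2)

# Spearman correlation between samples (columns), turned into a distance
sample_cor <- cor(expr, method = "spearman")
sample_tree <- hclust(as.dist(1 - sample_cor), method = "average")
# plot(sample_tree)                  # outliers show up as long isolated branches

clust <- cutree(sample_tree, h = 0.5)  # assumption: threshold picked for this toy data
main_cluster <- as.integer(names(which.max(table(clust))))
outlier_samples <- names(clust)[clust != main_cluster]

expr_clean <- expr[, !(colnames(expr) %in% outlier_samples), drop = FALSE]
outlier_samples
```

Because Spearman's correlation works on ranks, the result is the same whether the input is RPKM or log2 RPKM.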

ADD COMMENT • link 9.5 years ago by Manvendra Singh ★ 2.2k

0

Hello. The function PcaHubert has a parameter "crit.pca.distances". What does this parameter do, and what value should it be given other than the default?

ADD REPLY • link 8.3 years ago by rajeshkumar_vinod ▴ 30