How to remove outliers using PCA in R?

5.3 years ago

zhaoliang0302 ▴ 50

Hi,

I detected several outliers among my samples on a PCA plot, but I don't know how to remove these samples. (PCA plot) The outlier samples are marked by the red circle.

Thanks

PCA R • 9.6k views

You should explain how you generated your PCA plot (from which type of data?). Please post your code and a minimal reproducible example.

ADD REPLY • link 5.3 years ago by Nicolas Rosewick 11k

The data is a data frame of RNA-seq FPKM expression values; rows correspond to genes and columns to samples.

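library("FactoMineR")
library("factoextra")
# Transpose so that samples are rows and genes are columns
pca_data <- as.data.frame(t(RNAseq_data))
pca_data$group <- c(rep('GBM', 100), rep('rGBM', 100))
# Run the PCA on the expression columns only (drop the trailing group column)
pca <- PCA(pca_data[, 1:(ncol(pca_data) - 1)], graph = FALSE)
# Plot the samples, coloured by group
fviz_pca_ind(pca,
             geom.ind = "point",
             col.ind = pca_data$group,
             palette = c("#00AFBB", "#E7B800"),
             addEllipses = TRUE,
             legend.title = "Groups"
)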

ADD REPLY • link 5.3 years ago by zhaoliang0302 ▴ 50

My first question with such a plot is, what are these outlier samples? Is there a biological or technical explanation for this?

ADD REPLY • link 5.3 years ago by Benn 8.3k

I downloaded this RNA-seq data and am just exploring it. Given the large number of samples, I think removing these 'outlier' samples is not a risk.

ADD REPLY • link 5.3 years ago by zhaoliang0302 ▴ 50

Are all samples from the same dataset? Do you have metadata on these samples (sequencing kit, sample type, cell type, sequencing platform, etc.)? IMO you see here a clear (non-biological) batch effect.

ADD REPLY • link 5.3 years ago by Nicolas Rosewick 11k

Yes, all tumor samples are from the same dataset. The clinical data doesn't contain batch information, so I want to remove these samples directly.

ADD REPLY • link 5.3 years ago by zhaoliang0302 ▴ 50

I guess the pca object should contain the PC1 and PC2 coordinates (the information used for the plot). Use these to filter out the samples, e.g. those with PC1 < -100.
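A minimal sketch of this kind of filtering, under stated assumptions: the toy matrix, the shifted "outlier" samples and the cutoff of 30 are all made up for illustration, and base R's prcomp() is swapped in for FactoMineR so the example runs without extra packages. With a FactoMineR PCA() result, the corresponding sample coordinates should be in pca$ind$coord[, "Dim.1"].

```r
# Toy stand-in for the poster's data: genes in rows, samples in columns.
set.seed(1)
n_genes <- 200
n_samples <- 20
RNAseq_data <- matrix(rnorm(n_genes * n_samples), nrow = n_genes,
                      dimnames = list(paste0("gene", 1:n_genes),
                                      paste0("sample", 1:n_samples)))
# Turn the last three samples into artificial outliers by shifting every gene.
RNAseq_data[, 18:20] <- RNAseq_data[, 18:20] + 5

# PCA on samples: samples must be rows, hence t().
pca <- prcomp(t(RNAseq_data), center = TRUE, scale. = FALSE)
pc1 <- pca$x[, "PC1"]  # named vector of PC1 scores, one per sample

# Hypothetical cutoff read off the toy data. The sign of a principal component
# is arbitrary, so the toy example tests |PC1|; on real data, use the side
# seen in your own plot (the thread uses PC1 < -100).
cutoff <- 30
outliers <- names(pc1)[abs(pc1) > cutoff]
print(outliers)

# Drop the flagged samples (columns) from the original expression matrix.
RNAseq_filtered <- RNAseq_data[, !(colnames(RNAseq_data) %in% outliers)]
dim(RNAseq_filtered)
```

Filtering by sample name rather than by position keeps the result correct even if the column order of the expression matrix differs from the row order of the PCA scores. It is usually worth re-running the PCA on the filtered matrix to check that the remaining samples now cluster as expected.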

ADD REPLY • link 5.3 years ago by Nicolas Rosewick 11k

Thanks. I saved the plot as a large PDF file and then zoomed in to identify the outlier samples. It sounds silly but it really works :-)

ADD REPLY • link 5.3 years ago by zhaoliang0302 ▴ 50

Hi, I am facing the same issue, and with your suggested method I find that the samples with PC1 < -100 are indeed the outliers. Could you please explain the basis for choosing the threshold of -100? It would be very helpful. Thank you.
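The thread does not state a basis for -100; it appears to have been read off the original poster's plot, so it would not carry over to another dataset. One less ad hoc option, sketched below, is to flag samples whose PC1 score lies more than k robust standard deviations from the median. This is an assumption on my part rather than something proposed in the thread: the PC1 scores are simulated, and k = 3 is a common convention, not a rule.

```r
set.seed(2)
# Hypothetical PC1 scores: 97 typical samples plus 3 far-off ones.
pc1 <- c(rnorm(97, mean = 0, sd = 10), -150, -140, -160)
names(pc1) <- paste0("sample", seq_along(pc1))

k <- 3                 # assumed multiplier; adjust to be stricter or looser
centre <- median(pc1)  # robust centre, barely affected by a few extreme samples
spread <- mad(pc1)     # median absolute deviation, scaled to match the SD for normal data

# Flag samples whose distance from the centre exceeds k robust SDs.
is_outlier <- abs(pc1 - centre) > k * spread
print(names(pc1)[is_outlier])
```

The median and MAD are used instead of the mean and standard deviation because the outliers themselves inflate the latter, which can hide the very samples you want to flag. Any such rule is still a judgment call, and flagged samples are worth checking against the available metadata before being dropped.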