How can I identify and remove outlier samples based on an MDS plot (limma package)?
3.8 years ago

Hi, I downloaded microRNA expression data from TCGA (Isoform Expression Quantification files) using the TCGAbiolinks package from Bioconductor. I then made an MDS plot with the limma function plotMDS, but some samples from the same group (normal tissue) lie unexpectedly far from the rest: the dots at the bottom of the plot. I am calling them outliers, although I am not sure they really are. In the plot, red dots represent tumor samples and green dots represent normal tissue samples.

  1. How can I find out the sample IDs of these outliers? Each dot corresponds to a sample in the DGEList object from which the MDS plot was made.
  2. Should I remove these samples from my analysis?
r edger limma rna-seq tcga • 3.1k views
3.8 years ago
seidel 11k

As to your first question - How can I find out the sample IDs of these outliers? R functions sometimes invisibly return objects that you can capture for information simply by assigning the function call to a variable. For instance, you might normally make a histogram as follows: hist(rnorm(100)), but if you assign the call to a variable, you'll get an object with lots of information:

> foo <- hist(rnorm(100))
> names(foo)
[1] "breaks"   "counts"   "density"  "mids"     "xname"    "equidist"

In this case, foo$counts holds the number of values in each bin of the histogram. By the same token, plotMDS() returns an object describing the plot: its x and y components hold the plotted coordinates of each sample, which you can use to identify your samples. If you assign the call to a variable, you can get the names of the samples at the bottom of the plot as follows:

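foo <- plotMDS(yourDGElist)
# get the names of those outliers; -1 is a y-axis cutoff read off the plot:
names(foo$y[foo$y < -1])
# if foo$y carries no names, index the sample names directly instead:
colnames(yourDGElist)[foo$y < -1]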
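To make the idea concrete, here is a minimal sketch on simulated data. It assumes only that limma is installed; the matrix `expr`, the sample names, and the artificially shifted `sample8` are all made up for illustration. A plain matrix is used to keep the example small, and the object returned for a DGEList should have the same `x` and `y` components. Passing `labels` draws the sample names on the plot, which is often the quickest way to see which dot is which.

```r
library(limma)

set.seed(1)
# Hypothetical log-expression matrix: 1000 features x 8 samples
expr <- matrix(rnorm(1000 * 8), nrow = 1000,
               dimnames = list(paste0("mir", 1:1000), paste0("sample", 1:8)))
# Add extra noise to one sample so that it separates from the others
expr[, "sample8"] <- expr[, "sample8"] + rnorm(1000, mean = 0, sd = 3)

# Draw sample names instead of dots and capture the returned object
mds <- plotMDS(expr, labels = colnames(expr))

# Coordinates come back in the same order as the columns of the input
coords <- data.frame(sample = colnames(expr), dim1 = mds$x, dim2 = mds$y)

# Sort by the second dimension to see which samples sit lowest on the plot
coords[order(coords$dim2), ]
```

From a table like `coords` you can pick a cutoff on either dimension by eye and pull out the matching sample names.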

As to whether or not you should exclude them from your analysis, without further inspection or knowing more about your experiment or the goals of your analysis, only you can answer this.
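If, after inspecting them, you do decide to drop some samples, the mechanics are simple. The following is only a sketch under assumptions: the counts are simulated, and `outliers` is a hypothetical vector standing in for the sample names you identified from the MDS coordinates. It assumes edgeR is installed.

```r
library(edgeR)

set.seed(1)
# Hypothetical count matrix: 200 features x 6 samples
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200,
                 dimnames = list(paste0("mir", 1:200), paste0("sample", 1:6)))
dge <- DGEList(counts = counts, group = rep(c("normal", "tumor"), each = 3))

# Hypothetical outlier list; in practice this comes from the MDS coordinates
outliers <- c("sample3")

# Keep every column whose name is not in the outlier list
keep <- !(colnames(dge) %in% outliers)
dge_filtered <- dge[, keep]

# Recompute normalization factors on the remaining samples
dge_filtered <- calcNormFactors(dge_filtered)
```

Re-running plotMDS on the filtered object is a reasonable check that the remaining samples group as expected.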
