Hi all,
I am having some issues with some RNA-seq data and a potential outlier. Below you will find the MDS plot (plotMDS function in R, edgeR). Sample X9026 has been eliminated based on sequencing quality, but X9001 looks fine regarding sequencing QC.
To me, just by looking at the MDS plot, 9001 is an outlier. I ran the DEG analysis (edgeR), and when I kept 9001 in, there were 100+ DEGs, but when I took the sample out, there were 0, indicating to me that the single sample was driving those differences.
What do y'all think about sample 9001? Does anyone know of some methods that identify outlier samples other than looking at the MDS plot?
Thanks in advance!
What was different about that sample (and X9001 as well)? Were they processed and sequenced along with the rest?
All samples were processed at the same time!
X9026 produced far fewer reads than the other samples, and when the lab redid the library prep they did not see any meaningful improvement in quality, so we decided not to resequence it.
Sequencing-wise and QC-wise, X9001 is on par with the rest!
Can you show the edgeR code? A single sample driving the results sounds, to me, like no prefiltering was done on genes that have essentially no counts apart from a few high outliers.
Sure! Here is the beginning part of my code. I did include pre-filtering, but sample 9001 still is an "issue".
I'd use filterByExpr, as in the user guide.
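For reference, a minimal sketch of what that filtering step could look like. The counts here are simulated purely for illustration (the object names, group labels, and the 50 spiked-in "one high outlier" genes are all made up, not taken from the original poster's data), so swap in your own count matrix and design.

```r
library(edgeR)

# Simulated counts for illustration only: 1000 genes x 6 samples, two groups.
set.seed(1)
counts <- matrix(rnbinom(6000, mu = 20, size = 2), nrow = 1000,
                 dimnames = list(paste0("gene", 1:1000), paste0("S", 1:6)))

# Hypothetical problem genes: zero everywhere except one large count in S6.
counts[1:50, ] <- 0
counts[1:50, 6] <- 500

group <- factor(rep(c("A", "B"), each = 3))
y <- DGEList(counts = counts, group = group)

# Keep genes with a worthwhile count in at least as many samples
# as the smallest group; this returns one TRUE/FALSE per gene.
keep <- filterByExpr(y, group = group)
y <- y[keep, , keep.lib.sizes = FALSE]

# Recompute normalization factors on the filtered data.
y <- calcNormFactors(y)

# Re-check the MDS plot after filtering.
plotMDS(y)
```

Genes that are expressed in only a single sample should be dropped by this step, because they do not reach the count threshold in enough libraries. If a sample still separates from the rest on the MDS plot afterwards, filtering alone probably does not explain it.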