RNA-seq sample outliers on heatmap hierarchical clustering are low concentration samples: exclude outliers or all low concentration samples?
3.1 years ago
9906201a • 0

n=350 RNA-seq samples from a clinical cohort (RNA from whole blood, Illumina HiSeq) have been processed with Salmon, tximeta, and DESeq2. The variance-stabilising-transformed data were assessed by PCA and by heatmaps with hierarchical clustering on sample-to-sample distance and/or correlation, using the 350 x 30,500 matrix (genes with near-zero counts across all samples were removed).

The PCA (using DESeq2 plotPCA, so the top 500 most variable genes) captures 23% and 13% of the variance on PC1 and PC2, and the main variable of interest (mortality outcome) shows some separation on this QC PCA.

But hierarchical clustering (HC) on the heatmap shows 7-8 clear outliers (e.g. pairwise correlation coefficients are > 0.9 for most samples, but for 8 samples they are in the 0.8-0.9 range).
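One way to make the "pairwise correlation below 0.9" criterion explicit is to summarise each sample by its median correlation with all other samples and flag those under the cutoff. The following is a minimal sketch on simulated data, not the actual cohort: the sample names, the noise levels, and the helper functions `pearson` and `median_pairwise_correlation` are all hypothetical, and the 0.9 cutoff is simply the value quoted above.

```python
import random
from statistics import median

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def median_pairwise_correlation(samples):
    """Map each sample ID to its median correlation with every other sample."""
    ids = list(samples)
    return {
        i: median(pearson(samples[i], samples[j]) for j in ids if j != i)
        for i in ids
    }

# Simulated stand-in for VST expression values (hypothetical, 200 genes).
rng = random.Random(42)
profile = [rng.gauss(8, 2) for _ in range(200)]
samples = {f"ok{i}": [v + rng.gauss(0, 0.3) for v in profile] for i in range(8)}
# Two samples with much noisier measurements, mimicking poor libraries.
samples.update(
    {f"noisy{i}": [v + rng.gauss(0, 1.5) for v in profile] for i in range(2)}
)

med_cor = median_pairwise_correlation(samples)
flagged = sorted(s for s, r in med_cor.items() if r < 0.9)  # cutoff from the post
print(flagged)
```

Using the median rather than the minimum keeps one bad partner sample from dragging an otherwise good sample under the cutoff.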

Investigating these outliers shows that they had lower library concentrations on average (figure: library concentration in outliers).

My question is: would you advise (a) excluding only the 8 outliers identified by hierarchical clustering, or (b) excluding all samples with library concentrations below an arbitrary cutoff, including those that do not appear to be outliers?

From reviewing papers, I think it is common practice to exclude samples because they are outliers on HC or PCA. But might excluding all lower library concentration samples be a more 'honest' approach? Or does it not make sense to exclude lower concentration samples that didn't "fail", as judged by their not being outliers? I think the lower concentration samples probably occur completely at random (so possibly MCAR, missing completely at random, if excluded), although they do have slightly higher PC1 scores than average, so I think excluding them carries some risk of bias.
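Whether the low concentration samples really differ on PC1 can be checked rather than eyeballed, for instance with a permutation test on the difference in mean PC1 score. The sketch below uses made-up PC1 scores and a hypothetical `permutation_test` helper; the group sizes and effect size are assumptions for illustration only. A small p-value would argue against the samples being unrelated to the main axis of variation, but a large one would not prove MCAR.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(scores, is_low, n_perm=5000, seed=0):
    """Two-sided permutation test for a difference in mean score
    between low concentration samples and all other samples."""
    rng = random.Random(seed)
    low = [s for s, flag in zip(scores, is_low) if flag]
    rest = [s for s, flag in zip(scores, is_low) if not flag]
    observed = mean(low) - mean(rest)
    pooled = list(scores)
    hits = 0
    for _ in range(n_perm):
        # Reassign the "low concentration" label at random.
        rng.shuffle(pooled)
        diff = mean(pooled[:len(low)]) - mean(pooled[len(low):])
        if abs(diff) >= abs(observed):
            hits += 1
    # Add-one correction so the p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)

# Hypothetical PC1 scores: 40 ordinary samples, 10 low concentration samples.
rng = random.Random(7)
pc1 = [rng.gauss(0, 1) for _ in range(40)] + [rng.gauss(2, 1) for _ in range(10)]
is_low = [False] * 40 + [True] * 10

observed, p_value = permutation_test(pc1, is_low)
print(f"mean difference = {observed:.2f}, p = {p_value:.4f}")
```

The same test could be run against other recorded variables to see whether low concentration is associated with anything that matters for the analysis.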

Is there a data-driven way to decide which approach to take, e.g. looking at PCA scree plots with different sets of samples excluded? I haven't been able to find a discussion of this.
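The scree plot comparison could be set up as a small sensitivity analysis: recompute the share of variance on PC1 under each exclusion scenario and see how much it moves. The sketch below is only an illustration on simulated data. The sample names, group effect, and noise levels are assumptions, and `pc1_variance_share` is a hypothetical helper that estimates the top eigenvalue by power iteration rather than running a full PCA.

```python
import random

def pc1_variance_share(matrix, n_iter=500):
    """Approximate share of total variance on PC1 for a samples x genes matrix."""
    n, p = len(matrix), len(matrix[0])
    # Centre each gene across the samples that were kept.
    col_means = [sum(row[g] for row in matrix) / n for g in range(p)]
    centered = [[row[g] - col_means[g] for g in range(p)] for row in matrix]
    # Sample-by-sample Gram matrix; its trace is the total sum of squares.
    gram = [[sum(a * b for a, b in zip(ri, rj)) for rj in centered] for ri in centered]
    total = sum(gram[i][i] for i in range(n))
    # Power iteration from a random start to approximate the top eigenvector.
    rng = random.Random(0)
    vec = [rng.gauss(0, 1) for _ in range(n)]
    for _ in range(n_iter):
        new = [sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in new) ** 0.5
        vec = [v / norm for v in new]
    # Rayleigh quotient of the unit vector estimates the top eigenvalue.
    top = sum(vec[i] * sum(gram[i][j] * vec[j] for j in range(n)) for i in range(n))
    return top / total

# Simulated data: a two-group signal in the first 20 of 60 genes (hypothetical).
rng = random.Random(1)
n_genes = 60

def make_sample(in_group, noise_sd):
    return [(2.0 if in_group and g < 20 else 0.0) + rng.gauss(0, noise_sd)
            for g in range(n_genes)]

samples = {f"s{i:02d}": make_sample(i % 2 == 1, 0.5) for i in range(17)}
samples.update({f"out{i}": make_sample(i % 2 == 1, 3.0) for i in range(3)})

outliers = {"out0", "out1", "out2"}
# Assumed low concentration set: the outliers plus three non-outlier samples.
low_conc = outliers | {"s00", "s01", "s02"}

scenarios = {
    "all samples": set(),
    "exclude clustering outliers": outliers,
    "exclude all low concentration": low_conc,
}
shares = {
    name: pc1_variance_share([v for k, v in samples.items() if k not in dropped])
    for name, dropped in scenarios.items()
}
for name, share in shares.items():
    print(f"{name}: PC1 share = {share:.2f}")
```

If options (a) and (b) give similar variance shares and similar downstream results, the choice between them matters less; if they diverge, that difference is itself worth reporting.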

RNA-seq outliers hierarchical clustering Quality DESeq2 Control • 783 views