Question

Removal of samples with weird log-CPM distribution or voomWithQualityWeights?


5 months ago

t.fortunato.asquini ▴ 10

Hello everyone,

I am doing a bulk RNA-sequencing analysis on human brain samples from ~160 donors. I am mostly following the workflow described here with edgeR/limma. One of the first steps is removing lowly expressed genes, applying TMM normalisation, and plotting the log-CPM distribution. As you can see, in some samples many of the retained genes have zero or near-zero expression:

log-CPM distribution after removal of lowly expressed genes and TMM normalization
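
To make the "many retained genes at zero or near-zero expression" observation concrete, here is a minimal Python sketch of the arithmetic. It is an illustration only, not the edgeR code used for the plot: it uses raw library sizes rather than TMM effective library sizes, it uses the offsets voom applies (0.5 added to the count, 1 added to the library size) rather than edgeR's `cpm(log = TRUE)` prior count, and the sample names, counts, and the log-CPM threshold of 0 are all made-up assumptions.

```python
import math


def log_cpm(counts, prior_count=0.5):
    """Return log2 counts-per-million for each sample.

    counts: dict mapping sample name -> list of raw counts (same gene order).
    Uses log2((count + 0.5) / (library size + 1) * 1e6), with the raw library
    size (no TMM scaling), so this is only an approximation of the real step.
    """
    out = {}
    for sample, values in counts.items():
        lib_size = sum(values)
        out[sample] = [
            math.log2((c + prior_count) / (lib_size + 1.0) * 1e6) for c in values
        ]
    return out


def fraction_low(values, threshold=0.0):
    """Fraction of genes whose log-CPM falls below `threshold` (assumed cutoff)."""
    return sum(v < threshold for v in values) / len(values)


# Hypothetical toy data: four genes, two samples with equal library size.
toy_counts = {
    "sample_good": [250_000, 250_000, 250_000, 250_000],
    "sample_poor": [0, 0, 1, 999_999],
}
toy_logcpm = log_cpm(toy_counts)
low_fraction = {s: fraction_low(v) for s, v in toy_logcpm.items()}
print(low_fraction)
```

A sample whose fraction is much higher than the rest of the cohort corresponds to the extra peak at the low end of the density plot.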

The next step was outlier detection, for which I inspected the dataset visually with both hierarchical clustering and PCA. About 10 samples cluster away from the rest of the dataset in both the hierarchical clustering and the PCA (where the first PC is mostly driven by RIN). The samples labelled in the PCA plot are those that form their own cluster in the correlation heatmap.

Hierarchical clustering of samples based on log-CPM values

PCA of samples based on log-CPM values
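
As a crude numeric companion to the correlation heatmap, one can compute each sample's mean Pearson correlation with all other samples and look at the lowest values. The Python sketch below is not hierarchical clustering or PCA itself, only a simpler stand-in for the same idea; the sample names and log-CPM values are hypothetical, and it assumes no sample has constant expression across genes.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-zero variance)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def mean_correlation(logcpm):
    """Mean correlation of each sample with every other sample."""
    result = {}
    for name, values in logcpm.items():
        others = [pearson(values, v) for other, v in logcpm.items() if other != name]
        result[name] = sum(others) / len(others)
    return result


# Hypothetical log-CPM values for four genes in three samples.
toy_logcpm_matrix = {
    "sample_a": [1.0, 2.0, 3.0, 4.0],
    "sample_b": [1.1, 2.1, 2.9, 4.2],
    "sample_c": [4.0, 1.0, 3.0, 2.0],
}
mean_corr = mean_correlation(toy_logcpm_matrix)
least_similar = min(mean_corr, key=mean_corr.get)
print(least_similar, mean_corr)
```

Samples with a clearly lower mean correlation than the rest are the ones that would be expected to split off in the heatmap.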

Most of the 'outlier' samples (8 out of 11), despite having low RIN and clustering separately from the bulk of the dataset, have smooth log-CPM distributions. Instead of removing these samples altogether, I would rather use voomWithQualityWeights to account for the low RNA quality. The remaining outliers (3 out of 11) are those with really different log-CPM distributions (NND_91-IHK, NND_53-YFQ, NND_28-XPU). They still cluster away from most samples in the heatmap/PCA, but this is driven by poor sequencing quality rather than RIN. Indeed, their sequencing yield (in Mb) is very low compared to the rest of the dataset.

log-CPM distribution of subset of outlier samples
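
The low sequencing yield of these samples can also be checked numerically rather than by eye. The Python sketch below flags samples whose log10 library size lies more than a chosen number of scaled median absolute deviations below the cohort median. The cutoff of 3 MADs, the sample names, and the library sizes are assumptions for illustration, not values from this dataset.

```python
import math
from statistics import median


def flag_low_yield(lib_sizes, n_mads=3.0):
    """Return names of samples with unusually small library sizes.

    lib_sizes: dict mapping sample name -> total read count (or yield).
    A sample is flagged when its log10 size is below
    median - n_mads * 1.4826 * MAD, computed on the log10 scale.
    """
    logs = {name: math.log10(size) for name, size in lib_sizes.items()}
    center = median(logs.values())
    mad = median(abs(v - center) for v in logs.values())
    cutoff = center - n_mads * 1.4826 * mad
    return sorted(name for name, v in logs.items() if v < cutoff)


# Hypothetical library sizes (total reads) for five samples.
toy_lib_sizes = {
    "s1": 30_000_000,
    "s2": 32_000_000,
    "s3": 28_000_000,
    "s4": 31_000_000,
    "s5": 2_000_000,
}
flagged = flag_low_yield(toy_lib_sizes)
print(flagged)
```

A check like this only identifies low-yield samples; whether to drop them or down-weight them is still the question asked below.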

Very long premise for a very short question: should I still include these samples in the analysis with voomWithQualityWeights or should I remove them because the composition of their transcriptome is too different due to technical reasons?

Thanks in advance for your help!

voomWithQualityWeights limma PCA outliers
