Question

Using bulk RNA-seq DE results to perform PCA in single cell RNA-seq

4 months ago

Oli • 0

Hi, I'm currently analyzing bulk and single cell RNA-seq data from mouse brains.

On the bulk side I have control vs treated samples, whereas the single cell RNA-seq data is from control mice only.

I have performed differential expression analysis between the conditions in the bulk data and, following this protocol, I performed PCA on the single cell data using only the subset of genes that are significantly differentially expressed between treated and control in the bulk data.

I was not looking for much here, only to try and see some structure in the resulting dimplot.

[Image: DimPlot of the PCA on the single cell data]

In the plot above we see some separation in a subset of the Astrocytes in the single cell data.

My first question is: is it valid to run PCA analysis on such a small subset (~15) of genes on the single cell data? I think of this similar to running PCA on cell cycle genes to find out if there is a strong cell cycle effect, and would assume that this is fine, as long as I don't extract any information beyond visualizing that there is some aggregation of astrocytes related to PC2 in this plot.

I continued the analysis by calculating module scores on the single cell data, using the gene set corresponding to PC2.

[Image: module scores per cell type]

The module scores in astrocytes are significantly higher (paired Wilcoxon test) than in other cell types when computing them with the genes corresponding to PC2. This is also true for the full list of genes, although the effect is less clear.

My second question is: is it correct to interpret from this result that the gene set resulting from the differential expression analysis in the bulk data is significantly enriched in astrocytes vs other cell types? Is there anything I might be missing? Am I stretching the interpretation?

Based on these results, we have conducted downstream experiments that verify the impact of the condition on astrocytes, but I want to make sure that the path that led us to astrocytes is sufficiently justified (there is some additional evidence, not relevant to these questions).

Single-cell scRNAseq • 890 views

ADD COMMENT • link updated 4 months ago by jared.andrews07 ★ 19k • written 4 months ago by Oli • 0

score 2 · Accepted Answer · 2025-03-21

My first question is: is it valid to run PCA analysis on such a small subset (~15) of genes on the single cell data?

I mean, sure, it's valid. Whether the results are meaningful is a different question.

is it correct to interpret from this result that the gene set resulting of the differential expression analysis in the bulk data is significantly enriched in astrocytes vs other cell types? Is there anything I might be missing? Am I stretching the interpretation?

I think you're double dipping, cherry picking, or data snooping by doing it this way.

Why not just calculate the module scores directly with your 15 genes for each cell type and run an ANOVA on the results or whatnot? The additional subsetting of those 15 genes (presumably by just grabbing the loadings for PC2?) is just cherrypicking for stuff you know is going to be "enriched" based on your PCA.
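A minimal sketch of that suggestion, in plain Python on made-up toy data. The gene names, cell-type labels, and both helper functions are hypothetical, and the score here is just the mean z-scored expression over the signature genes, a simplified stand-in rather than Seurat's module scoring. The point is only that every gene from the bulk signature goes into the score, with no selection step in between.

```python
from statistics import mean, pstdev

def module_scores(expr, signature):
    """Mean z-scored expression over the signature genes, one score per cell.

    expr maps gene -> list of expression values (one per cell).
    Simplified stand-in for a module score: no control gene sets are used.
    """
    n_cells = len(next(iter(expr.values())))
    z = []
    for gene in signature:
        values = expr[gene]
        mu, sd = mean(values), pstdev(values)
        # A gene with zero variance carries no information; score it as 0.
        z.append([(v - mu) / sd if sd > 0 else 0.0 for v in values])
    return [mean(col[i] for col in z) for i in range(n_cells)]

def one_way_anova_f(scores, labels):
    """F statistic of a one-way ANOVA of scores grouped by cell-type label."""
    groups = {}
    for score, label in zip(scores, labels):
        groups.setdefault(label, []).append(score)
    grand = mean(scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    ss_within = sum((s - mean(g)) ** 2 for g in groups.values() for s in g)
    df_between = len(groups) - 1
    df_within = len(scores) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)

# Toy data (hypothetical): 9 cells, 3 per cell type, 2 signature genes.
labels = ["Astro"] * 3 + ["Neuron"] * 3 + ["Oligo"] * 3
expr = {
    "GeneA": [5, 6, 5, 1, 2, 1, 1, 1, 2],
    "GeneB": [4, 5, 4, 1, 1, 2, 2, 1, 1],
}
signature = list(expr)  # use the FULL bulk signature, no PC-based subsetting

scores = module_scores(expr, signature)
f_stat = one_way_anova_f(scores, labels)
print(f"F = {f_stat:.1f}")
```

On real data a p-value would come from the F distribution with the matching degrees of freedom, and cells from the same animal are not independent replicates, so a per-sample summary may be the safer unit for the test.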

So no, I don't think you can conclude that the gene set identified from your bulk analysis is significantly enriched in astrocytes versus other cell types based on what you've shown...as you're picking the genes used for the module scores based on that very fact.
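To make the circularity concrete, here is a toy simulation (not the poster's data, and all names are hypothetical). Two groups of cells are drawn from the same noise distribution, so no gene is truly enriched in either group. Picking the genes that already look highest in group A stands in for picking genes by their PC2 loadings; a score built from the picked genes then shows a group gap that is at least as large as the gap from the full gene set, purely as a result of the selection.

```python
import random
from statistics import mean

random.seed(0)
n_genes, n_per_group = 15, 200

# Pure noise: group A ("astrocytes") and group B ("other") share one
# distribution, so there is no true enrichment of any gene in either group.
group_a = [[random.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_per_group)]
group_b = [[random.gauss(0, 1) for _ in range(n_genes)] for _ in range(n_per_group)]

def gene_mean(cells, g):
    """Mean expression of gene index g across a list of cells."""
    return mean(cell[g] for cell in cells)

# Per-gene difference in mean expression, A minus B.
diffs = [gene_mean(group_a, g) - gene_mean(group_b, g) for g in range(n_genes)]

def score_gap(genes):
    """Gap in mean module score between A and B, where a cell's score is the
    plain mean over the given genes (so the gap is the mean of per-gene diffs)."""
    return mean(diffs[g] for g in genes)

all_genes = list(range(n_genes))
# Circular step: keep the 5 genes that already look highest in group A.
picked = sorted(all_genes, key=lambda g: diffs[g], reverse=True)[:5]

gap_all = score_gap(all_genes)
gap_picked = score_gap(picked)
print(f"gap with all genes:    {gap_all:+.3f}")
print(f"gap with picked genes: {gap_picked:+.3f}")
```

Any test run on the picked-gene score against the same group labels inherits this bias, which is why the selection and the test need to use independent information.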

This is such a small geneset that it'd be worth making plots for each gene and also slapping together a heatmap or dotplot to show collective enrichment of the signature (or not).
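The numbers behind a dotplot are simple to compute by hand: for each gene and cell type, the mean expression and the fraction of cells with nonzero expression. A rough sketch on toy data follows; the gene names, labels, and `dotplot_table` helper are hypothetical, and "expressing" is assumed to mean a value above zero.

```python
from collections import defaultdict
from statistics import mean

def dotplot_table(expr, labels):
    """Return {(gene, cell_type): (mean_expression, fraction_expressing)}.

    expr maps gene -> list of expression values (one per cell);
    labels gives the cell type of each cell, in the same order.
    """
    table = {}
    for gene, values in expr.items():
        by_type = defaultdict(list)
        for value, label in zip(values, labels):
            by_type[label].append(value)
        for cell_type, vals in by_type.items():
            # Assumption: a cell "expresses" the gene if its value is > 0.
            frac = sum(v > 0 for v in vals) / len(vals)
            table[(gene, cell_type)] = (mean(vals), frac)
    return table

# Toy data (hypothetical): 4 cells, 2 genes.
labels = ["Astro", "Astro", "Neuron", "Neuron"]
expr = {"GeneA": [3.0, 1.0, 0.0, 0.0], "GeneB": [0.0, 2.0, 2.0, 0.0]}

table = dotplot_table(expr, labels)
for (gene, cell_type), (avg, frac) in sorted(table.items()):
    print(f"{gene:6s} {cell_type:7s} mean={avg:.2f} frac={frac:.2f}")
```

With only about 15 genes, a table like this can be read gene by gene, which shows whether the signature as a whole tracks astrocytes or whether one or two genes drive the score.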

score 2 · Accepted Answer · 2025-03-22

4 months ago

swbarnes2 15k

That PCA plot looks so wonky, I would not say it supports anything. People typically use t-SNE or UMAP to make 2D visualizations of cell clusters, not just PCA.
