Using bulk RNA-seq DE results to perform PCA in single cell RNA-seq
2 days ago
Oli • 0

Hi, I'm currently analyzing Bulk and Single cell RNA seq data from mouse brains.

On the bulk side I have control vs. treated samples, whereas the single-cell RNA-seq data is from control mice.

I have performed differential expression analysis between the conditions in the bulk data. Following this protocol, I then performed PCA on the single-cell data, using only the subset of genes that are significantly differentially expressed in the treated bulk samples.

I was not looking for much here, only trying to see some structure in the resulting DimPlot.

[Image: PCA plot of the single-cell data]

In the plot above we see some separation in a subset of the Astrocytes in the single cell data.

My first question is: is it valid to run PCA on such a small subset (~15) of genes in the single-cell data? I think of this as similar to running PCA on cell cycle genes to find out whether there is a strong cell cycle effect, and I would assume it is fine as long as I don't extract any information beyond showing that there is some aggregation of astrocytes along PC2 in this plot.
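To make the setup concrete, here is a minimal sketch of PCA restricted to a small gene subset. It is not the code I actually ran: the function name `pca_on_gene_subset`, the random toy matrix, and the gene names are all made up for illustration, and it uses plain NumPy rather than any single-cell toolkit.

```python
import numpy as np


def pca_on_gene_subset(expr, gene_names, subset, n_components=2):
    """PCA on a cells x genes matrix restricted to the genes in `subset`.

    Returns (scores, loadings, explained_variance_ratio, kept_genes).
    """
    # Keep only the subset genes that are actually present in the data.
    wanted = set(subset)
    idx = [i for i, g in enumerate(gene_names) if g in wanted]
    kept = [gene_names[i] for i in idx]
    x = np.asarray(expr, dtype=float)[:, idx]

    # Center and scale each gene; zero-variance genes are left at 0.
    x = x - x.mean(axis=0)
    sd = x.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0
    x = x / sd

    # SVD of the scaled matrix: rows of vt are the gene loadings per PC.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    var_ratio = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, var_ratio[:n_components], kept


# Toy data (hypothetical): 300 cells x 50 genes of random noise.
rng = np.random.default_rng(0)
genes = [f"gene{i}" for i in range(50)]
expr = rng.normal(size=(300, 50))
de_genes = genes[:15]  # stand-in for the ~15 bulk DE genes

scores, loadings, var_ratio, kept = pca_on_gene_subset(expr, genes, de_genes)
print(scores.shape, loadings.shape, var_ratio)
```

With only ~15 genes there are at most 15 components, and each one can be dominated by a handful of genes, so the loadings and the variance explained are worth checking before reading anything into PC2.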

I continued the analysis by calculating module scores on the single cell data, using the gene set corresponding to PC2.

[Image: module scores by cell type]

The module scores in astrocytes are significantly higher (paired Wilcoxon test) than in other cell types when computed with the genes corresponding to PC2. This is also true for the full list of genes, although the effect is less clear.

My second question is: is it correct to interpret from this result that the gene set resulting from the differential expression analysis in the bulk data is significantly enriched in astrocytes vs. other cell types? Is there anything I might be missing? Am I stretching the interpretation?

Based on these results, we have conducted downstream experiments that verify the impact of the condition on astrocytes, but I want to make sure that the path towards astrocytes is sufficiently justified (there is some additional evidence, not relevant to these questions).

Single-cell scRNAseq • 252 views
1 day ago

My first question is: is it valid to run PCA analysis on such a small subset (~15) of genes on the single cell data?

I mean, sure, it's valid. Whether the results are meaningful is a different question.

is it correct to interpret from this result that the gene set resulting from the differential expression analysis in the bulk data is significantly enriched in astrocytes vs. other cell types? Is there anything I might be missing? Am I stretching the interpretation?

I think you're double dipping, cherry picking, or data snooping by doing it this way.

Why not just calculate the module scores directly with your 15 genes for each cell type and run an ANOVA on the results or whatnot? The additional subsetting of those 15 genes (presumably by just grabbing the loadings for PC2?) is just cherry-picking stuff you already know is going to be "enriched" based on your PCA.
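A rough sketch of what that could look like, assuming a simplified module score (the mean per-gene z-score over the gene set) and a hand-rolled one-way ANOVA F statistic. This is not Seurat's AddModuleScore, which additionally subtracts the expression of control genes. The function names, gene names, and toy values below are hypothetical.

```python
from statistics import mean, pstdev


def module_scores(expr, gene_set):
    """Per-cell mean z-score over `gene_set`.

    expr maps gene name -> list of per-cell expression values.
    """
    n_cells = len(next(iter(expr.values())))
    totals = [0.0] * n_cells
    used = 0
    for gene in gene_set:
        values = expr.get(gene)
        if values is None:
            continue  # gene not measured in the single-cell data
        mu, sd = mean(values), pstdev(values)
        if sd == 0:
            continue  # constant gene carries no information
        for i, v in enumerate(values):
            totals[i] += (v - mu) / sd
        used += 1
    if used == 0:
        raise ValueError("no usable genes in gene_set")
    return [t / used for t in totals]


def one_way_anova_f(groups):
    """groups maps label -> list of values. Returns (F, df_between, df_within)."""
    all_values = [v for vals in groups.values() for v in vals]
    grand = mean(all_values)
    ss_between = 0.0
    ss_within = 0.0
    for vals in groups.values():
        m = mean(vals)
        ss_between += len(vals) * (m - grand) ** 2
        ss_within += sum((x - m) ** 2 for x in vals)
    df_b = len(groups) - 1
    df_w = len(all_values) - len(groups)
    return (ss_between / df_b) / (ss_within / df_w), df_b, df_w


# Toy data (hypothetical values): 2 genes, 6 cells, 2 cell types.
expr = {
    "geneA": [5.0, 6.0, 5.5, 1.0, 0.5, 1.5],
    "geneB": [4.0, 4.5, 5.0, 0.5, 1.0, 0.0],
}
cell_types = ["Astro", "Astro", "Astro", "Neuron", "Neuron", "Neuron"]

scores = module_scores(expr, ["geneA", "geneB"])
by_type = {}
for score, ctype in zip(scores, cell_types):
    by_type.setdefault(ctype, []).append(score)

f_stat, df_b, df_w = one_way_anova_f(by_type)
print(f_stat, df_b, df_w)
```

The key point is that the gene set goes in unfiltered: all bulk DE genes, no selection based on anything seen in the single-cell data. For a p-value you'd compare F against the F distribution with (df_b, df_w) degrees of freedom, for example via `scipy.stats.f_oneway`.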

So no, I don't think you can conclude that the gene set identified from your bulk analysis is significantly enriched in astrocytes versus other cell types based on what you've shown... as you've picked the genes used for the module scores based on that very fact.
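To illustrate why that circularity matters, here is a small null simulation. It is a simplified analog, not your PCA workflow: genes are picked by their observed group difference rather than by PC2 loadings, and all names and parameters are made up. No gene truly differs between the two groups, yet the genes selected for looking "enriched" still show a difference when scored on the same cells, and none on independent cells.

```python
import random
from statistics import mean


def simulate(n_sims=100, n_genes=15, n_cells=50, n_top=5, seed=1):
    """Null simulation: no gene truly differs between group A and group B.

    Returns the average A-minus-B difference of the selected genes,
    measured (1) on the data used for selection and (2) on held-out data.
    """
    rng = random.Random(seed)
    same_data, held_out = [], []
    for _ in range(n_sims):
        disc, valid = [], []
        for _ in range(n_genes):
            # Discovery half and validation half are independent draws.
            a1 = [rng.gauss(0, 1) for _ in range(n_cells)]
            b1 = [rng.gauss(0, 1) for _ in range(n_cells)]
            a2 = [rng.gauss(0, 1) for _ in range(n_cells)]
            b2 = [rng.gauss(0, 1) for _ in range(n_cells)]
            disc.append(mean(a1) - mean(b1))
            valid.append(mean(a2) - mean(b2))
        # Select the genes that look most "enriched" in group A.
        top = sorted(range(n_genes), key=lambda g: disc[g], reverse=True)[:n_top]
        same_data.append(mean(disc[g] for g in top))
        held_out.append(mean(valid[g] for g in top))
    return mean(same_data), mean(held_out)


in_sample, out_of_sample = simulate()
print("selected genes, same cells:", in_sample)
print("selected genes, held-out cells:", out_of_sample)
```

The in-sample difference is positive purely because of the selection step, which is the same trap as testing astrocytes vs. the rest using genes chosen because they separated astrocytes.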

This is such a small geneset that it'd be worth making plots for each gene and also slapping together a heatmap or dotplot to show collective enrichment of the signature (or not).
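The numbers behind a dotplot are easy to tabulate yourself if you want a quick look first. A minimal sketch, with a hypothetical `dotplot_table` helper and toy values: for each gene and cell type it gives the mean expression and the fraction of cells with non-zero expression.

```python
from collections import defaultdict


def dotplot_table(expr, cell_types):
    """For each gene and cell type: (mean expression, fraction of cells > 0).

    expr maps gene name -> list of per-cell values, aligned with cell_types.
    """
    table = {}
    for gene, values in expr.items():
        by_type = defaultdict(list)
        for value, ctype in zip(values, cell_types):
            by_type[ctype].append(value)
        table[gene] = {
            ctype: (sum(v) / len(v), sum(x > 0 for x in v) / len(v))
            for ctype, v in by_type.items()
        }
    return table


# Toy data (hypothetical values): 2 genes, 6 cells, 2 cell types.
expr = {
    "geneA": [2, 0, 4, 0, 0, 1],
    "geneB": [0, 0, 0, 3, 3, 0],
}
cell_types = ["Astro"] * 3 + ["Neuron"] * 3

table = dotplot_table(expr, cell_types)
for gene, row in table.items():
    for ctype, (avg, frac) in row.items():
        print(f"{gene}\t{ctype}\tmean={avg:.2f}\tfrac_expressing={frac:.2f}")
```

With ~15 genes the whole table fits on one screen, which makes it obvious whether the signal comes from the full signature or from one or two marker genes.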

1 day ago
Oli • 0

Hi,

Thank you very much for the thoughtful reply. The double dipping was exactly what I was worried about with the approach. I will calculate the module scores directly with my genes and run ANOVA on the results.

The heatmap is of course also a great idea, and I know some of the genes are good Astrocyte markers.

I'm still curious about whether to present the PCA as a potentially meaningful result, though, because it does show clear aggregation of astrocytes.

27% of Astrocytes are above 0 in PC2, while only 10% of the rest of the cells are there. Additionally, gene set enrichment analysis has already pointed us towards Astrocytes.

Thank you again for your efforts in replying.

1 day ago

That PCA plot looks so wonky, I would not say it supports anything. People typically use t-SNE or UMAP to make 2D visualizations of cell clusters, not just PCA.
