Hi, I'm currently analyzing bulk and single-cell RNA-seq data from mouse brains.
On the bulk side I have control vs treated samples, whereas the single-cell RNA-seq data is from control mice.
I performed differential expression analysis between the conditions in the bulk data and, following this protocol, ran PCA on the single-cell data using only the subset of genes that are significantly differentially expressed in treated vs control bulk samples.
I was not expecting much here; I only wanted to see whether the resulting DimPlot showed any structure.
In the plot above we see some separation of a subset of the astrocytes in the single-cell data.
My first question is: is it valid to run PCA on such a small subset of genes (~15) in the single-cell data? I think of this as similar to running PCA on cell cycle genes to check for a strong cell cycle effect, and I would assume it is fine as long as I don't extract any information beyond visualizing that some astrocytes aggregate along PC2 in this plot.
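To make the first question concrete, here is a minimal sketch of what "PCA restricted to a small gene subset" amounts to. It uses NumPy on synthetic data rather than my actual Seurat object; the function name `pca_on_gene_subset`, the gene names, and the planted "astrocyte-like" group are all hypothetical and only illustrate the idea.

```python
import numpy as np

def pca_on_gene_subset(expr, gene_names, subset, n_components=2):
    """PCA on a cells x genes matrix, restricted to the genes in `subset`.

    Returns per-cell scores, per-gene loadings, the fraction of variance
    explained by each kept component, and the genes actually used.
    """
    wanted = set(subset)
    idx = [i for i, g in enumerate(gene_names) if g in wanted]
    x = expr[:, idx].astype(float)
    # Center and scale each gene so no single gene dominates the PCs.
    x = x - x.mean(axis=0)
    sd = x.std(axis=0, ddof=1)
    sd[sd == 0] = 1.0  # leave constant genes at zero instead of dividing by 0
    x = x / sd
    # SVD of the scaled matrix: rows of vt are the principal axes.
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    loadings = vt[:n_components].T
    var_ratio = (s ** 2) / np.sum(s ** 2)
    return scores, loadings, var_ratio[:n_components], [gene_names[i] for i in idx]

# Synthetic stand-in for the single-cell data (assumption: log-normalized values).
rng = np.random.default_rng(0)
n_cells, n_genes = 300, 50
gene_names = [f"gene{i}" for i in range(n_genes)]
expr = rng.normal(size=(n_cells, n_genes))

# Hypothetical list of 15 DE genes from the bulk comparison.
de_genes = gene_names[:15]

# Plant a subgroup of cells that co-expresses 5 of the DE genes.
is_astro = np.zeros(n_cells, dtype=bool)
is_astro[:60] = True
expr[is_astro, :5] += 2.0

scores, loadings, var_ratio, used = pca_on_gene_subset(expr, gene_names, de_genes)
```

With only ~15 features the PCs are just weighted combinations of those genes, so a group of cells that co-expresses a few of them will separate along a leading component, which is how I read the plot.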
I continued the analysis by calculating module scores on the single cell data, using the gene set corresponding to PC2.
The module scores in astrocytes are significantly higher (paired Wilcoxon test) than in other cell types when computed with the genes corresponding to PC2. This also holds for the full list of genes, although the effect is less clear.
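For reference, the comparison can be sketched roughly as follows. This is a simplified stand-in, not the exact procedure I ran: the module score here is just the mean of the gene set minus the mean of a fixed control set (Seurat's `AddModuleScore` picks expression-matched control genes instead), the data are simulated, and the test is an unpaired rank-sum (Mann-Whitney) test with a normal approximation, which I swapped in for the sketch because cells from different cell types do not come in natural pairs. All function and variable names are hypothetical.

```python
import math
import random
from statistics import mean

def module_score(cell, gene_set, control_set):
    """Mean expression of gene_set minus mean of control_set for one cell.

    `cell` is a dict mapping gene name -> expression value.
    """
    return mean(cell[g] for g in gene_set) - mean(cell[g] for g in control_set)

def rank_sum_z(x, y):
    """Mann-Whitney z-statistic for x vs y (normal approximation, no tie correction).

    Positive values mean x tends to be larger than y.
    """
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return (u - mu) / sigma

# Simulated cells (assumption): the gene set is shifted upward in "astrocytes".
random.seed(1)
genes = [f"gene{i}" for i in range(40)]
gene_set = genes[:5]        # stand-in for the PC2 genes
control_set = genes[5:25]   # stand-in for control genes

def simulate_cell(shift):
    cell = {g: random.gauss(0, 1) for g in genes}
    for g in gene_set:
        cell[g] += shift
    return cell

astro = [simulate_cell(1.0) for _ in range(100)]
other = [simulate_cell(0.0) for _ in range(200)]

astro_scores = [module_score(c, gene_set, control_set) for c in astro]
other_scores = [module_score(c, gene_set, control_set) for c in other]

z = rank_sum_z(astro_scores, other_scores)
# One-sided p-value for "astrocyte scores are higher", from the normal tail.
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))
```

Note that with hundreds or thousands of cells per group, a test like this becomes significant for very small shifts, so the size of the difference in scores is probably more informative than the p-value alone.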
My second question is: is it correct to conclude from this result that the gene set resulting from the differential expression analysis in the bulk data is significantly enriched in astrocytes vs other cell types? Is there anything I might be missing? Am I stretching the interpretation?
Based on these results, we have conducted downstream experiments that verify the impact of the condition on astrocytes, but I want to make sure that the path that led us to astrocytes is sufficiently justified (there is some additional evidence, not relevant to these questions).