I recently finished an online scRNAseq course. I was a complete beginner in the field; I really enjoyed the course and learnt a lot. Now that I have an overview of single-cell analysis, I have a flood of possibly dumb questions that did not occur to me during the course.
I picture the general scRNAseq workflow like this:
- QC + normalization
- Dimensionality Reduction (PCA, t-SNE or UMAP)
- Clustering
- (Optional, if there are several datasets) Data integration
- (Optional, if you want to study cell behavior over time) Trajectory inference
- Differential Expression analysis.
Correct me if I am wrong.
Among all my questions, the biggest one is this: why cluster at all? Perhaps this is because most of our practicals used datasets where every cell was already assigned to a cell type, so while trying different clustering methods and parameters we more or less already had the ground truth in the cell labels. Is clustering meant to help you assign your cells to a cell type? Do we tune the clustering parameters so that the clusters reproduce the cell types we already know? What if you do single-cell work on a "non-model" species, with no cell atlas and no possibility of data integration? (I imagine mice and humans are the most studied species in this field.)
Then, just to be sure: do we cluster the cells on all the dimensions of the reduced space, or on just a few? We often use only two in our plots, for obvious reasons. But I find Seurat objects really tricky for a newbie to approach, and it is difficult to explore them and see what is what.
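To show why the number of dimensions matters for this question, here is a toy sketch in plain Python. It is not Seurat code: the three "components", the two groups "A" and "B", and the helper `same_group_neighbor_fraction` are all made up for illustration, and a nearest-neighbor check stands in for a real clustering algorithm. The two groups differ only along the third component, so anything computed on the first two components alone cannot tell them apart.

```python
import math
import random

random.seed(0)

# Hypothetical toy data: each cell has 3 reduced-dimension coordinates.
# Groups "A" and "B" differ ONLY along the third coordinate.
def make_cell(group):
    pc1 = random.gauss(0.0, 1.0)
    pc2 = random.gauss(0.0, 1.0)
    pc3 = random.gauss(5.0 if group == "B" else -5.0, 1.0)
    return (pc1, pc2, pc3)

cells = [("A", make_cell("A")) for _ in range(50)]
cells += [("B", make_cell("B")) for _ in range(50)]

def same_group_neighbor_fraction(cells, n_dims):
    """Fraction of cells whose nearest neighbor, measured by Euclidean
    distance on the first n_dims coordinates, is in the same group."""
    hits = 0
    for i, (group_i, coords_i) in enumerate(cells):
        best_dist, best_group = None, None
        for j, (group_j, coords_j) in enumerate(cells):
            if i == j:
                continue
            d = math.dist(coords_i[:n_dims], coords_j[:n_dims])
            if best_dist is None or d < best_dist:
                best_dist, best_group = d, group_j
        hits += best_group == group_i
    return hits / len(cells)

frac_2d = same_group_neighbor_fraction(cells, 2)  # what a 2-D plot "sees"
frac_3d = same_group_neighbor_fraction(cells, 3)  # all three coordinates

print(f"same-group nearest neighbor, first 2 dims: {frac_2d:.2f}")
print(f"same-group nearest neighbor, all 3 dims:   {frac_3d:.2f}")
```

In this made-up setup the neighbors found on two dimensions are close to a coin flip, while the neighbors found on all three almost always belong to the same group. So a structure can be invisible in a 2-D plot and still be recoverable from more dimensions.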
Another thing that bothers me is using clusters for DEA. We were warned about how flawed p-values are when doing DE between clusters: because the cells were already clustered on their expression profiles, of course we are going to see "significant" p-values when comparing two clusters. So I understand that a very low p-value from a cluster comparison should not be reported as very significant, but then how do you know what is significant? Do you also use fold change, for example?
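To make the concern concrete, here is a minimal Python simulation. It is a toy, not a Seurat workflow: the single gene, the median split used as a stand-in for clustering, and the helper `welch_t` are assumptions made for illustration. All cells come from one population, so there is no real difference to find, yet testing between groups that were defined from the same values gives a huge test statistic.

```python
import random
import statistics

random.seed(1)

# One hypothetical gene in 200 cells drawn from a SINGLE population.
expression = [random.gauss(0.0, 1.0) for _ in range(200)]

def welch_t(a, b):
    """Welch's t statistic for the difference in means between a and b."""
    var_a, var_b = statistics.variance(a), statistics.variance(b)
    standard_error = (var_a / len(a) + var_b / len(b)) ** 0.5
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

# Stand-in for clustering: split the cells at the median of the same gene
# that is tested afterwards (the data are used twice).
cut = statistics.median(expression)
low = [x for x in expression if x <= cut]
high = [x for x in expression if x > cut]
t_clustered = welch_t(high, low)

# Control: two groups assigned at random, independently of expression.
shuffled = expression[:]
random.shuffle(shuffled)
t_random = welch_t(shuffled[:100], shuffled[100:])

print(f"t statistic, groups defined from the data: {t_clustered:.1f}")
print(f"t statistic, random groups:                {t_random:.1f}")
```

The first statistic is far beyond any usual significance threshold even though there is only one population, while the random split stays near zero. That is the circularity we were warned about, and it is why I wonder whether an effect size such as fold change is needed alongside the p-value.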
Sorry if I was unclear, I think my thoughts are not organized well on the topic yet :) Thank you for your input.