Hi everyone, I'm analyzing two single-cell samples, each representing one experimental condition (I will call them condition A and condition B).
To start, I treated the samples individually and I did a clustering analysis for each one, identifying the different subpopulations.
Then I was asked to do a differential expression analysis between the two conditions to find all DEGs between A and B, independently of the cluster information. For this purpose I integrated the two datasets (IntegrateData function) and ran the FindMarkers function on the "RNA" assay.
Now I've been asked to find DEGs between two specific subpopulations, one from A and one from B.
I know that I can do this by integrating (or merging) the two datasets, but is there a way to use the clustering information I found when treating the two samples individually? When I merge or integrate the two datasets and cluster again, the clustering changes.
Thanks a lot
Francesca
In general, you should integrate the samples if there is a batch effect, do your dimension reduction and clustering on the batch-corrected data, switch back to the log-normalized counts, and then run all of your differential expression tests on those. There's no need for a separate workflow for the two tasks you are describing.
Yeah, I know, but I was asked to do it this way because, apparently, when the two datasets are integrated, some sparsely represented subpopulations tend to cluster together rather than stay separate.
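One possible way to reuse the per-sample clusters is to skip re-clustering altogether: keep the cluster labels from each individual analysis, attach them to a combined object by cell name, and test on the log-normalized "RNA" values. The following is only a minimal sketch under stated assumptions. The toy counts, the planted difference in the first five genes, and the names counts_a, counts_b, clusters_a, clusters_b, sample_cluster and cond_cluster are all hypothetical and exist only for illustration; in a real analysis the labels would come from each sample's own object (for example its seurat_clusters column).

```r
library(Seurat)

set.seed(42)
n_genes <- 200
n_cells <- 40
genes <- paste0("gene", seq_len(n_genes))

# Toy counts standing in for the two samples (hypothetical data).
counts_a <- matrix(rpois(n_genes * n_cells, lambda = 5), nrow = n_genes,
                   dimnames = list(genes, paste0("A_cell", seq_len(n_cells))))
counts_b <- matrix(rpois(n_genes * n_cells, lambda = 5), nrow = n_genes,
                   dimnames = list(genes, paste0("B_cell", seq_len(n_cells))))
# Plant a difference in the first five genes so the test has something to find.
counts_b[1:5, ] <- rpois(5 * n_cells, lambda = 20)

# Per-sample cluster labels keyed by cell name (made up here). In practice
# these would be taken from each individually clustered object.
clusters_a <- setNames(rep(c("0", "1"), each = n_cells / 2), colnames(counts_a))
clusters_b <- setNames(rep(c("0", "1"), each = n_cells / 2), colnames(counts_b))

# One combined, uncorrected object. No clustering is run on it.
combined <- CreateSeuratObject(
  counts = Matrix::Matrix(cbind(counts_a, counts_b), sparse = TRUE)
)
combined <- NormalizeData(combined, verbose = FALSE)

# Attach the original labels by cell name, then build
# "condition_cluster" identities such as "A_1" or "B_0".
combined <- AddMetaData(combined, metadata = c(clusters_a, clusters_b),
                        col.name = "sample_cluster")
combined$condition <- sub("_.*$", "", colnames(combined))
combined$cond_cluster <- paste(combined$condition, combined$sample_cluster,
                               sep = "_")
Idents(combined) <- "cond_cluster"

# Subpopulation 1 of A versus subpopulation 1 of B, on the RNA assay.
markers <- FindMarkers(combined, ident.1 = "A_1", ident.2 = "B_1",
                       assay = "RNA")
head(markers)
```

With existing Seurat objects, merge() with add.cell.ids would probably take the place of the cbind() step, and in Seurat v5 the merged layers may need JoinLayers() before FindMarkers will run. Note also that cluster "1" in A and cluster "1" in B are unrelated unless the two have been annotated as the same cell type, so the labels being compared should be matched by annotation rather than by number.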