Hi guys, I have a "theoretical" question about scRNA-Seq data analysis. I have a dataset consisting of 5 peripheral blood (PB) samples from 5 patients with tumors and 8 primary tumor samples from 8 patients. Some, but not all, of the primary tumors are matched with a PB sample. I have to analyze them. As a first inspection, I merged all PB and tumor samples without any batch correction, simply to look at the data. Looking at the merged data, the experimentalists identified three compartments of cells based on markers: stromal, immune and epithelial. Within each compartment, cells cluster by sample of origin. They asked me to perform batch correction using the compartment as the batch ID. To me, this is not correct. The procedure should be: batch correction (treating each sample as a batch, or adjusting for PB vs. primary tumor), followed by identification of the compartments using markers on the batch-corrected data. This is supported by the literature. Am I wrong, or is there any chance that they are right?
Thank you in advance
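To make the two options concrete, here is a minimal toy sketch in plain Python. Everything in it is an assumption made up for illustration: a single marker gene, two hypothetical samples (`S1`, `S2`), two compartments present in equal numbers in every sample, and "batch correction" reduced to subtracting the per-group mean. Real integration methods are far more sophisticated, and real PB samples would hardly contain epithelial cells, so this only shows the direction of the effect and is not a recipe.

```python
from statistics import mean

# Toy model (made-up numbers): expression = compartment effect + sample offset.
COMPARTMENT_EFFECT = {"immune": 0.0, "epithelial": 5.0}  # biological signal
SAMPLE_OFFSET = {"S1": 0.0, "S2": 2.0}                   # unwanted sample effect

# Each cell is (sample, compartment, expression); 3 cells per combination.
cells = [
    (s, c, COMPARTMENT_EFFECT[c] + SAMPLE_OFFSET[s])
    for s in SAMPLE_OFFSET
    for c in COMPARTMENT_EFFECT
    for _ in range(3)
]

def center_by(cells, key_index):
    """Crude stand-in for batch correction: subtract each group's mean.

    key_index=0 uses the sample as batch ID, key_index=1 the compartment.
    """
    groups = {}
    for cell in cells:
        groups.setdefault(cell[key_index], []).append(cell[2])
    means = {label: mean(values) for label, values in groups.items()}
    return [(s, c, x - means[(s, c)[key_index]]) for s, c, x in cells]

def gap(cells, key_index, a, b):
    """Mean expression of group b minus mean expression of group a."""
    mean_a = mean(cell[2] for cell in cells if cell[key_index] == a)
    mean_b = mean(cell[2] for cell in cells if cell[key_index] == b)
    return mean_b - mean_a

by_sample = center_by(cells, 0)        # sample as batch ID
by_compartment = center_by(cells, 1)   # compartment as batch ID

for name, data in [("sample", by_sample), ("compartment", by_compartment)]:
    print(
        f"batch = {name}: "
        f"compartment gap = {gap(data, 1, 'immune', 'epithelial'):.1f}, "
        f"sample gap = {gap(data, 0, 'S1', 'S2'):.1f}"
    )
```

Under these assumptions, using the sample as batch ID removes the sample offset and keeps the immune vs. epithelial difference, while using the compartment as batch ID removes the compartment difference and leaves the sample offset in place.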
Is the batch effect different in single-cell RNA-seq vs. bulk RNA-seq? In bulk RNA-seq, we account for differences in sequencing location or date (basically anything that could change the library prep process). Is this different for scRNA-seq? You seem to be describing biological differences rather than technical ones.
So basically it is biological heterogeneity, which as far as I know is a source of variability that should be corrected just like technical noise. Bulk is a little different, especially because the dependence of the variance on the mean is not as strong.