4.4 years ago
asumani
Hi all,
I need to analyze a subset of a publicly available single-cell RNA-seq dataset that contains B cells of multiple antibody isotypes. I want to subset the IgE cells (test) and the IgM cells (control) for differential expression analysis. Should I normalize before or after subsetting? Does it even matter? And if it does, how does it affect the statistical analysis?
Best,
Difficult to answer without more context. Is this a single experiment, or is the data pulled from different sources? You should probably create a single count matrix for the relevant cell types and then feed it into an appropriate statistical framework. That would mean normalizing after subsetting. It matters for sure, especially when the composition and types of cells are very different in the full experiment.
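To make the "does it matter" part concrete, here is a minimal sketch on made-up counts. It compares a normalization that uses only each cell's own counts with one that depends on which cells are in the matrix. The cell names, the numbers, and both helper functions are hypothetical and not taken from any particular package.

```python
from statistics import median

# Toy raw counts (made-up): one list of gene counts per cell.
counts = {
    "IgE_1": [10, 0, 5],
    "IgE_2": [20, 2, 8],
    "IgM_1": [4, 1, 0],
    "IgG_1": [200, 50, 150],
    "IgG_2": [300, 80, 120],
}
isotype = {"IgE_1": "IgE", "IgE_2": "IgE", "IgM_1": "IgM",
           "IgG_1": "IgG", "IgG_2": "IgG"}

def per_cell_normalize(mat, target=10_000):
    """Scale each cell to a fixed total; uses only that cell's own counts."""
    return {c: [x * target / sum(v) for x in v] for c, v in mat.items()}

def median_depth_normalize(mat):
    """Scale each cell to the median library size of the cells in `mat`."""
    target = median(sum(v) for v in mat.values())
    return {c: [x * target / sum(v) for x in v] for c, v in mat.items()}

def subset(mat, keep=("IgE", "IgM")):
    """Keep only cells whose isotype label is in `keep`."""
    return {c: v for c, v in mat.items() if isotype[c] in keep}

# Normalize-then-subset versus subset-then-normalize.
fixed_same = subset(per_cell_normalize(counts)) == per_cell_normalize(subset(counts))
median_same = subset(median_depth_normalize(counts)) == median_depth_normalize(subset(counts))

print("fixed-total scaling unaffected by order:", fixed_same)
print("median-depth scaling unaffected by order:", median_same)
```

With a fixed per-cell target the order makes no difference, because no information from other cells enters the calculation. With the median-depth version the deeply sequenced IgG cells shift the target, so the two orders disagree. In this toy case the disagreement is a constant factor across the subset, which would start to matter once a nonlinear step such as a log1p transform is applied.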
It is a single experiment. The normalized count matrix from the same experiment is already available. My plan is to subset from this existing count matrix.
Alternatively, I can run a separate pipeline on the subset of FASTQ files to obtain another count matrix, then normalize that subset and do further analysis.
I am unsure whether subsetting from an already normalized matrix would be statistically acceptable, or whether I should preprocess the raw data for the subset and then normalize.
Subsetting the existing one is probably OK, but then you are limited to statistical tests that directly use the normalized counts, such as the Wilcoxon test. For finding markers that is probably OK.
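As an illustration of a test that works directly on normalized values, here is a minimal pure-Python sketch of a two-sided Wilcoxon rank-sum (Mann-Whitney U) test for one gene. It uses the normal approximation with a tie correction and no continuity correction. The function name and the expression values are made up for the example; in practice an established implementation such as `scipy.stats.mannwhitneyu` would be the safer choice.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected).

    Returns (U statistic for x, approximate p-value).
    """
    # Pool both groups, remembering which group each value came from.
    pooled = sorted((v, g) for g, vals in enumerate((x, y)) for v in vals)
    n1, n2 = len(x), len(y)
    n = n1 + n2

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2          # mean of ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j

    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:                           # all values identical
        return u, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal tail
    return u, p

# Made-up normalized expression of one gene in each group.
ige_cells = [5.1, 6.3, 7.0, 8.2]
igm_cells = [0.0, 0.0, 1.2, 2.5]

u_stat, p_value = rank_sum_test(ige_cells, igm_cells)
print(f"U = {u_stat}, p = {p_value:.4f}")
```

Because the test only uses ranks, it needs values that are comparable between cells, which is why it can be run on the already normalized matrix. The normal approximation is rough for groups this small, and p-values would still need multiple-testing correction when the test is repeated across many genes.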