I want to regress out cell cycle effects, and I need normalized data to compute the cell cycle scores first. I would then use SCTransform to regress out those scores.
From what I understand, SCTransform performs normalization, scaling, and variable feature selection in a single command. So shouldn't we use the raw data as input for SCTransform when regressing out cell cycle effects, rather than the normalized data? Otherwise, won't SCTransform normalize already-normalized data, i.e. normalize it twice?
In short, my question is exactly like this one: "Seurat CellCycleScoring – confused about the proper order of operations when using SCTransform". I think he has put it in better words than I have.
Do you have any insights on this?
PS: Normalizing the raw data to get cell cycle scores and then using the raw data object for SCTransform doesn't make sense, because the raw (non-normalised) object will not have the columns needed for vars.to.regress in SCTransform.
Thanks for your prompt reply. In theory, using raw counts seemed like the right thing to do to me as well. But the metadata of the raw object doesn't have the S/G2M score columns that are used as vars.to.regress. How should I tackle that?
You simply need to add the new columns to the metadata and re-call SCTransform. Note that SCTransform.Seurat calls into SCTransform.Assay (https://github.com/satijalab/seurat/blob/1549dcb3075eaeac01c925c4b4bb73c73450fc50/R/preprocessing.R#L3761), which in turn uses the @counts slot of the appropriate assay (https://github.com/satijalab/seurat/blob/1549dcb3075eaeac01c925c4b4bb73c73450fc50/R/preprocessing.R#L3655), so SCTransform will always attempt to use raw count data if it's present in the object.
Thanks a lot for the clarification :)
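For anyone landing on this thread later, the order of operations discussed above could be sketched roughly as follows. This is only an illustration under some assumptions, not code from the posts: the helper name regress_cell_cycle and the argument obj are hypothetical, the raw counts are assumed to live in an assay called "RNA", and the gene lists come from cc.genes.updated.2019, which ships with recent Seurat versions and contains human gene symbols.

```r
library(Seurat)

# Hypothetical helper: `obj` is assumed to be a Seurat object whose "RNA"
# assay holds the raw counts.
regress_cell_cycle <- function(obj) {
  # Log-normalize only so that CellCycleScoring has normalized data to work
  # with; the raw counts remain stored in the RNA assay.
  obj <- NormalizeData(obj, assay = "RNA")

  # Adds the S.Score, G2M.Score and Phase columns to the object's metadata.
  # Assumption: human gene symbols matching cc.genes.updated.2019.
  obj <- CellCycleScoring(
    obj,
    s.features   = cc.genes.updated.2019$s.genes,
    g2m.features = cc.genes.updated.2019$g2m.genes
  )

  # SCTransform reads the raw counts of the RNA assay, so the data are not
  # normalized twice; the score columns are passed as covariates to regress.
  SCTransform(
    obj,
    assay = "RNA",
    vars.to.regress = c("S.Score", "G2M.Score")
  )
}
```

Calling obj <- regress_cell_cycle(obj) on the same object keeps the score columns and the raw counts together, which is what makes the metadata available to vars.to.regress.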