I'm using the Seurat platform on some data, but I think the question applies regardless of platform:
I'm seeing some PCs that separate cells by cell cycle, and I also notice heterogeneous grouping of cells by cell-cycle stage in the PC plot.
If I set vars.to.regress in ScaleData to regress these out, does this only affect the clustering? The between-cluster DEG analysis works on the raw (unregressed) data, right? So regressing out this signal will only stop the cells from clustering on these genes, and won't affect the DEG analysis once the clusters are set correctly?
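To make the question concrete, here is a minimal NumPy sketch of what "regressing out" a covariate during scaling does. It assumes regression means a per-gene ordinary least-squares fit on an intercept plus the covariates, followed by centering and scaling of the residuals. That mirrors the idea behind vars.to.regress but is not Seurat's implementation. The function name `regress_and_scale`, the scores `s_score`/`g2m_score`, and the simulated data are all hypothetical. The point it shows is that the regression produces a new scaled matrix, while the original expression matrix (the kind of input a DE test would use) is left untouched.

```python
import numpy as np

def regress_and_scale(expr, covariates):
    """Return z-scored residuals; `expr` (genes x cells) is not modified.

    expr: genes x cells array of normalized expression.
    covariates: cells x k array of per-cell covariates (e.g. cell-cycle scores).
    """
    n_cells = expr.shape[1]
    # Design matrix: intercept column plus the covariates.
    X = np.column_stack([np.ones(n_cells), covariates])
    # Fit every gene at once: expr.T ~ X @ beta (ordinary least squares).
    beta, *_ = np.linalg.lstsq(X, expr.T, rcond=None)
    residuals = expr - (X @ beta).T
    # Center and scale each gene's residuals across cells.
    sd = residuals.std(axis=1, ddof=1, keepdims=True)
    sd[sd == 0] = 1.0
    return (residuals - residuals.mean(axis=1, keepdims=True)) / sd

rng = np.random.default_rng(0)
n_genes, n_cells = 50, 200
s_score = rng.normal(size=n_cells)      # hypothetical S-phase score
g2m_score = rng.normal(size=n_cells)    # hypothetical G2M score
expr = rng.normal(size=(n_genes, n_cells))
expr[:25] += 2.0 * s_score              # half the genes track the S-phase score
expr_before = expr.copy()

scaled = regress_and_scale(expr, np.column_stack([s_score, g2m_score]))

# The scaled residuals no longer correlate with the regressed covariate...
corr_after = np.array([np.corrcoef(g, s_score)[0, 1] for g in scaled])
print(np.abs(corr_after).max() < 1e-8)
# ...while the original matrix is unchanged.
print(np.array_equal(expr, expr_before))
```

Only the matrix fed into PCA (and therefore clustering) loses the cell-cycle signal in this sketch; anything computed from `expr` itself still sees it.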
If I am correct, which other analyses in a general pipeline would regressing out these variables affect? I've heard it said that if the feature you are regressing out is a biological feature of interest, you should leave it in.
In my case, cell cycle is a biological feature of interest, but I do not want the clustering to be driven by cell-cycle stage. If that's the case, it should be fine to regress it out so that cells don't cluster on this feature, and then use the raw data to look at cell-cycle stage after clustering, right? The data could even be re-scaled after clustering, with the cell-cycle signal left in, if needed, and the cells would retain their cluster IDs, right?
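The workflow described above can be sketched end to end under simplifying assumptions: a single hypothetical S-phase score is regressed out, plain k-means stands in for Seurat's graph-based clustering, and the data are simulated with two hypothetical cell types. Cells are clustered on the regressed matrix, and each cluster's cell-cycle profile is then summarised from the unregressed score. The cluster labels are stored in their own array, so re-scaling the expression matrix afterwards would not change them.

```python
import numpy as np

def regress_out(expr, covariate):
    """Residuals of each gene (row) after an OLS fit on intercept + covariate."""
    X = np.column_stack([np.ones(expr.shape[1]), covariate])
    beta, *_ = np.linalg.lstsq(X, expr.T, rcond=None)
    return expr - (X @ beta).T

def kmeans(points, k, n_iter=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

rng = np.random.default_rng(1)
n_genes, n_cells = 50, 200
cell_type = np.repeat([0, 1], n_cells // 2)   # hypothetical ground truth
s_score = rng.normal(size=n_cells)            # hypothetical S-phase score
expr = rng.normal(size=(n_genes, n_cells))
expr[:25] += 2.0 * s_score                    # cell-cycle genes
expr[25:, cell_type == 1] += 3.0              # cell-type marker genes

# 1. Cluster on the cell-cycle-regressed matrix (cells as rows).
labels = kmeans(regress_out(expr, s_score).T, k=2)

# 2. Describe each cluster's cell-cycle profile with the unregressed score.
for j in np.unique(labels):
    print(j, round(float(s_score[labels == j].mean()), 3))

# 3. Labels are a separate array; re-scaling `expr` later does not touch them.
agreement = max(np.mean(labels == cell_type), np.mean(labels != cell_type))
print(agreement)
```

In this toy setup the clusters follow cell type rather than cell-cycle score, which is the behaviour the question is after.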
Thanks in advance!
Is it fair to regress out a batch effect on highly variable genes only?
Probably not, as the regression could change which genes are considered "highly" variable.
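A small simulation illustrates the answer. It assumes "highly variable" simply means the genes with the largest raw variance. That is a simplification, not Seurat's FindVariableFeatures method. It also assumes a hypothetical two-batch design in which genes 0 to 9 carry only a batch shift and genes 10 to 19 carry genuine extra variability. Ranking genes before and after regressing out the batch gives different top sets.

```python
import numpy as np

def regress_out(expr, covariate):
    """Residuals of each gene (row) after an OLS fit on intercept + covariate."""
    X = np.column_stack([np.ones(expr.shape[1]), covariate])
    beta, *_ = np.linalg.lstsq(X, expr.T, rcond=None)
    return expr - (X @ beta).T

def top_variable(expr, n):
    """Indices of the n genes with the largest variance across cells."""
    return set(np.argsort(expr.var(axis=1))[::-1][:n].tolist())

rng = np.random.default_rng(2)
n_genes, n_cells = 100, 300
batch = np.repeat([0.0, 1.0], n_cells // 2)   # hypothetical batch indicator
expr = rng.normal(size=(n_genes, n_cells))
expr[:10, batch == 1] += 5.0                  # genes 0-9: pure batch shift
expr[10:20] *= 2.0                            # genes 10-19: biological variability

hvg_before = top_variable(expr, 10)
hvg_after = top_variable(regress_out(expr, batch), 10)
print(sorted(hvg_before))
print(sorted(hvg_after))
```

Here the batch-driven genes dominate the "highly variable" set before regression and drop out of it afterwards. Selecting genes first and then regressing would therefore lock in a gene set chosen partly because of the batch effect.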