Hello! I am very new to snRNA-seq, but I have recently started using Seurat to process the data I have available. This is more of a check that I have understood the tutorials correctly.
A breakdown of my data: I have snRNA-seq data from diseased-with-mutation, diseased-without-mutation (total diseased n = 23), and healthy (n = 16) individuals. After extracting the data, for each patient I am left with an MTX file (counts_fil) and two TSV files: col_metadata (headers = barcode, Sample_ID, DGE_group, cell_type) and row_metadata (headers = ENSEMBL, Gene, Chromosome, Biotype).
I have loaded this data into Seurat using ReadMtx, with col_metadata and row_metadata as the cells and features arguments respectively, and then created a Seurat object from the resulting matrix.
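Roughly what I did per patient (a sketch; the file names are from my extraction, and the column/skip arguments are my assumptions about the TSV layout):

```r
library(Seurat)

# Per-patient loading; column indices and header-skips assume the
# TSV layouts described above -- adjust as needed.
counts <- ReadMtx(
  mtx            = "counts_fil.mtx",
  cells          = "col_metadata.tsv",
  features       = "row_metadata.tsv",
  cell.column    = 1,  # "barcode" column
  feature.column = 2,  # "Gene" column (use 1 for the ENSEMBL IDs)
  skip.cell      = 1,  # skip the header row in each TSV
  skip.feature   = 1
)

# Attach the per-cell metadata so Sample_ID / DGE_group / cell_type
# travel with the object.
col_meta <- read.delim("col_metadata.tsv", row.names = 1)
obj <- CreateSeuratObject(counts = counts, meta.data = col_meta)
```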
My question is: after creating the individual Seurat objects, should I (a) normalise and process each sample individually and then integrate (per category listed above?), (b) merge all samples by group, process, and then integrate, or (c) combine all samples regardless of group, process, and then integrate, splitting by group?
My end goal is to compare cell populations and expression between these three categories, for which (as I understand it) I will have to integrate and anchor my data?
Many apologies if this is a simple question and many thanks for any help in advance.
Cheers
Many thanks for your response! So would you recommend merging all the Seurat objects for the normalisation step (treating it as one big dataset), then integrating everything together, and only splitting into the different groups at the UMAP/further-analysis stages?
At no point should you split the object. Perform embedding, clustering, and annotation on everything, and visualize the different groups within the object. You can also examine contrasts (disease vs. control) without splitting the object in two.
I guess the one time you might actually want to split the object apart is after clustering: you may want to split the object by cell class, and potentially re-cluster and analyze each class separately.
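For example (a rough sketch; the metadata column names "DGE_group" and "cell_type" and the group labels are placeholders for whatever yours are called):

```r
library(Seurat)

# Visualize the groups within the single integrated object
DimPlot(obj, reduction = "umap", group.by = "DGE_group")  # color by group
DimPlot(obj, reduction = "umap", split.by = "DGE_group")  # one panel per group

# Contrast disease vs control without splitting the object
Idents(obj) <- "DGE_group"
de <- FindMarkers(obj, ident.1 = "diseased", ident.2 = "healthy")

# The one exception: pull out a cell class after clustering to re-analyze it
neurons <- subset(obj, subset = cell_type == "Neuron")
# (then typically re-run FindVariableFeatures/ScaleData/RunPCA before re-clustering)
```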
This is a huge source of confusion for me, would greatly appreciate your help.
The Satija lab tutorial splits prior to running integration:
https://satijalab.org/seurat/articles/seurat5_integration.html
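As I read it, the pattern there is roughly this (with the grouping column name adapted to my data):

```r
# The v5 tutorial pattern, as I understand it
obj[["RNA"]] <- split(obj[["RNA"]], f = obj$Sample_ID)  # split BEFORE integrating

obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)

obj <- IntegrateLayers(obj, method = CCAIntegration,
                       orig.reduction = "pca", new.reduction = "integrated.cca")
obj[["RNA"]] <- JoinLayers(obj[["RNA"]])  # re-join once integrated
```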
I have 8 samples (all iPSC neurons: 4 conditions × 2 donors). I filter with SoupX and scDblFinder and then merge into a single Seurat object. However, under the RNA assay there are 8 counts layers in the merged object for some reason. Is that normal?
Also, I integrate by donor and by sample separately and compare CCA, Harmony, and scVI.
To get that to work, I first have to run JoinLayers on my merged object and then split (by donor or sample) as in the tutorial above.
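i.e., roughly (a sketch; "objs" stands in for my list of 8 filtered objects):

```r
# What I'm currently doing
merged <- merge(objs[[1]], y = objs[2:8])
Layers(merged[["RNA"]])                         # e.g. "counts.1" ... "counts.8"

merged[["RNA"]] <- JoinLayers(merged[["RNA"]])  # collapse to a single counts layer
merged[["RNA"]] <- split(merged[["RNA"]], f = merged$donor)  # re-split, e.g. by donor
```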
But I am worried this is the incorrect approach, or that I am messing up the data structure, because my DEG clusters aren't behaving as expected.
Any thoughts would be greatly appreciated
I assume the 8 samples are all from different 10x lanes? I think you're fine up until the merge, which you then have to split apart again. As with the OP of this question, you shouldn't ever need to split the object prior to integration, because you should be reading in separate files in the first place! (The tutorial uses an object that is pre-concatenated, but your own data should at least be coming from separate runs.)
As a first pass, try without SoupX and scDblFinder, and run IntegrateData on all 8 objects to see how that looks.

They're from different lanes, yes.
Sorry, just so I understand, you're suggesting I follow the example here: https://satijalab.org/seurat/reference/integratedata
Except instead of splitting a merged object, I should just start with the individual samples, apply some basic filtering, and then run NormalizeData/FindVariableFeatures in a for loop, followed by IntegrateData?
That's how I do it, at least.
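Something like this (a sketch; "obj.list" stands in for your 8 per-sample, already-filtered objects):

```r
library(Seurat)

# Normalize and find variable features per sample
obj.list <- lapply(obj.list, function(x) {
  x <- NormalizeData(x)
  FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
})

# Anchor-based integration across all 8 objects
features <- SelectIntegrationFeatures(object.list = obj.list)
anchors  <- FindIntegrationAnchors(object.list = obj.list,
                                   anchor.features = features)
combined <- IntegrateData(anchorset = anchors)

# Downstream analysis on the integrated assay
DefaultAssay(combined) <- "integrated"
combined <- ScaleData(combined)
combined <- RunPCA(combined)
combined <- RunUMAP(combined, dims = 1:30)
```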
Thank you so much for all your help!
What about other integration methods, though? I can't use individual objects with those, right?
Also, what is the issue with splitting and joining the layers? Like why is that inadvisable?
This is true for basically all methods of integration, where the pipeline looks like join_and_integrate(x). For Seurat, IntegrateData combines joining and integrating; for most other methods, join_and_integrate(x) === integrate(join(x)). What was proposed above was something like integrate(split(join(x))) instead.
It is not that joining and splitting the objects is ill-advised; it is merely unnecessary, and it adds the risk of splitting the object along some feature different from the one by which the data were initially separated.
Did you split by sample or by donor?
I have tried both, but donor seems to integrate better, as donor is the largest confound I am trying to regress out. That's also what I was wondering: should I normalize, scale, run PCA, etc., then split by donor, then run IntegrateLayers?
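i.e., something like this (a sketch of what I mean; "donor" is my metadata column, and Harmony is just one of the methods I'm comparing):

```r
# Process the merged object first, then split by donor just before integration
obj[["RNA"]] <- JoinLayers(obj[["RNA"]])  # collapse the per-sample merge layers

obj <- NormalizeData(obj)
obj <- FindVariableFeatures(obj)
obj <- ScaleData(obj)
obj <- RunPCA(obj)

obj[["RNA"]] <- split(obj[["RNA"]], f = obj$donor)
obj <- IntegrateLayers(obj, method = HarmonyIntegration,
                       orig.reduction = "pca", new.reduction = "harmony")
obj[["RNA"]] <- JoinLayers(obj[["RNA"]])
obj <- RunUMAP(obj, reduction = "harmony", dims = 1:30)
```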