Merging multiple samples in Seurat
Asked 14 months ago by AFP3

Hello! I am very new to snRNA-seq, but I have recently started using Seurat to process the data I have available. This is more a request for clarification that I have understood the tutorials correctly.

A breakdown of my data: I have snRNA-seq data from diseased-with-mutation, diseased-without-mutation (total diseased n = 23) and healthy (n = 16) individuals. After extracting the data, for each patient I am left with an MTX file (counts_fil) and two TSV files: col_metadata (headers = barcode, Sample_ID, DGE_group, cell_type) and row_metadata (headers = ENSEMBL, Gene, Chromosome, Biotype).

I have loaded this data into Seurat using ReadMtx(), with cells and features set to col_metadata and row_metadata respectively, and then converted the matrix into a Seurat object.
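
For reference, loading one sample that way looks roughly like the sketch below. The file names, the column positions (barcode in column 1 of col_metadata, Gene in column 2 of row_metadata) and the single header line to skip are assumptions based on the description above, so adjust to match your files:

    library(Seurat)

    counts <- ReadMtx(
      mtx            = "counts_fil.mtx",    # assumed file name
      cells          = "col_metadata.tsv",
      features       = "row_metadata.tsv",
      cell.column    = 1,  # barcode
      feature.column = 2,  # Gene symbol (column 1 is the ENSEMBL ID)
      skip.cell      = 1,  # skip the header line in each TSV
      skip.feature   = 1
    )
    obj <- CreateSeuratObject(counts = counts, project = "patient_01")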

My question is: after creating individual Seurat objects, should I (a) normalise and process each sample individually and then integrate (per category listed above), (b) merge all samples by group, process, and then integrate, or (c) combine all samples regardless of group, process, and then integrate, splitting by group afterwards?

My end goal is to compare the cell populations/expression between these three categories, for which I understand I will have to integrate and anchor my data.

Many apologies if this is a simple question and many thanks for any help in advance.

Cheers

scRNA-seq snRNA-seq RNA-seq Seurat • 8.2k views

Answered 14 months ago by LChart 4.7k

My question is: after creating individual Seurat objects, should I (a) normalise and process each sample individually and then integrate (per category listed above), (b) merge all samples by group, process, and then integrate, or (c) combine all samples regardless of group, process, and then integrate, splitting by group afterwards?

Normalization occurs on a per-cell level, so that step doesn't really care how you've stacked the cells together.
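
To make that concrete: the default LogNormalize divides each cell's counts by that cell's total, multiplies by a scale factor, and log1p-transforms, cell by cell, so it gives identical results whether the objects are merged or kept separate:

    # Each cell is normalized independently of every other cell
    obj <- NormalizeData(obj, normalization.method = "LogNormalize",
                         scale.factor = 1e4)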

As soon as you call the integration pipeline (starting with finding variable genes with awareness of the sample identities), Seurat reverts to the normalized data anyway, so any per-sample processing you performed will be overwritten.

With that knowledge, the question here is: do you integrate everything together, or integrate within condition and then concatenate?

If you integrate within condition and then concatenate, and you see differences between your conditions, you will naturally be skeptical as to whether this is a disease effect or some batch effect.

If you integrate all of the cells together and don't see differences between your conditions, you will naturally wonder whether integration has "over-corrected" a biological effect. (However, as you have quite a few samples from both conditions, I think the risk of this is pretty low).

My suggestion would be to integrate everything together, across groups.
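
To sketch that out (assuming objs is a list of per-sample Seurat objects, each already normalized with variable features identified, as in the loop described further down this thread):

    features <- SelectIntegrationFeatures(object.list = objs)
    # With ~39 samples, passing a few representative samples via the
    # reference argument of FindIntegrationAnchors() can cut runtime.
    anchors  <- FindIntegrationAnchors(object.list = objs,
                                       anchor.features = features)
    combined <- IntegrateData(anchorset = anchors)

    DefaultAssay(combined) <- "integrated"
    combined <- ScaleData(combined)
    combined <- RunPCA(combined, npcs = 30)
    combined <- RunUMAP(combined, dims = 1:30)
    combined <- FindNeighbors(combined, dims = 1:30)
    combined <- FindClusters(combined, resolution = 0.5)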

Many thanks for your response! So would you recommend merging all the Seurat objects for the normalisation step (treating it as one big dataset), then integrating everything together and only splitting into different groups at the UMAP/further-analysis stage?

At no point should you split the object. Perform embedding, clustering, and annotation on everything; you can then visualize the different groups within the object. You can also examine contrasts (disease vs. control) without splitting the object in two.

I guess the one time you might want to actually split the object apart is after clustering: you may want to split the object by cell class, and potentially re-cluster and analyze each class separately.
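
For example (the metadata column names condition and cell_type below are placeholders for whatever your object actually carries):

    # Visualize the groups without splitting the object
    DimPlot(combined, reduction = "umap", group.by = "condition")
    DimPlot(combined, reduction = "umap", split.by = "condition")

    # Disease-vs-control contrast, still within the one object
    degs <- FindMarkers(combined, ident.1 = "diseased", ident.2 = "healthy",
                        group.by = "condition")

    # The one justifiable split: pull out a cell class and re-cluster it
    tcells <- subset(combined, subset = cell_type == "T cell")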

This is a huge source of confusion for me; I would greatly appreciate your help.

The Satija tutorial splits prior to running integration:

https://satijalab.org/seurat/articles/seurat5_integration.html

I have 8 samples (all iPSC neurons: 4 conditions x 2 donors). I filter with SoupX and scDblFinder and then merge into a single Seurat object. However, under the RNA assay there are 8 counts layers in the merged object for some reason. Is that normal?

I also integrate by donor and by sample separately, comparing CCA, Harmony, and scVI.

To get that to work, I first have to run JoinLayers() on my merged object and then split() it (by donor or sample) as in the tutorial above.

But I am worried that this is the incorrect approach, or that I am messing up the data structure, because my DEG clusters aren't behaving appropriately.

Any thoughts would be greatly appreciated

I assume the 8 samples are all from different 10X lanes? I think you're fine up until the merge, at which point you have to split them apart again. As with the OP of this question, you shouldn't ever need to split the object prior to integration, since you should be reading in separate files in the first place! (The tutorial uses an object that is pre-concatenated, but your own data should be coming, at minimum, from separate runs.)

As a first pass, try without soupX and scDblFinder and use IntegrateData on all 8 objects and see how that looks.

They're from different lanes, yes.

Sorry, just so I understand, you're suggesting I follow the example here: https://satijalab.org/seurat/reference/integratedata

Except instead of splitting a merged object, I should just start with the individual samples, apply some basic filtering, then run NormalizeData/FindVariableFeatures in a for loop, followed by IntegrateData?

That's how I do it, at least.
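
In sketch form, where sample_dirs is a placeholder vector of per-sample paths and the QC thresholds are placeholders rather than recommendations:

    objs <- lapply(sample_dirs, function(p) {
      x <- CreateSeuratObject(Read10X(data.dir = p),
                              min.cells = 3, min.features = 200)
      x[["percent.mt"]] <- PercentageFeatureSet(x, pattern = "^MT-")
      x <- subset(x, subset = nFeature_RNA > 200 & percent.mt < 10)  # basic filtering
      x <- NormalizeData(x)
      FindVariableFeatures(x, selection.method = "vst", nfeatures = 2000)
    })
    # ...then FindIntegrationAnchors()/IntegrateData() as sketched above.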

Thank you so much for all your help!

What about other integration methods, though? I can't use individual objects otherwise, right?

Also, what is the issue with splitting and joining the layers? Why is that inadvisable?

This is true for basically all methods of integration, where the pipeline looks like:

for all objects:
   preprocess(object)

join_and_integrate(objects)

For Seurat, IntegrateData combines joining and integrating; for most other methods, join_and_integrate(x) === integrate(join(x)).

What was proposed above was something like:

joined <- join(objects)
resplit <- split(joined)
for object in resplit:
   preprocess(object)

join_and_integrate(resplit)

It is not that joining and splitting the objects is ill-advised; it is merely unnecessary, and it adds the risk of splitting the objects along some feature different from how they were initially separated.

Did you split by sample or by donor?

I have tried both, but donor seems to integrate better, as that is the largest confound I am trying to regress out. That's what I was wondering: should I normalize, scale, run PCA, etc., then split by donor, then run IntegrateLayers?
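
For reference, that Seurat v5 order of operations would look roughly like the sketch below, assuming a donor metadata column; HarmonyIntegration stands in for whichever method compares best:

    # Split the RNA assay into per-donor layers, then integrate
    obj[["RNA"]] <- split(obj[["RNA"]], f = obj$donor)
    obj <- NormalizeData(obj)
    obj <- FindVariableFeatures(obj)
    obj <- ScaleData(obj)
    obj <- RunPCA(obj)
    obj <- IntegrateLayers(object = obj, method = HarmonyIntegration,
                           orig.reduction = "pca", new.reduction = "harmony")
    obj[["RNA"]] <- JoinLayers(obj[["RNA"]])  # rejoin before DE testing
    obj <- FindNeighbors(obj, reduction = "harmony", dims = 1:30)
    obj <- FindClusters(obj)
    obj <- RunUMAP(obj, reduction = "harmony", dims = 1:30)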
