Question

Harmony integration: group.by.vars parameter

11 months ago

Picasa ▴ 680

Hi,

I am looking for advice about the parameters for Harmony integration. My goal is to look for differences between WT and KO.

I have paired data:

For each donor I have 2 samples (WT and KO)

Example of data:

Sample_ID   Donor_ID    Condition
1_WT    1   WT
1_KO    1   KO
2_WT    2   WT
2_KO    2   KO

The donors were sequenced at different times (e.g., Donor 1 at 10 days, Donor 2 at 60 days).

This is my original command for Harmony

RunHarmony(seu_obj, group.by.vars = c("orig.ident"))

I am not sure about the group.by.vars parameter.

orig.ident corresponds to the Sample_ID, but should I include Donor_ID or Condition in group.by.vars?

Thanks

single-cell harmony • 2.3k views

ADD COMMENT • link updated 11 months ago by jared.andrews07 ★ 18k • written 11 months ago by Picasa ▴ 680

score 1 · Answer 1 · 2024-05-24

11 months ago

jared.andrews07 ★ 18k

Harmony will try to remove the variability explained by the variables provided to group.by.vars. Assuming you want to remove the differences between the donors, the donor variable (Donor_ID) is what I'd feed to it.

If you're trying to remove variation between all the samples, then yes, "Sample_ID" may be appropriate. In some cases, doing so prior to clustering/cell type annotation (and then re-doing it with the variable you'd actually like removed) can be helpful in order to annotate consistently across samples.

ADD COMMENT • link 11 months ago by jared.andrews07 ★ 18k

Thanks jared.andrews07 for your answer.

So, you are suggesting using only "Donor_ID" in the integration?

RunHarmony(seu_obj, group.by.vars = c("Donor_ID"))

Since I am interested in the differences between the conditions (WT vs KO), I am wondering if using "Sample_ID" is appropriate:

RunHarmony(seu_obj, group.by.vars = c("Sample_ID", "Donor_ID"))

I am not sure, but using "Sample_ID" might remove the differences between the conditions, right?

ADD REPLY • link 11 months ago by Picasa ▴ 680

I am not sure, but using "Sample_ID" might remove the differences between the conditions, right?

More than likely, it'd at least impact them, yes. You can always try both and see which looks better.
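One way to see why "Sample_ID" is the riskier choice is to cross-tabulate the metadata. The sketch below uses base R only and rebuilds the four-sample table from the question as a toy data frame (the names `meta`, `donor_crossed` and `sample_nested` are just illustrative). It shows that every donor contains both conditions, while every sample belongs to exactly one condition, so any correction keyed on the sample can also absorb the WT/KO difference.

```r
# Toy copy of the sample sheet from the question (assumed column names).
meta <- data.frame(
  Sample_ID = c("1_WT", "1_KO", "2_WT", "2_KO"),
  Donor_ID  = c("1", "1", "2", "2"),
  Condition = c("WT", "KO", "WT", "KO"),
  stringsAsFactors = FALSE
)

# Rows are donors (or samples), columns are conditions.
donor_by_condition  <- table(meta$Donor_ID, meta$Condition)
sample_by_condition <- table(meta$Sample_ID, meta$Condition)

# Donor is crossed with condition: every donor has both WT and KO.
donor_crossed <- all(donor_by_condition > 0)

# Sample is nested in condition: each sample appears in only one condition,
# so a per-sample correction cannot be separated from the condition effect.
sample_nested <- all(rowSums(sample_by_condition > 0) == 1)

print(donor_by_condition)
print(sample_by_condition)
```

In the real object the same check could be run on the cell-level metadata instead of this hand-typed table.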

Also not sure what you plan to do downstream, as generally integration isn't going to impact differential expression (as you can always include unwanted variables in your model). I generally just use integration to cram identical cell types together between conditions to make them easier to cluster/annotate.
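As a rough illustration of "including unwanted variables in your model", here is a minimal base R sketch of a paired design for the four samples in the question. The data frame `samples` and the choice of WT as the reference level are assumptions made for the example; the resulting matrix is the kind of object a package such as limma or edgeR accepts as a design.

```r
# Toy sample sheet matching the question (one row per sample, not per cell).
samples <- data.frame(
  Sample_ID = c("1_WT", "1_KO", "2_WT", "2_KO"),
  Donor_ID  = factor(c("1", "1", "2", "2")),
  # WT is set as the reference level so the coefficient reads as KO vs WT.
  Condition = factor(c("WT", "KO", "WT", "KO"), levels = c("WT", "KO"))
)

# Donor enters as a blocking term; Condition is the effect of interest.
design <- model.matrix(~ Donor_ID + Condition, data = samples)
rownames(design) <- samples$Sample_ID

print(design)
```

The `ConditionKO` column is the KO-versus-WT effect after accounting for donor. With only two donors this leaves a single residual degree of freedom, so it is a sketch of the model structure rather than a well-powered analysis.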

ADD REPLY • link 11 months ago by jared.andrews07 ★ 18k

For downstream analysis, after integration, my plan is to annotate each cluster/cell type and then perform a DGE analysis of KO versus WT for each cluster/cell type separately.
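For the per-cluster KO versus WT comparison, one common option (an assumption here, not something stated in the thread) is to sum counts per sample within each cluster and test on those pseudobulk profiles. The base R sketch below uses simulated toy counts and hypothetical names (`cells`, `counts`, `pseudobulk_for`) purely to show the aggregation step.

```r
set.seed(1)  # toy data only, not real expression values

# Hypothetical cell-level metadata: 6 cells per sample, two clusters A and B.
cells <- data.frame(
  Sample_ID = rep(c("1_WT", "1_KO", "2_WT", "2_KO"), each = 6),
  cluster   = rep(c("A", "B"), times = 12),
  stringsAsFactors = FALSE
)

# Toy genes x cells count matrix.
n_genes <- 5
counts <- matrix(
  rpois(n_genes * nrow(cells), lambda = 3),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(nrow(cells))))
)

# Sum counts over the cells of one cluster, separately for each sample.
pseudobulk_for <- function(cl) {
  keep <- cells$cluster == cl
  # rowsum() adds up rows by group, so transpose to cells x genes and back.
  t(rowsum(t(counts[, keep, drop = FALSE]), group = cells$Sample_ID[keep]))
}

pb_A <- pseudobulk_for("A")  # genes x samples matrix for cluster A
print(pb_A)
```

Each column of `pb_A` is one sample, so a matrix like this could then be tested with a design that includes both donor and condition.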

ADD REPLY • link 11 months ago by Picasa ▴ 680

jared.andrews07

If I integrate using Donor:

RunHarmony(seu_obj, group.by.vars = c("Donor_ID"))

Should I also normalize my data by splitting it by Donor_ID so that each object in the list contains both samples (WT and KO)?

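# Split by donor so each list element holds that donor's WT and KO cells
seu_list <- SplitObject(seu_obj, split.by = "Donor_ID")
# Run SCTransform on each donor's object separately
seu_list <- lapply(X = seu_list,
                   FUN = SCTransform,
                   return.only.var.genes = FALSE)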