Question

Seeking Advice on Causal Inference for Treatment Effect Prediction (Small Sample, Genomic Covariates)


4 days ago

oghzzang ▴ 50

Hello,

I’ve been studying causal inference recently, but I’m still unsure how to approach my analysis properly, so I would really appreciate your guidance. I’m working with the following dataset and aim to answer this question:

Goal: For each individual, can we predict whether Treatment A or Treatment B would be more effective?

Dataset Summary:

N = 88 patients

Treatment assignment: A or B (binary)

Outcome: binary response (1 = favorable response, 0 = unfavorable)

Covariates:

A binary variable for the presence of a specific gene mutation

A continuous variable for the expression level of a specific gene

Questions:

Since this is a small dataset (N = 88), would it still make sense to split the data into training and test sets, as in conventional supervised learning workflows?

I am considering using causal_forest() from the grf package to estimate individual treatment effects (ITEs).

After estimating the ITEs, is it reasonable to decide:

ITE > 0 => Prefer Treatment A

ITE < 0 => Prefer Treatment B

Is this interpretation valid and commonly used in practice?
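To make the rule concrete, here is a minimal sketch in plain Python. It assumes the ITE is defined as P(response | A) minus P(response | B), which is what the signs above imply. The function name `recommend`, the optional `ci_half_width` argument, and the "uncertain" label are illustrative additions, and the numbers are placeholders rather than estimates from the data.

```python
def recommend(tau_hat, ci_half_width=None):
    """Map an estimated effect (A minus B) to a treatment recommendation.

    tau_hat > 0 -> "A", tau_hat < 0 -> "B".
    If a confidence half-width is supplied and the interval
    [tau_hat - h, tau_hat + h] covers 0, return "uncertain" instead
    (an assumption added here, not part of the bare sign rule).
    """
    if ci_half_width is not None and abs(tau_hat) <= ci_half_width:
        return "uncertain"
    if tau_hat > 0:
        return "A"
    if tau_hat < 0:
        return "B"
    return "uncertain"


# Placeholder (estimate, half-width) pairs, for illustration only.
examples = [(0.30, 0.10), (-0.25, 0.05), (0.05, 0.20)]
labels = [recommend(t, h) for t, h in examples]
print(labels)
```

With N = 88, many estimates may fall in the "uncertain" band, which is part of why the bare sign rule may be too aggressive here.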

I’m aware that with such a small sample size, variance and overfitting could be major issues. If there are any recommendations regarding cross-validation strategies, feature regularization, or alternative models (e.g., T-Learner, S-Learner), I’d love to hear them.
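As a point of comparison, a T-Learner with leave-one-out evaluation could be sketched as below. This is only a rough illustration under several assumptions: the data are synthetic stand-ins generated in the script, each arm is fitted with a linear probability model (ordinary least squares with a tiny ridge term for numerical stability) rather than a logistic model or a forest, and all names (`fit_ols`, `t_learner_loo`, the `mut`/`expr`/`arm`/`y` fields) are hypothetical.

```python
import random


def solve(a, b):
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, ridge=1e-6):
    """Least-squares fit of y ~ 1 + mutation + expression (linear probability model)."""
    X = [[1.0, r["mut"], r["expr"]] for r in rows]
    y = [r["y"] for r in rows]
    xtx = [[sum(xi[j] * xi[k] for xi in X) + (ridge if j == k else 0.0)
            for k in range(3)] for j in range(3)]
    xty = [sum(xi[j] * yi for xi, yi in zip(X, y)) for j in range(3)]
    return solve(xtx, xty)


def predict(beta, r):
    """Predicted response for one patient under the arm whose coefficients are beta."""
    return beta[0] + beta[1] * r["mut"] + beta[2] * r["expr"]


def t_learner_loo(data):
    """For each patient, fit one model per arm on the other patients,
    then take the difference of the two held-out predictions (A minus B)."""
    tau = []
    for i, r in enumerate(data):
        train = data[:i] + data[i + 1:]
        beta_a = fit_ols([t for t in train if t["arm"] == "A"])
        beta_b = fit_ols([t for t in train if t["arm"] == "B"])
        tau.append(predict(beta_a, r) - predict(beta_b, r))
    return tau


# Synthetic stand-in data (NOT the real cohort): 88 patients, random arms,
# with a built-in mutation-by-treatment interaction.
rng = random.Random(0)
data = []
for _ in range(88):
    mut = float(rng.random() < 0.4)
    expr = rng.gauss(0.0, 1.0)
    arm = "A" if rng.random() < 0.5 else "B"
    p = 0.5 + (0.2 if arm == "A" else -0.2) * (2 * mut - 1)
    data.append({"mut": mut, "expr": expr, "arm": arm,
                 "y": float(rng.random() < p)})

tau_hat = t_learner_loo(data)
print(len(tau_hat), sum(t > 0 for t in tau_hat))
```

Leave-one-out keeps every patient available for fitting while still producing an out-of-sample estimate for each one, which seems more economical than a single train/test split at this sample size. The held-out estimates remain noisy, though, and the individual effects themselves cannot be validated directly because each patient is observed under only one treatment.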

Thank you very much in advance for your help!

causal-inference • 337 views

updated 1 day ago by Ram 45k • written 4 days ago by oghzzang ▴ 50