Treatment FAQ

Why does regression not work with heterogeneous treatment effects?

by Rylee Kozey Published 2 years ago Updated 2 years ago

Regressions that control for confounding factors are the workhorse of evaluation research. When treatment effects are heterogeneous, however, the workhorse regression leads to estimated treatment effects that lack behavioral interpretations even when the selection on observables assumption holds.

Full Answer

Does the workhorse regression work for heterogeneous treatment effects?

Heterogeneous Treatment Effects Kosuke Imai Harvard University STAT 186 / GOV 2002 CAUSAL INFERENCE ... It does not solve the statistical problem Kosuke Imai (Harvard) Heterogeneous …

Why do we need a test for treatment effect heterogeneity?

Jul 20, 2010 · Regressions that control for confounding factors are the workhorse of evaluation research. When treatment effects are heterogeneous, however, the workhorse regression leads …

Can I use a regression to estimate Cates and interaction effects?

The vast and growing literature on heterogeneous treatment effects merits careful study. For now, it is important to state that (1) the calculation of the average causal effect of a ...

What is heterogeneous treatment effect?

Heterogeneity of treatment effect (HTE) is the nonrandom, explainable variability in the direction and magnitude of treatment effects for individuals within a population.

How do you test for heterogeneous treatment effects?

To implement the test, first use the experimental data to estimate the average treatment effect (ATE) and the difference in variances Var(Yi(1))−Var(Yi(0)). Next, create a full hypothetical schedule of potential outcomes assuming that the true treatment effect is constant and equal to the estimated ATE.
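The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the original authors' code: the function name `variance_difference_test` is hypothetical, and the final step (re-randomizing treatment many times and comparing the simulated variance differences with the observed one in absolute value) is an assumption about how the test is completed, since the passage only describes the first two steps.

```python
import numpy as np

def variance_difference_test(y, d, n_sims=2000, seed=0):
    """Randomization test of constant effects based on Var(Y(1)) - Var(Y(0)).

    y: observed outcomes, d: 0/1 treatment indicator (hypothetical inputs).
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)

    # Step 1: estimated ATE and observed difference in variances.
    ate_hat = y[d == 1].mean() - y[d == 0].mean()
    observed = y[d == 1].var(ddof=1) - y[d == 0].var(ddof=1)

    # Step 2: full schedule of potential outcomes under a constant effect.
    y0 = np.where(d == 1, y - ate_hat, y)
    y1 = y0 + ate_hat

    # Assumed step 3: re-randomize treatment and recompute the statistic.
    sims = np.empty(n_sims)
    for s in range(n_sims):
        d_sim = rng.permutation(d)
        y_sim = np.where(d_sim == 1, y1, y0)
        sims[s] = y_sim[d_sim == 1].var(ddof=1) - y_sim[d_sim == 0].var(ddof=1)

    p_value = float(np.mean(np.abs(sims) >= abs(observed)))
    return ate_hat, observed, p_value
```

A small p-value suggests the observed difference in variances would be unusual if every unit had the same treatment effect.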

What is treatment effect in regression?

Treatment effects can be estimated using social experiments, regression models, matching estimators, and instrumental variables. A 'treatment effect' is the average causal effect of a binary (0–1) variable on an outcome variable of scientific or policy interest.

What is homogeneous treatment effect?

In a homogeneous treatment effects model, the magnitude and direction of the treatment effect are the same for all patients, regardless of any other patient characteristics. Models that allow the treatment effect to be different for different individuals are referred to as heterogeneous treatment effect models. (May 21, 2016)

How do you test for heterogeneous?

The classical measure of heterogeneity is Cochran's Q, which is calculated as the weighted sum of squared differences between individual study effects and the pooled effect across studies, with the weights being those used in the pooling method.
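As a rough illustration of that definition, the sketch below computes Cochran's Q in Python. It assumes inverse-variance (fixed-effect) weights, which is one common pooling choice and not something the passage specifies; the function name `cochran_q` and its inputs are hypothetical.

```python
import numpy as np

def cochran_q(effects, variances):
    """Return (Q, pooled effect, degrees of freedom).

    Assumes inverse-variance weights for the pooled estimate.
    """
    effects = np.asarray(effects, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    pooled = np.sum(weights * effects) / np.sum(weights)
    # Weighted sum of squared deviations of study effects from the pooled effect.
    q = float(np.sum(weights * (effects - pooled) ** 2))
    return q, float(pooled), len(effects) - 1
```

Under the null hypothesis of homogeneity, Q is usually compared with a chi-squared distribution with k − 1 degrees of freedom, where k is the number of studies.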

What is called heterogeneous?

Definition of heterogeneous

: consisting of dissimilar or diverse ingredients or constituents : mixed an ethnically heterogeneous population.

What is treatment on the treated effect?

Two effects: ITT (Intent to Treat) = people made eligible for the treatment/intervention. TOT (Treatment on the Treated) = people who actually took the treatment/intervention.

What is average treatment effect on the treated?

The average treatment effect (ATE) is a measure used to compare treatments (or interventions) in randomized experiments, evaluation of policy interventions, and medical trials. The ATE measures the difference in mean (average) outcomes between units assigned to the treatment and units assigned to the control.

What is the difference between ATT and ATE?

ATE is the average treatment effect, and ATT is the average treatment effect on the treated. The ATT is the effect of the treatment actually applied. (Oct 25, 2017)

What is a large treatment effect?

Effect Size

An estimate of how large the treatment effect is, that is, how well the intervention worked in the experimental group in comparison to the control group. The larger the effect size, the stronger are the …

What is treatment effect in Anova?

The ANOVA Model. A treatment effect is the difference between the overall, grand mean, and the mean of a cell (treatment level). Error is the difference between a score and a cell (treatment level) mean.
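The decomposition described above can be written out in a few lines of Python. This is only a sketch: the function name `anova_decomposition` and the dictionary input are hypothetical, and the treatment effect is taken as the cell mean minus the grand mean, which is the usual sign convention.

```python
import numpy as np

def anova_decomposition(groups):
    """groups: dict mapping a treatment level to a list of scores."""
    all_scores = np.concatenate(
        [np.asarray(v, dtype=float) for v in groups.values()]
    )
    grand_mean = float(all_scores.mean())
    # Treatment effect: cell (treatment level) mean minus the grand mean.
    effects = {k: float(np.mean(v)) - grand_mean for k, v in groups.items()}
    # Error: each score minus its own cell mean.
    errors = {k: np.asarray(v, dtype=float) - np.mean(v) for k, v in groups.items()}
    return grand_mean, effects, errors
```

Each score can then be rebuilt as grand mean + treatment effect + error.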

What is heterogeneity systematic review?

Inevitably, studies brought together in a systematic review will differ. Any kind of variability among studies in a systematic review may be termed heterogeneity.

Who wrote the paper Reducing bias in observational studies using subclassification on the propensity score?

Rosenbaum, P., and D. Rubin. 1984. Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association 79: 516-24.

What is the workhorse of evaluation research?

Regressions that control for confounding factors are the workhorse of evaluation research. When treatment effects are heterogeneous, however, the workhorse regression leads to estimated treatment effects that lack behavioral interpretations even when the selection on observables assumption holds. Regressions that use propensity scores as weights and regressions based on random coefficients or hierarchical models provide alternative estimators that have clear behavioral interpretations. Assuming selection on the observables and heterogeneous treatment effects, this article (a) shows what is identified as the treatment effect in the workhorse model, (b) shows what is identified as the treatment effect by propensity score models and models based on random coefficients/hierarchical models, and (c) provides advice for evaluators.
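A small numerical sketch may make the point concrete. Everything in it is an illustrative assumption and not taken from the article: two equally sized strata, treatment shares of 0.5 and 0.9, stratum-specific effects of 1 and 3, and noise-free outcomes so the arithmetic is exact. It compares the coefficient from a regression of the outcome on treatment and the stratum indicator with the ATE, the ATT, and an inverse-propensity-weighted difference in means.

```python
import numpy as np

# Hypothetical strata: (stratum label, n treated, n control, treatment effect).
strata = [(0, 500, 500, 1.0), (1, 900, 100, 3.0)]

x_parts, d_parts, tau_parts = [], [], []
for label, n_treated, n_control, effect in strata:
    n = n_treated + n_control
    x_parts.append(np.full(n, float(label)))
    d_parts.append(np.r_[np.ones(n_treated), np.zeros(n_control)])
    tau_parts.append(np.full(n, effect))
x = np.concatenate(x_parts)
d = np.concatenate(d_parts)
tau = np.concatenate(tau_parts)
y = 2.0 * x + tau * d  # noise-free outcome; selection depends only on x

# Workhorse regression: y on a constant, treatment, and the confounder x.
design = np.column_stack([np.ones_like(d), d, x])
ols_coef = np.linalg.lstsq(design, y, rcond=None)[0][1]

ate = tau.mean()           # average effect over everyone
att = tau[d == 1].mean()   # average effect over the treated

# Inverse-propensity weights, using the treated share within each stratum.
pscore = np.where(x == 1, d[x == 1].mean(), d[x == 0].mean())
w = d / pscore + (1 - d) / (1 - pscore)
ipw = (np.average(y[d == 1], weights=w[d == 1])
       - np.average(y[d == 0], weights=w[d == 0]))

print(ols_coef, ate, att, ipw)
```

In this setup the ATE is 2 and the ATT is 16/7, while the regression coefficient is 26/17 (about 1.53): it weights each stratum by p(1 − p), the within-stratum variance of treatment, so it matches neither target. The weighted difference in means recovers the ATE of 2.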

What is treatment effect heterogeneity?

The study of treatment effect heterogeneity is the study of these differences across subjects: For whom are there big effects? For whom are there small effects? For whom does treatment generate beneficial or adverse effects? Research on such questions can help inform theories about the conditions under which treatments are especially effective or ineffective; it can also help inform ways of designing and deploying policies so as to maximize their effectiveness.

Which is more powerful, Westfall Young or Bonferroni correction?

The Westfall–Young step-down procedure is an alternative FWER control method that can be more powerful than the Bonferroni correction because it takes into account correlations between the tests. The procedure involves the following steps:

What is treatment by treatment interaction?

In contrast to treatment-by-covariate interactions, treatment-by-treatment interactions are differences in CATEs where the personal or contextual attribute partitioning subjects into subgroups is experimentally manipulated. Because the covariate is randomly assigned, treatment-by-treatment interactions may be interpreted causally. Factorial and partial factorial designs allow researchers to randomly assign subjects to different combinations of “cross-cutting” treatment conditions and to estimate treatment-by-treatment interactions as allowed by the design.

How to simulate sharp null hypothesis?

Simulate the sharp null hypothesis of no treatment effect by performing a large number L of replications of random assignment of treatment, leaving the outcome and covariate data unchanged.

How to test whether an interaction effect could have occurred by chance?

To test whether the estimated interaction effect could have occurred by chance, one can use randomization inference: First generate a full schedule of potential outcomes under the null hypothesis that the true treatment effect is constant and equal to the estimated ATE. Then simulate random assignment a large number of times and calculate how often the simulated estimate of the interaction effect is at least as large (in absolute value) as the actual estimate.
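The procedure described above might look roughly like this in Python. It is a sketch under assumptions: the covariate `x` is binary, treatment was assigned by complete randomization (so re-assignment is simulated by permuting `d`), the interaction is measured as the difference between the two subgroup differences in means, and the function names are hypothetical.

```python
import numpy as np

def interaction_estimate(y, d, x):
    """Difference in difference-in-means between the x == 1 and x == 0 subgroups."""
    def diff_in_means(mask):
        return y[mask & (d == 1)].mean() - y[mask & (d == 0)].mean()
    return diff_in_means(x == 1) - diff_in_means(x == 0)

def ri_interaction_test(y, d, x, n_sims=2000, seed=0):
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=int)
    x = np.asarray(x, dtype=int)

    ate_hat = y[d == 1].mean() - y[d == 0].mean()
    observed = interaction_estimate(y, d, x)

    # Full schedule of potential outcomes under a constant effect equal to ate_hat.
    y0 = y - ate_hat * d
    y1 = y0 + ate_hat

    hits = 0
    for _ in range(n_sims):
        # Assumes subgroups are large enough that each draw has treated and
        # control units in both subgroups.
        d_sim = rng.permutation(d)
        y_sim = np.where(d_sim == 1, y1, y0)
        hits += abs(interaction_estimate(y_sim, d_sim, x)) >= abs(observed)
    return observed, hits / n_sims
```

The returned p-value is the share of simulated assignments whose interaction estimate is at least as large in absolute value as the one actually observed.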

How to mitigate multiple comparisons?

One way to mitigate the multiple comparisons problem is to reduce the number of tests conducted (e.g., by analyzing a small number of pre-specified subgroups). Another approach is to adjust the p-values to account for the fact that multiple hypotheses are being tested simultaneously.
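To illustrate what adjusting p-values can mean in practice, here is a short Python sketch of two standard adjustments: Bonferroni and Holm's step-down method. Holm is not mentioned in the passage and is included only as a common, slightly less conservative alternative; the function names are hypothetical.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]

def holm(pvals):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The k-th smallest p-value is multiplied by (m - k + 1);
        # the running maximum keeps the adjusted values monotone.
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted
```

Both control the family-wise error rate without using the correlation between tests, which is why resampling-based procedures can be more powerful.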

Can conditioning on a post-treatment covariate lead to bias?

Conditioning on a post-treatment covariate may lead to bias, because biased estimation of both the main effect and the interaction effects is possible when a post-treatment covariate is included as a regressor. This is especially likely when the covariate is affected by the treatment.

How to test for heterogeneity in treatment effect?

To test for treatment effect heterogeneity, we define the hypotheses as

$$H_{0,\mathrm{atehetero}}: \mathrm{CATE}(x) = \gamma, \ \forall x \in \mathcal{X}_c \text{ and for some } \gamma \in \mathbb{R}, \qquad H_{1,\mathrm{atehetero}}: H_{0,\mathrm{atehetero}} \text{ does not hold.} \tag{3.9}$$

If $\mathrm{CATE}(x) = \gamma$ for all $x \in \mathcal{X}_c$ and for some $\gamma \in \mathbb{R}$, then the equality would hold with $\gamma = \mathrm{ATE} = \nu((0, 1))$. Then $H_{0,\mathrm{atehetero}}$ would imply that $\nu(\ell) = p(\ell) \cdot \nu((0, 1))$, where $p(\ell) = E[g_\ell(X_i) \mid Z_i = c]$ is the conditional probability of $X_i \in C_\ell$. So the hypotheses in (3.9) are equivalent to

$$H_{0,\mathrm{atehetero}}: \nu_{\mathrm{hetero,ate}}(\ell) = \nu(\ell) - \nu((0, 1)) \cdot p(\ell) = 0, \ \forall \ell \in \mathcal{L}, \qquad H_{1,\mathrm{atehetero}}: \nu_{\mathrm{hetero,ate}}(\ell) = \nu(\ell) - \nu((0, 1)) \cdot p(\ell) \neq 0 \text{ for some } \ell \in \mathcal{L}. \tag{3.10}$$

When $\ell = (0, 1)$, $\nu_{\mathrm{hetero,ate}}(\ell)$ degenerates to zero. For smaller cubes, $\nu_{\mathrm{hetero,ate}}(\ell)$ examines whether the ATE among individuals with characteristic values belonging to $C_\ell$ is equal to the population ATE multiplied by the proportion of such individuals. The following lemma formally summarizes the equivalence result of the null transformation discussed above.

What is treatment effect heterogeneity?

Treatment effect heterogeneity is frequently studied in regression discontinuity (RD) applications. This paper proposes, under the RD setup, formal tests for treatment effect heterogeneity among individuals with different observed pre-treatment characteristics. The proposed tests study whether a policy treatment (1) is beneficial for at least some subpopulations defined by pre-treatment covariate values, (2) has any impact on at least some subpopulations, and (3) has a heterogeneous impact across subpopulations. The empirical section applies the tests to study the impact of attending a better high school and discovers interesting patterns of treatment effect heterogeneity neglected by previous studies.

What is subsample regression?

The subsample regression method repeats the main RD analysis with different subsamples defined by individual observed characteristics. This method is nonparametric. However, as the method typically involves running a number of subsample RD regressions at the same time, it is essential to adjust the regressions for multiple testing (see, e.g., Romano and Shaikh, 2010 and Anderson, 2008) to achieve correct inference. Unfortunately, none of the papers in our survey using the subsample regression method address this issue. Furthermore, even if multiple testing is correctly accounted for, the subsample RD regression method is not ideal. First, under the fuzzy RD design, it can produce over-rejected tests and under-covered confidence intervals if the sample size or proportion of compliers is small for some subsamples. This is because the method uses subsample local average treatment effect estimators, which could have non-classical inference when the first stage is weak (Feir et al., 2016) or when the subsample size is small. Second, to implement the subsample regression method, researchers often categorize continuous covariates into discrete groups in an arbitrary way, which often results in loss of information.

Why are proposed tests useful?

The proposed tests are useful because applied researchers are often interested in treatment effect heterogeneity. A survey of recent publications in top general interest journals in economics finds that 15 out of 17 papers that adopted the RD framework analyzed treatment effect heterogeneity. The common practice is to build a linear regression model with interaction terms between the discontinuity dummy and additional controls of interest or to accompany the primary RD regression with subsample regressions.

What is the bandwidth of a pooled regression discontinuity analysis?

Pooled regression discontinuity analysis. Notes: Nonparametric local linear estimations are conducted using a triangular kernel. The bar chart reports the histogram of the standardized running variable, while the circles and lines report the average outcome within each bin and the local linear estimates. The bandwidth is set to 0.5 for all graphs for the purpose of data illustration and cross-comparison.
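For readers unfamiliar with the estimator named in these notes, the following Python sketch shows one way a sharp-RD local linear estimate with a triangular kernel could be computed. It is not the paper's code: the function name `local_linear_rd`, the cutoff at 0, and the use of a single weighted regression with separate slopes on each side are all illustrative assumptions, and no standard errors or bias corrections are included.

```python
import numpy as np

def local_linear_rd(running, outcome, cutoff=0.0, bandwidth=0.5):
    """Jump in the outcome at the cutoff from a kernel-weighted linear fit."""
    r = np.asarray(running, dtype=float) - cutoff
    y = np.asarray(outcome, dtype=float)

    # Triangular kernel: weight falls linearly to zero at the bandwidth.
    weights = np.clip(1.0 - np.abs(r) / bandwidth, 0.0, None)
    keep = weights > 0

    above = (r >= 0).astype(float)
    # Intercept, jump at the cutoff, and a separate slope on each side.
    design = np.column_stack([np.ones_like(r), above, r, above * r])[keep]

    # Weighted least squares via square-root weights.
    sqrt_w = np.sqrt(weights[keep])
    coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y[keep] * sqrt_w, rcond=None)
    return float(coef[1])
```

The coefficient on the above-cutoff indicator is the estimated discontinuity; observations farther than one bandwidth from the cutoff receive zero weight and are dropped.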

How to test if a policy treatment has any impact on at least some subpopulations?

To test if a policy treatment has any impact on at least some subpopulations, the null and alternative hypotheses can be formulated as

$$H_{0,\mathrm{atezero}}: \mathrm{CATE}(x) = 0, \ \forall x \in \mathcal{X}_c, \qquad H_{1,\mathrm{atezero}}: \mathrm{CATE}(x) \neq 0 \text{ for some } x \in \mathcal{X}_c. \tag{3.7}$$

Similar to the previous subsection, we can transform the hypotheses in (3.7) to

$$H_{0,\mathrm{atezero}}: \nu(\ell) = 0, \ \forall \ell \in \mathcal{L}, \qquad H_{1,\mathrm{atezero}}: \nu(\ell) \neq 0 \text{ for some } \ell \in \mathcal{L} \tag{3.8}$$

without loss of information, as is summarized in the following lemma.

Is interaction term parametric or nonparametric?

The interaction term method, which adds interaction terms between the dummy variable indicating whether an individual passes the cut-off value of the running variable and additional covariates of interest to the RD regression model, is parametric. The method severely over-rejects under model misspecification even if researchers only use observations close to the cut-off of the running variable for estimation. This is in sharp contrast to the classic RD regression method which is nonparametric and robust to misspecification under mild kernel, bandwidth, and smoothness conditions of the underlying distribution.

How to estimate treatment effect heterogeneity?

A traditional approach to estimating treatment effect heterogeneity is splitting the sample (e.g., male vs. female), estimating the treatment effects separately for both groups, and testing if the difference in treatment effects is statistically significant. However, there are two problems with this approach:
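The traditional split-sample approach can be sketched as follows. The sketch assumes a randomized binary treatment, a binary grouping variable, independent subgroups, and a large-sample normal approximation for the test; the function names are hypothetical.

```python
import math
import numpy as np

def diff_in_means(y, d):
    """Treatment effect estimate and its standard error (unequal variances)."""
    y1, y0 = y[d == 1], y[d == 0]
    est = y1.mean() - y0.mean()
    se = math.sqrt(y1.var(ddof=1) / len(y1) + y0.var(ddof=1) / len(y0))
    return est, se

def subgroup_difference_test(y, d, g):
    """Compare treatment effects between subgroups g == 1 and g == 0."""
    y, d, g = (np.asarray(a) for a in (y, d, g))
    est_a, se_a = diff_in_means(y[g == 1], d[g == 1])
    est_b, se_b = diff_in_means(y[g == 0], d[g == 0])
    diff = est_a - est_b
    se = math.sqrt(se_a ** 2 + se_b ** 2)
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(diff / se) / math.sqrt(2))
    return diff, se, p_value
```

Running this for many candidate splits and reporting only the significant ones is exactly the situation in which multiple comparisons become a problem.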

Why is overfitting and p-hacking less of a concern?

Moreover, if you have only few baseline variables, overfitting and p-hacking are less of a concern because there is only a limited number of ways in which you could reasonably form subgroups.

What are the advantages of generic ML?

Generic ML has three advantages over Honest Causal Forest. First, it is generic (surprise!). This means it works with any machine learning technique and you are not limited to random forest. Second, it does account for the uncertainty that arises from sample splitting (we will talk about what this means later). Third, according to Chernozhukov et al. (2020), the method by Wager & Athey (2017) does not provide reliable estimates in high-dimensional settings where the number of covariates is much larger than the log of the number of observations.

Is treatment effect heterogeneity important?

Understanding treatment effect heterogeneity is not only intellectually interesting but also of huge practical importance.

Can you use genericML in a setting with fixed effects?

On the plus side, note that while I focused on experiments, you can also use the method on observational data (here’s an example). You can also conveniently use GenericML in a setting with fixed effects. If you have a binary outcome, you will want to change mlr3 learners from solving regression tasks to classification tasks. To do so, you currently have to modify the source code of GenericML but that may change in the future.
