Treatment FAQ

synthetic control how to get a better pre-treatment fit

by Dr. Jaren Gorczany Published 2 years ago Updated 2 years ago

What is the synthetic control method?

The method is based on the observation that, when the units of analysis are a few aggregate units, a combination of comparison units (the “synthetic control”) often does a better job of reproducing characteristics of a treated unit than using a single comparison unit alone.

Are synthetic control methods likely to pursue falsification exercises?

We include this second paper primarily to illustrate that synthetic control methods are increasingly expected to pursue numerous falsification exercises in addition to simply estimating the causal effect itself.
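One widely used falsification exercise is the in-space placebo test: the method is re-run with each untreated unit treated as if it had received the intervention, and the treated unit's ratio of post-treatment to pre-treatment RMSPE is ranked against the placebo ratios. The sketch below assumes those ratios have already been computed; the unit names and values are hypothetical.

```python
def placebo_p_value(rmspe_ratios, treated):
    """Share of units whose post/pre RMSPE ratio is at least as large
    as the treated unit's ratio (the treated unit counts itself)."""
    treated_ratio = rmspe_ratios[treated]
    at_least_as_extreme = sum(
        1 for ratio in rmspe_ratios.values() if ratio >= treated_ratio
    )
    return at_least_as_extreme / len(rmspe_ratios)


# Hypothetical post/pre RMSPE ratios, for illustration only.
ratios = {
    "treated": 9.5,
    "donor_a": 1.2,
    "donor_b": 0.8,
    "donor_c": 2.1,
    "donor_d": 1.0,
}

p_value = placebo_p_value(ratios, "treated")
print(p_value)
```

With only a handful of donor units the smallest attainable p-value is one over the number of units, so a small donor pool limits how strong this evidence can be.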

Should synthetic control be held to the same level of skepticism?

In this sense, researchers have pushed others to hold it to the same level of scrutiny and skepticism as they have with other methodologies such as RDD and IV. Authors using synthetic control must do more than merely run the synth command when doing comparative case studies.

How to use synth to match a pre-treatment?

Next comes the synth syntax. Call synth, then the outcome variable (bmprison), then the variables you want to match on. You can match either on the average over the entire pre-treatment period or on the values in particular years. I choose both.
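The difference between matching on the pre-treatment average and matching on particular years can be illustrated outside Stata. The Python sketch below is not the synth command; the helper `build_predictors`, the series values, and the treatment year are all hypothetical and only show how the two kinds of matching targets are assembled.

```python
from statistics import mean

# Hypothetical outcome series for one unit, keyed by year (illustrative values).
outcome = {1988: 10.0, 1989: 12.0, 1990: 11.0, 1991: 13.0, 1992: 14.0}
treatment_year = 1993  # assumed first treated year


def build_predictors(series, treatment_year, match_years=()):
    """Return matching targets: the pre-treatment mean of the outcome
    plus the outcome in each selected pre-treatment year."""
    pre = {year: value for year, value in series.items() if year < treatment_year}
    predictors = {"pre_mean": mean(pre.values())}
    for year in match_years:
        if year not in pre:
            raise ValueError(f"{year} is not a pre-treatment year in the data")
        predictors[f"y_{year}"] = pre[year]
    return predictors


predictors = build_predictors(outcome, treatment_year, match_years=(1990, 1992))
print(predictors)
```

Matching on individual years forces the synthetic unit to track the shape of the pre-treatment path, while the average alone only pins down its level.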

How does synthetic control method work?

The synthetic control method is a statistical method used to evaluate the effect of an intervention in comparative case studies. It involves the construction of a weighted combination of groups used as controls, to which the treatment group is compared.

DiD vs synthetic control?

While DiD estimation assumes that the effects of unobserved confounders are constant over time, the synthetic control method allows for these effects to change over time, by re‐weighting the control group so that it has similar pre‐intervention characteristics to the treated group.

What is a synthetic control experiment?

The synthetic control method is a statistical method to evaluate treatment effect in comparative case studies. It creates a synthetic version of treated units by weighting variables and observations in the control group.

Who invented synthetic control?

Abadie and Gardeazabal (2003) developed the synthetic control estimator so as to evaluate the impact that terrorism had on the Basque region in Spain.

What is synthetic control arm?

A Synthetic Control Arm® (SCA®) is a type of external control that is generated using external patient-level data to improve the interpretation of uncontrolled trials.

What is double machine learning?

Double Machine Learning is a method for estimating (heterogeneous) treatment effects when all potential confounders/controls (factors that simultaneously had a direct effect on the treatment decision in the collected data and the observed outcome) are observed, but are either too many (high-dimensional) for classical ...

What is causal inference in statistics?

Causal inference refers to an intellectual discipline that considers the assumptions, study designs, and estimation strategies that allow researchers to draw causal conclusions based on data.

How is synthetic control different from matching?

The synthetic control (SC) method is widely used in comparative case studies to adjust for differences in pretreatment characteristics. SC limits extrapolation bias at the potential expense of interpolation bias, whereas traditional matching estimators have the opposite properties.

What is a difference in difference model?

The difference-in-differences method is a quasi-experimental approach that compares the changes in outcomes over time between a population enrolled in a program (the treatment group) and a population that is not (the comparison group). It is a useful tool for data analysis.
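The comparison of changes described above reduces to simple arithmetic in the two-group, two-period case. The sketch below uses hypothetical group means and a hypothetical helper name, `did_estimate`.

```python
def did_estimate(treat_pre, treat_post, control_pre, control_post):
    """Change in the treatment group minus change in the comparison group."""
    return (treat_post - treat_pre) - (control_post - control_pre)


# Hypothetical group means before and after the program.
effect = did_estimate(
    treat_pre=20.0, treat_post=26.0, control_pre=18.0, control_post=21.0
)
print(effect)  # (26 - 20) - (21 - 18) = 3.0
```

The estimate is only credible if the two groups would have followed parallel trends without the program, which is the assumption the subtraction relies on.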

What is synthetic control?

Synthetic control, as the name suggests, creates a look-alike control population against which the outcome of the treatment can be measured. In the above example, let’s assume that the entire state of Texas is given the 10% offer on Tuesdays and Thursdays. Here, all other states in the US are potential candidates for the control population. But not every state has a sales pattern similar to that of TX, so an optimisation algorithm is run to choose weights for the states such that their weighted combination matches TX store sales in the pre-treatment period. This weighted combination acts as the new control and is used for measurement in the post-treatment period.

Where is synthetic control code available?

The original published implementations of the synthetic control method are available mainly in R, Stata and Matlab. There are several (but unregistered) versions in Python as well. Here is a compiled list of code and package sources:

Is there a treatment and control population for A/B testing?

For testing, there has to be a treatment and control population. Since customers visiting stores cannot be segmented into these groups, certain zip codes or states have to be chosen as treatment and control groups for testing. A/B tests are not helpful in this case and here’s why:

Who developed the synthetic control estimator?

Abadie and Gardeazabal (2003) developed the synthetic control estimator so as to evaluate the impact that terrorism had on the Basque region in Spain. But Abadie, Diamond, and Hainmueller (2010) expound on the method by using a cigarette tax in California called Proposition 99.

What is the second advantage of counterfactual?

A second advantage has to do with processing of the data. The construction of the counterfactual does not require access to the post-treatment outcomes during the design phase of the study, unlike regression. The advantage here is that it helps the researcher avoid “peeking” at the results while specifying the model.

Does synthetic control remove subjective bias?

Abadie, Diamond, and Hainmueller (2010) argue that synthetic control removes subjective researcher bias, but it turns out it is somewhat more complicated. The frontier of this method has grown considerably in recent years, along different margins, one of which is via the model-fitting exercise itself. Some new ways of trying to choose more principled models have appeared, particularly when efforts to fit the data with the synthetic control in the pre-treatment period are imperfect. Ferman and Pinto (2019) and Powell (2017), for instance, propose alternative solutions to this problem. Ferman and Pinto (2019) examine the properties of using de-trended data. They find that it can have advantages, and even dominate DD, in terms of bias and variance.

What is Synthetic Control?

Synthetic Control has been described as the “most important development in program evaluation in the last decade” (Athey and Imbens 2016). The synthetic control method is a statistical method used to evaluate the effect of an intervention in comparative case studies.

Data Used

In 1988, California passed a famous Tobacco Tax and Health Protection Act, which became known as Proposition 99.

Mathematical Notations

Suppose that we have J + 1 units. Without loss of generality, assume that unit 1 is the unit affected by the intervention. In our case, California is the unit affected by the intervention, Proposition 99. Units j = 2, …, J + 1 are the untreated units (states), which we will refer to as the “donor pool”.
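With that indexing, the estimator can be written compactly. The symbols below are assumptions introduced here for illustration: Y_{jt} is the outcome of unit j in period t, w_j is the weight on donor j, and \hat{\tau}_{1t} is the estimated effect on the treated unit in a post-intervention period t.

```latex
\hat{Y}^{N}_{1t} = \sum_{j=2}^{J+1} w_j \, Y_{jt},
\qquad w_j \ge 0,
\qquad \sum_{j=2}^{J+1} w_j = 1

\hat{\tau}_{1t} = Y_{1t} - \hat{Y}^{N}_{1t}
```

The non-negativity and sum-to-one constraints keep the synthetic unit inside the range spanned by the donor pool, so it interpolates between donors rather than extrapolating beyond them.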

Visual Explanation

As we know, linear regression is also a way of getting the prediction as a weighted average of the variables. In this case, a regression can be represented as the following matrix multiplication
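A sketch of that matrix form in generic notation, where the symbols are assumptions rather than the source's own: X is the matrix of predictor values with one row per observation, \hat{\beta} is the vector of fitted coefficients, and \hat{y} is the vector of predictions.

```latex
\hat{y} = X \hat{\beta},
\qquad
\hat{y}_i = \sum_{k=1}^{K} x_{ik} \, \hat{\beta}_k
```

Unlike synthetic control weights, regression coefficients are not restricted to be non-negative or to sum to one, so the prediction can extrapolate outside the range of the data.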

Implementation

To estimate the treatment effect with synthetic control, we will try to build a “fake unit” that resembles the treated unit before the intervention period. Then, we will see how this “fake unit” behaves after the intervention. The difference between the synthetic control and the unit that it mimics is the treatment effect.
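The steps above can be sketched in plain Python. This is a minimal illustration, not the published estimator: it matches on pre-period outcomes only, uses a Frank-Wolfe search (a technique chosen here for brevity) to find non-negative weights that sum to one, and runs on made-up numbers. The function name `fit_simplex_weights` and all data are hypothetical.

```python
def sum_squared_error(weights, donors, target):
    """Squared gap between the target series and the weighted donor series."""
    periods = range(len(target))
    synthetic = [sum(w * d[t] for w, d in zip(weights, donors)) for t in periods]
    return sum((synthetic[t] - target[t]) ** 2 for t in periods)


def fit_simplex_weights(donors_pre, treated_pre, iterations=500):
    """Frank-Wolfe with exact line search: minimise the squared pre-period
    gap over weights that are non-negative and sum to one."""
    n_donors, periods = len(donors_pre), range(len(treated_pre))
    weights = [1.0 / n_donors] * n_donors  # start from equal weights
    for _ in range(iterations):
        synthetic = [
            sum(weights[j] * donors_pre[j][t] for j in range(n_donors))
            for t in periods
        ]
        residual = [synthetic[t] - treated_pre[t] for t in periods]
        # Gradient of the squared error per weight (up to a factor of 2).
        gradient = [
            sum(residual[t] * donors_pre[j][t] for t in periods)
            for j in range(n_donors)
        ]
        best = min(range(n_donors), key=lambda j: gradient[j])
        # Moving weight toward `best` shifts the synthetic series this way.
        direction = [donors_pre[best][t] - synthetic[t] for t in periods]
        denom = sum(d * d for d in direction)
        if denom == 0.0:
            break
        step = -sum(residual[t] * direction[t] for t in periods) / denom
        step = max(0.0, min(1.0, step))
        if step == 0.0:
            break  # no further improvement is possible
        weights = [(1.0 - step) * w for w in weights]
        weights[best] += step
    return weights


# Hypothetical data: three donor units, three pre-periods, one post-period.
donors_pre = [[10.0, 12.0, 14.0], [20.0, 18.0, 22.0], [40.0, 45.0, 50.0]]
treated_pre = [15.0, 15.0, 18.0]
donors_post = [[16.0], [24.0], [55.0]]
treated_post = [15.0]

weights = fit_simplex_weights(donors_pre, treated_pre)
synthetic_post = sum(w * d[0] for w, d in zip(weights, donors_post))
effect = treated_post[0] - synthetic_post  # treated minus synthetic control
print(weights, effect)
```

Frank-Wolfe is used because each update is a convex combination of the current weights and a single donor, so the constraints hold at every iteration without a separate projection step. A production analysis would normally rely on an established package and would also match on covariates.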

Conclusion

Using pre-period data from other states, we built a lasso regression model that assigned fixed weights to each control state and arrived at a weighted average that closely resembled California smoking activity before Proposition 99 was introduced.
