22/05/2013

Instrumental Variables: The Pina Colada Explanation

OLS can tell us about correlations in data, but won't be able to say much about causality. IV regressions turn observational data into a pseudo-experiment.

For example, suppose that we had data on courses taken and earnings, as in the study you cite above. Suppose simple OLS told us that people who take math courses earn more. That is fine as far as it goes, but we really want to know if it is just that smarter people take math courses, or that the math course itself increases earnings. OLS isn't going to help, but a good IV will.

Suppose that due to a bad pineapple, the pina colada mix at a mathematics department picnic in one high school was poisoned. Half of the mathematics faculty was laid up in the hospital for a semester. Many planned math courses were not offered that year. Presumably, the only way the pina colada disaster affected the future earnings of students is through the courses they were able to take. The exogenous variation in mathematics courses available creates the equivalent of an intent-to-treat experiment. Even the smart students were less likely to take math courses during the pina colada year.

Now, as in any statistical procedure, garbage in, garbage out. If the IV is invalid or weak, then the result of the IV regression is totally meaningless.
That's David Jinkins's comment on Bryan Caplan's post on instrumental variables. I don't think much of the post itself, but some of the reader contributions are quite good.
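The pina colada story can be made concrete with a small simulation. This is a minimal sketch in Python, and everything in it is an assumption made for illustration. Unobserved ability raises both the number of math courses taken and earnings. The pina colada year (a coin flip here) cuts the courses on offer. The true effect of one extra course is set to 0.5. With a single instrument and a single regressor, 2SLS reduces to the ratio cov(z, y) / cov(z, x), which is what the hypothetical `iv_slope` computes.

```python
import random
import statistics


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def ols_slope(x, y):
    """Slope from regressing y on x with an intercept."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified IV estimate (2SLS with one instrument and one regressor)."""
    return cov(z, y) / cov(z, x)


def simulate(n=100_000, beta=0.5, seed=42):
    """Hypothetical data: ability is unobserved, the picnic year is the instrument."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)                       # unobserved confounder
        pina = 1 if rng.random() < 0.5 else 0           # 1 = the pina colada year
        courses = 2 + ability - pina + rng.gauss(0, 1)  # fewer courses offered that year
        earnings = 10 + beta * courses + 2 * ability + rng.gauss(0, 1)
        z.append(pina)
        x.append(courses)
        y.append(earnings)
    return z, x, y


if __name__ == "__main__":
    z, x, y = simulate()
    print(f"OLS slope: {ols_slope(x, y):.2f}")   # contaminated by the ability effect
    print(f"IV slope:  {iv_slope(z, x, y):.2f}")  # should land near the assumed beta
```

Under these assumed parameters the population OLS slope is about 1.39. That is the true 0.5 plus an ability bias of 2/2.25. The IV slope in the population is exactly 0.5. A run should therefore land close to those two numbers. The IV estimate is noisier than OLS because it uses only the variation in courses caused by the picnic.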
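The warning about weak instruments in the comment can be illustrated the same way. The sketch below is again in Python with made-up parameters, and all function names are hypothetical. It repeats a small study of 500 students many times. It then compares a picnic that removes a full course with one that barely changes course-taking. For each case it reports two things: the median first-stage F-statistic, where a value above 10 is a common rule of thumb, and the interquartile range of the IV estimates across repetitions.

```python
import random
import statistics


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def first_stage_f(z, x):
    """F-statistic for the slope of x on a single instrument z (equals t squared)."""
    n = len(z)
    b = cov(z, x) / cov(z, z)
    a = statistics.fmean(x) - b * statistics.fmean(z)
    ssr = sum((xi - a - b * zi) ** 2 for zi, xi in zip(z, x))
    szz = cov(z, z) * (n - 1)  # sum of squared deviations of z
    se_b = (ssr / (n - 2) / szz) ** 0.5
    return (b / se_b) ** 2


def draw(n, strength, rng, beta=0.5):
    """Hypothetical sample; `strength` is how many courses the picnic removes."""
    z, x, y = [], [], []
    for _ in range(n):
        ability = rng.gauss(0, 1)
        pina = 1 if rng.random() < 0.5 else 0
        courses = 2 + ability - strength * pina + rng.gauss(0, 1)
        earnings = 10 + beta * courses + 2 * ability + rng.gauss(0, 1)
        z.append(pina)
        x.append(courses)
        y.append(earnings)
    return z, x, y


def iv_spread(strength, reps=200, n=500, seed=0):
    """Median first-stage F and interquartile range of IV estimates across studies."""
    rng = random.Random(seed)
    estimates, f_stats = [], []
    for _ in range(reps):
        z, x, y = draw(n, strength, rng)
        estimates.append(cov(z, y) / cov(z, x))
        f_stats.append(first_stage_f(z, x))
    q1, _, q3 = statistics.quantiles(estimates, n=4)
    return statistics.median(f_stats), q3 - q1


if __name__ == "__main__":
    for strength in (1.0, 0.05):
        f, iqr = iv_spread(strength)
        print(f"picnic effect {strength}: median F = {f:.1f}, IQR of IV estimates = {iqr:.2f}")
```

With the weak first stage, the F-statistic falls far below 10 and the IV estimates scatter widely across studies. This is why the strength of the first stage is worth reporting alongside any IV estimate.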

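An aspect I would like to see mentioned more often in this context is that many articles give the impression that 2SLS is the same as OLS, only with causality added. That impression is simply false. Rather, if everything goes right, you are measuring the effect of the endogenous regressor only through the part of its variation that is driven by the instrument. When the effect differs across people, this is the effect for those whose behaviour the instrument actually changes (in the example, the students who would have taken math had the courses been offered), not the average effect in the whole population. This can be an advantage in some cases, but typically you want to know about the target variable "as is", and you don't get that.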
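The point about what 2SLS measures shows up clearly when the effect of a math course differs between students. This hypothetical Python sketch has three kinds of students. Some take math whatever happens, and some never do. The "compliers" take it only if the course is offered, i.e. outside the pina colada year. The shares, effects and baseline earnings are all invented. With a binary instrument, the IV estimate is the Wald ratio. That is the difference in mean earnings between the two kinds of year divided by the difference in math take-up. Only the compliers move with the instrument, so the ratio recovers their effect of 1.0 rather than the population average of 0.3·3 + 0.3·2 + 0.4·1 = 1.9.

```python
import random
import statistics

# Hypothetical student types: share of population, effect of a math course
# on earnings, and baseline earnings without one.
TYPES = {
    "always": (0.3, 3.0, 12.0),    # take math even in the pina colada year
    "never": (0.3, 2.0, 9.0),      # never take math
    "complier": (0.4, 1.0, 10.0),  # take math only when it is offered
}


def cov(u, v):
    """Sample covariance of two equal-length sequences."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)


def simulate(n=200_000, seed=1):
    rng = random.Random(seed)
    names = list(TYPES)
    shares = [TYPES[k][0] for k in names]
    z, d, y, effects = [], [], [], []
    for _ in range(n):
        kind = rng.choices(names, weights=shares)[0]
        _, effect, base = TYPES[kind]
        pina = 1 if rng.random() < 0.5 else 0
        took_math = 1 if kind == "always" or (kind == "complier" and pina == 0) else 0
        earnings = base + effect * took_math + rng.gauss(0, 1)
        z.append(pina)
        d.append(took_math)
        y.append(earnings)
        effects.append(effect)  # individual effect, observable only in a simulation
    return z, d, y, effects


def wald_iv(z, d, y):
    """IV estimate with a binary instrument and a binary treatment (the Wald ratio)."""
    return cov(z, y) / cov(z, d)


if __name__ == "__main__":
    z, d, y, effects = simulate()
    print(f"IV estimate (compliers): {wald_iv(z, d, y):.2f}")
    print(f"Average effect overall:  {statistics.fmean(effects):.2f}")
```

Nothing is wrong with the estimate here, since the instrument is valid and strong. It simply answers a narrower question than the average effect of a math course.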
