Unpacking Karl Smith on Experiments and Regressions (An Introduction to Causality and How to Measure It), Part I

Karl Smith writes:
A random trial is simply a physically controlled regression analysis. There is no fundamental difference between performing a regression on data collected in the field and data generated in the lab. It is simply that in the lab you hope that you have performed all the necessary controls physically rather than statistically.

However, the physical controls can still fail. Notably, double-blind experiments are an attempt in medicine to go beyond simple randomness because simple randomness [is] not enough. Even with double blind, however, results are often not generalizable.
Oh man! Smith jumps from one aspect to the next, only to introduce a third in the following sentence, and veils it all in vague language.

My idea was to write a quick post saying what I think is wrong with this statement. It turned out that a "quick post" wasn't possible, so I'll write this in multiple installments. This is the first; I expect there will be at least two more. I'm making no promises about when the others will be published. I'll try my best, but I'm short on time at the moment. You can follow the series via the Introduction to Causality label.

A useful starting point is to distinguish between internal and external validity. Internal validity refers to the question of whether you've measured what you say you've measured. This includes, but is not limited to, the question of whether a correlation you report as causal actually represents causation.

External validity refers to the question of whether what you have observed in a specific situation (such as a lab experiment) may also be expected to be observed in similar, but different, situations (such as "the real world").
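To make the idea concrete, here is a toy simulation in Python. It is a sketch under made-up assumptions, not a model of any real study: the treatment effect is assumed to depend on a hypothetical moderator (the function true_effect is invented for illustration), and the lab population is assumed to have lower values of that moderator than the population outside the lab. The randomized comparison is internally fine in both settings; it just answers a different question in each.

```python
import random
import statistics


def true_effect(moderator):
    # Hypothetical assumption: the treatment effect grows with some trait
    # (the "moderator") that differs between lab participants and everyone else.
    return 2.0 * moderator


def run_experiment(moderators, rng):
    """Randomize each unit to treatment or control; return the difference in means."""
    treated, control = [], []
    for m in moderators:
        noise = rng.gauss(0.0, 1.0)
        if rng.random() < 0.5:
            treated.append(true_effect(m) + noise)
        else:
            control.append(noise)
    return statistics.mean(treated) - statistics.mean(control)


rng = random.Random(0)
lab = [rng.uniform(0.0, 1.0) for _ in range(5000)]    # hypothetical lab population
field = [rng.uniform(1.0, 3.0) for _ in range(5000)]  # hypothetical "real world" population

lab_estimate = run_experiment(lab, rng)
field_estimate = run_experiment(field, rng)
print(f"lab: {lab_estimate:.2f}, field: {field_estimate:.2f}")
```

Under these assumptions the lab estimate should come out near 1 and the field estimate near 4. Both are correct for their own population; the lab result simply doesn't carry over.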

Internal validity (Part A)

As the internal validity question is often discussed as though it referred only to causality, let's note that you can get internal validity wrong without getting causality wrong. Let's say you want to measure the influence of frustration on human aggression. You define the latter, as usual, as behaviour intended to harm another person who is motivated to avoid that harm. You study this question in a psychological lab experiment. You take a bunch of undergraduates and divide them into a treatment group and a control group. The treatment group gets frustrated in some way; the control group does not. Then you have participants in both groups write essays about "an important event from your childhood and how you felt about it." You collect the essays and afterwards have a bunch of raters (who are blind as to which group the authors of the essays belonged to) rate them for the number of expressions of aggression, such as ". . . and every time I think about this, I'd really like to kick my dad's head in." You find that the essays from the treatment group contain significantly more expressions of aggression. Did you demonstrate that frustration causes aggression? No, because expressions of aggression in an essay are not intended to harm another person who is motivated to avoid that harm (or at least it would be hard to make that argument). Your dependent variable does not measure aggression. Hence, internal validity fail.
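A small simulation can show that a "significant" difference says nothing about whether the dependent variable measures the construct. This Python sketch rests entirely on an assumed data-generating process (the numbers and the participant and welch_t helpers are hypothetical): frustration raises the essay score but leaves harmful behaviour untouched. A real study of this kind would only observe the essays; the simulation lets us look at both.

```python
import random
import statistics

rng = random.Random(1)


def participant(frustrated):
    # Hypothetical data-generating process, assumed for illustration only:
    # frustration raises the number of aggressive expressions in the essay...
    essay_score = rng.gauss(3.0 + (1.5 if frustrated else 0.0), 1.0)
    # ...but has no effect at all on behaviour intended to harm someone.
    harm_score = rng.gauss(0.5, 0.2)
    return essay_score, harm_score


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / (va / len(a) + vb / len(b)) ** 0.5


treatment = [participant(True) for _ in range(100)]
control = [participant(False) for _ in range(100)]

t_essay = welch_t([p[0] for p in treatment], [p[0] for p in control])
t_harm = welch_t([p[1] for p in treatment], [p[1] for p in control])
print(f"t (essay proxy): {t_essay:.1f}, t (actual harm): {t_harm:.1f}")
```

The essay comparison yields a large t statistic, while the comparison on harmful behaviour hovers around zero: a clean, well-controlled result about the wrong variable.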

More often, the question of internal validity concerns whether the measure of statistical association you say represents causality actually does. This raises the question of what causality is in the first place. The definition that underlies the whole logic of experimentation, and that is widely accepted among people who try to develop methods for establishing causality with nonexperimental (observational) data, is what philosophers call "the standard interpretation of the counterfactual model of causality":

X may be said to have caused Y if X and Y co-occur, but Y would not have occurred in an otherwise identical situation had X not been present.
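One common way of writing this definition down is potential-outcomes notation: each unit i has an outcome Y_i(1) with X and an outcome Y_i(0) without X, and the individual causal effect is their difference. The Python sketch below uses invented numbers to show the catch: every unit reveals only one of its two potential outcomes. Assuming X is assigned at random (an assumption of the sketch), the difference in group means still comes close to the average of the individual effects, even though no individual effect is ever observed.

```python
import random
import statistics

rng = random.Random(2)
N = 10_000

# Hypothetical potential outcomes: what each unit's Y would be without X (y0)
# and with X (y1). In the "parallel worlds" of the definition we would see both.
y0 = [rng.gauss(10.0, 2.0) for _ in range(N)]
y1 = [v + rng.gauss(1.0, 0.5) for v in y0]  # individual effects vary around 1.0

individual_effects = [b - a for a, b in zip(y0, y1)]
true_average_effect = statistics.mean(individual_effects)

# In the one world we actually observe, each unit either gets X or not, and we
# see only the corresponding potential outcome; the other one stays counterfactual.
has_x = [rng.random() < 0.5 for _ in range(N)]
observed_with = [b for b, x in zip(y1, has_x) if x]
observed_without = [a for a, x in zip(y0, has_x) if not x]

estimate = statistics.mean(observed_with) - statistics.mean(observed_without)
print(f"true average effect: {true_average_effect:.2f}, estimate: {estimate:.2f}")
```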

This definition immediately lays bare the problem of establishing causality: you cannot make observations in two parallel worlds, one of which features X and one of which doesn't. Your mission is to approximate this "parallel worlds" ideal as closely as possible. That's where this whole business of "control" comes in. When you control for variables other than X, you try to hold constant, statistically, all other variables that may influence Y. In this way, you hope to isolate the specific effect of X on Y. If you knew that you had controlled for all other relevant variables (measured without error, just like X and Y), you could be certain that any correlation left between X and Y represents a causal connection between the two.
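Here is what statistical control does in the best case, sketched in Python with made-up numbers. The assumed setup: a hypothetical confounder Z drives both X and Y, and X has a true effect of 0.5 on Y (TRUE_EFFECT, slope_and_intercept and residuals are names invented for this sketch). The naive regression of Y on X mixes the effect of X with that of Z; "controlling for Z" is done here via the Frisch-Waugh-Lovell result, which gives the same X coefficient as a multiple regression of Y on X and Z.

```python
import random


def slope_and_intercept(x, y):
    """Simple least-squares fit of y = a + b*x; returns (b, a)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return b, my - b * mx


def residuals(v, z):
    """The part of v that is not linearly explained by z."""
    b, a = slope_and_intercept(z, v)
    return [vi - (a + b * zi) for vi, zi in zip(v, z)]


rng = random.Random(3)
N = 20_000
TRUE_EFFECT = 0.5  # hypothetical causal effect of X on Y

# Hypothetical confounder Z influences both X and Y.
z = [rng.gauss(0.0, 1.0) for _ in range(N)]
x = [zi + rng.gauss(0.0, 1.0) for zi in z]
y = [TRUE_EFFECT * xi + 2.0 * zi + rng.gauss(0.0, 1.0) for xi, zi in zip(x, z)]

naive, _ = slope_and_intercept(x, y)
# Controlling for Z: regress the Z-free part of Y on the Z-free part of X.
controlled, _ = slope_and_intercept(residuals(x, z), residuals(y, z))
print(f"naive: {naive:.2f}, controlled for Z: {controlled:.2f}")
```

Under these assumptions the naive slope lands near 1.5 and the controlled one near 0.5. That only works because the sketch meets the "if you knew" condition above by construction: Z is the only confounder, and everything is measured without error.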

There are multiple problems with this technique, however; more on those in the next part.


Karl Smith said...

I'm interested to hear the rest of your unpacking, but it's important to note that I was making no statement about causality.

My only point is that there is nothing magical about random trials. You can't simply say "we randomly assigned some to control and some to treatment" and then think your job is done.

Nor should you look askance at sophisticated observational techniques simply because they do not have controls.

LemmusLemmus said...

I have no intention of scoffing at sophisticated techniques for observational data, but I do have the intention of pointing out some weaknesses of simple multivariate regressions (in the next installment), which I wouldn't put in that league.

Both controls and randomized experiments are done in an attempt to increase the probability that an observed statistical association is, in fact, causal, so talking about the merits of observational techniques vs. randomized experiments means talking about causality, no? After all, establishing causality is randomized experiments' big claim to fame.

But perhaps we should continue the conversation when my job is done.