Confound Alert! Human Capital Formation, the Euros, and GCSE Performance

"Student effort and educational attainment: Using the England football team to identify the education production function" (via) is a recent paper by Robert Metcalfe, Simon Burgess and Steven Proud that looks at a commonsense question. Every two years, either the World Cup or the Euros coincide with portions of the English exam period; does that have a detrimental effect on the test results? Yes, say the authors. Here's the abstract:
We use a sharp, exogenous and repeated change in the value of leisure to identify the impact of student effort on educational achievement. The treatment arises from the partial overlap of the world’s major international football tournaments with the exam period in England. Our data enable a clean difference-in-difference design. Performance is measured using the high-stakes tests that all students take at the end of compulsory schooling. We find a strongly significant effect: the average impact of a fall in effort is 0.12 SDs of student performance, significantly larger for male and disadvantaged students, as high as many educational policies.
They compare the gap in results between "late" subjects (the exams for which overlap with the tournaments in tournament years) and "early" subjects (the exams for which do not overlap) in tournament years with the same gap in non-tournament years. Note that the 0.12 SDs result is for all students, including those who don't care about football. How much is 0.12 SDs?
For example, the “Literacy Hour” intervention in England raised reading attainment by 0.06 SDs (Machin & McNally, 2008). A unit SD increase in teacher quality raises test scores by around 0.15 to 0.24 SDs per year, 0.27 in England (Rockoff, 2004; Rivkin, Hanushek and Kain, 2005; Aaronson, Barrow, and Sander, 2007; Kane and Staiger, 2008; Slater, Davies and Burgess, 2011). The effect of major “early years” programmes such as Head Start is 0.147 SDs in applied problems and 0.319 in letter identification (Currie and Thomas, 1995; Ludwig and Phillips, 2007). Crawford et al (2007) have shown that a student’s month of birth has effects on GCSE outcomes: students who have spent their entire school careers as the youngest in the class (August-borns) score on average 0.116 SDs (girls) or 0.131 SDs (boys) lower than the oldest in the class (September-born students). Substantial effects on pupil progress have been found in “No Excuses” Charter schools, of between 0.10 - 0.40 standard deviations increase per year in mathematics and reading (Abdulkadiroglu et al, 2009; Angrist et al, 2010). More closely related to our focus on effort, Fryer (2010) and Levitt et al (2011a) show that incentivising students to raise their effort (input-based student incentives) have an effect size of about 0.15 SDs, and Levitt et al (2011b) show that incentives on the day of a test can increase test scores by around 0.2 SDs.
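For anyone who wants the late-versus-early comparison spelled out, here is a minimal Python sketch of the difference-in-differences arithmetic. The scores are invented for illustration and are not taken from the paper, and the names (`scores`, `diff_in_diff`) are my own. The paper's actual estimation is a regression on pupil-level data, which this does not attempt to reproduce.

```python
from statistics import mean

# Hypothetical standardised exam scores (in SDs), made up for illustration.
# Keys are (year type, subject timing); "late" subjects are the ones whose
# exams overlap with the tournament in tournament years.
scores = {
    ("tournament", "early"): [0.05, 0.10, 0.15],
    ("tournament", "late"): [-0.10, -0.05, 0.00],
    ("non_tournament", "early"): [0.05, 0.10, 0.15],
    ("non_tournament", "late"): [0.00, 0.05, 0.10],
}


def diff_in_diff(scores):
    """Late-minus-early gap in tournament years, minus the same gap in other years."""
    gap_tournament = mean(scores[("tournament", "late")]) - mean(
        scores[("tournament", "early")]
    )
    gap_other = mean(scores[("non_tournament", "late")]) - mean(
        scores[("non_tournament", "early")]
    )
    return gap_tournament - gap_other


estimate = diff_in_diff(scores)
print(f"Difference-in-differences estimate: {estimate:.2f} SDs")
```

Subtracting the non-tournament gap removes whatever makes late subjects harder or easier in every year, so what is left is the part of the gap that only shows up when there is football on.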
I was a bit knackered when I read the paper, and didn't put all that much effort into it, but even so I am somewhat dismayed that I couldn't find anything wrong with the statistics. I can quibble with the interpretation, though. The authors assume (but do not measure) that the lower test results are a consequence of students spending less time on preparation. Another possible explanation is that during tournaments, more pupils turn up to exams hung over. I don't have a Nobelish-prize-winning theory to back this up, but common observation suggests that being hung over has a detrimental effect on test performance, holding everything else constant. That's what really explains the results. Try proving me wrong!

