List of 19 Natural Experiments

Many researchers consider randomized controlled trials (RCTs) to be the gold standard methodology in the social sciences. Figure 1, taken from Test, Learn, Adapt (2010), shows how RCTs work. Researchers start with a group of people and randomly divide them into a control group and a treatment group. They then perform some intervention on the treatment group (e.g. having them attend a ‘back to work’ program) and see how the outcomes differ. Let’s say 70% of the treatment group find a job compared to 40% of the control group. Randomization is the key element that allows us to interpret the results causally. If the groups were properly randomized, and if nobody dropped out of the study, the two groups should be more or less identical in terms of age, gender, income, education and so on, so we can assume that the difference in outcomes is not due to pre-existing differences between them. Since the ‘back to work’ program is the only important difference between the groups, we conclude that it caused the 30 percentage point improvement in employment outcomes.
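To make the logic concrete, here is a minimal Python simulation of the ‘back to work’ example. It assumes, purely for illustration, a hypothetical population with ages drawn uniformly from 18 to 65 and true job-finding probabilities of 70% with the program and 40% without it; none of this is real data. Shuffling the population and splitting it in half is the randomization step, and the output reports both the estimated effect and how closely the two groups match on age.

```python
import random


def share(group, key):
    """Average value of `key` across a list of person records (True counts as 1)."""
    return sum(person[key] for person in group) / len(group)


def simulate_rct(n_people=20_000, p_job_treated=0.70, p_job_control=0.40, seed=42):
    """Toy RCT: randomly split a hypothetical population into treatment and
    control groups, simulate job-finding outcomes, and return
    (estimated effect, difference in mean age between the groups)."""
    rng = random.Random(seed)
    # Hypothetical population; age is a background trait the researcher does not control.
    people = [{"age": rng.randint(18, 65)} for _ in range(n_people)]

    # Randomization: shuffle everyone, then split the list in half.
    rng.shuffle(people)
    half = n_people // 2
    treatment, control = people[:half], people[half:]

    # Simulated outcomes using the illustrative 70% / 40% job-finding rates.
    for person in treatment:
        person["job"] = rng.random() < p_job_treated
    for person in control:
        person["job"] = rng.random() < p_job_control

    effect = share(treatment, "job") - share(control, "job")
    age_gap = share(treatment, "age") - share(control, "age")
    return effect, age_gap


effect, age_gap = simulate_rct()
print(f"Estimated effect of the program: {effect:.3f} (true effect: 0.300)")
print(f"Difference in mean age between groups: {age_gap:.2f} years")
```

With 10,000 people per group, the estimate should land close to the 30 percentage point effect and the age gap between groups should be small, which is what ‘more or less identical’ groups means in practice. With a much smaller sample, both numbers would bounce around more from one random split to the next.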

RCTs are well and good (although not without their critics; see Liam’s discussion on this), and can provide strong evidence of a causal effect, but they are usually expensive and time-consuming to run. For reasons of ethics and feasibility, they are also limited in the kinds of questions they can answer. Suppose you want to know whether being exposed to pollution stunts a child’s cognitive development. A university ethics board is unlikely to allow you to put some children into a treatment group, give them a hearty dose of pollution and then see what happens. Similarly, if you want to know how strong institutions affect long-run economic growth, you cannot assign good institutions to one country, bad institutions to a neighbouring one and then wait 300 years to watch it all unfold.

Natural experiments can provide answers to these kinds of questions. Natural experiments arise when comparable individuals or groups of people are sorted by “nature” into something like a control and treatment group. They differ from RCTs because they are not consciously designed by a researcher. An example of an ongoing natural experiment is the effect of the different systems of government in North and South Korea on their economic growth. The image on the right, taken in 2012, shows the result: the South is a network of activity, while the North is covered in darkness.

The ability of natural experiments to provide causal estimates for complex topics makes them very popular across many different disciplines. They are also fun to read about because they typically rely on a clever insight by the researcher(s) connecting some random event in the world with their research question. For this reason, and hopefully to provoke some inspiration, I put together the list of natural experiments below. This list is mostly drawn from the economics literature and is in no way comprehensive, representative or a ranking. For all these examples, remember that the key assumption is that there are comparable groups, and one group is randomly affected by forces outside their control (“nature”). If this assumption holds, we can interpret the results causally. These papers typically include many tests to convince the reader that their data meet these conditions, but there are always interesting possible objections to think of. I encourage readers to post in the comments any perceived flaws in the studies listed here.

Lastly, for additional reading material I recommend this history of field experiments in economics by Levitt and List (2009), this discussion of instrumental variables and natural experiments by Angrist and Krueger (2001), this overview of regression discontinuity designs in social science by Lee & Lemieux (2014) and these examples of RDD porn, and this overview of natural experiments from a public health perspective by the Medical Research Council (2012).

SELECTED LIST OF NATURAL EXPERIMENTS

1. The Oregon Health Insurance Experiment

The Oregon Health Insurance Experiment, discussed recently on this Freakonomics podcast, is one of the best-known natural experiments in public health of recent years. It was the first study to use a randomized design to examine the impact of access to Medicaid, the American public health insurance program for low-income individuals. Interestingly, this was not a field experiment, because the researchers themselves did not create the randomization process. Instead, “[i]n early 2008, Oregon opened a waiting list for its Medicaid program for low-income adults that had previously been closed to new enrollment. Approximately 90,000 people signed up for the available 10,000 openings. The state drew names from this waiting list by lottery to fill the openings.” (source). Essentially, this was an RCT in the wild.

By comparing the outcomes of 29,589 people who eventually received access through the lottery against 28,816 people who applied but did not win, the researchers have so far published four papers examining the impact of Medicaid access on health-related outcomes such as self-reported health, health care utilization and medical debt (2012, QJE), clinical outcomes such as the prevalence of hypertension and high cholesterol (2013, NEJM), emergency department use (2014, Science), and labor market outcomes such as employment and earnings (2014, AER). The results of the first two years, reviewed here, were that access to Medicaid:

(i) increased the use of health-care services: more hospitalizations, emergency-department visits, outpatient visits, prescription-drug use, and preventive-care use.

(ii) decreased financial strain: fewer medical debts, lower likelihood of borrowing money or skipping other bill payments to cover medical expenses, and virtual elimination of catastrophic out-of-pocket medical expenditures.

(iii) improved self-reported health and reduced rates of depression, but had no statistically significant effect on physical health outcomes.

(iv) had no statistically significant effect on employment or earnings.

2. Angrist (1990), Lifetime Earnings and the Vietnam Era Draft Lottery: Evidence from Social Security Administrative Records, AER.

MIT Professor Joshua Angrist, subject of this 2013 profile and 2014 EconTalk interview, is one of the most noted proponents of natural experiments in economics. Angrist is also co-author of the book ‘Mastering Metrics’, which describes the most common methods used to analyse natural experimental data: instrumental variables, regression discontinuity designs, and differences-in-differences (Google’s chief economist Hal Varian discusses these methods from 18:46-31:52 here).

Angrist’s PhD thesis examined whether serving in the army negatively affected people’s future income. This is trickier to estimate than it may seem. Simply comparing the wages of people who served in the army against those who didn’t would not be evidence of a causal relationship, since people who enter the army may differ in important ways from those who don’t, and these differences may in turn affect their income potential. What is really needed is a source of randomization which takes a group of men and forces some of them into the army.

Just such a randomization tool was provided by the American government, which, starting in December 1969, held televised draft lotteries (wiki) during the Vietnam War to select young men for induction into the army. Each of the 366 possible birthdays (February 29 included) was randomly assigned a lottery number from 1 to 366, and men whose birthdays drew low numbers were called up first. This process essentially placed otherwise similar men into a treatment group (those whose numbers made them draft-eligible) and a control group (those whose numbers did not).

By combining the birth records of those born in 1950-53 with later earnings data drawn from a 1% sample of all Social Security numbers, Angrist found that, among white men, drafted veterans went on to earn 15% less than their peers who had avoided the army.
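Angrist’s analysis treats the lottery as an instrumental variable: a low lottery number raised the probability of serving but did not determine it (some men with high numbers volunteered, and some with low numbers avoided service). A common way to turn this into an estimate is the Wald estimator, which divides the earnings gap between draft-eligible and ineligible men by the gap in their rates of military service. The sketch below uses made-up inputs chosen only to show the arithmetic; they are not Angrist’s figures, and the function name is hypothetical.

```python
def wald_estimate(mean_y_eligible, mean_y_ineligible,
                  p_treated_eligible, p_treated_ineligible):
    """Wald (instrumental-variables) estimate of the effect of treatment.

    mean_y_*    : average outcome (e.g. earnings) by lottery status
    p_treated_* : share of each group that actually received treatment (served)
    """
    reduced_form = mean_y_eligible - mean_y_ineligible        # effect of being eligible
    first_stage = p_treated_eligible - p_treated_ineligible   # effect of eligibility on serving
    if first_stage == 0:
        raise ValueError("the instrument does not shift treatment; estimate is undefined")
    # Scale the eligibility effect up to the effect of actually serving.
    return reduced_form / first_stage


# Hypothetical inputs: eligible men earn $300 less on average, and eligibility
# raises the share who served from 20% to 35%.
effect = wald_estimate(14_700, 15_000, 0.35, 0.20)
print(round(effect))  # -2000
```

Because only an extra 15% of eligible men actually served in this made-up example, a $300 gap in average earnings implies a loss of about $2,000 for each man who served. The randomness of the lottery is what allows the $300 gap to be attributed to eligibility alone.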

3. Card & Krueger (1994), Minimum Wages and Employment: A Case Study of the Fast-Food Industry in New Jersey and Pennsylvania, AER.

This well-known paper examines the relationship between the minimum wage and employment, an issue on which economists still have a beautifully symmetrical lack of consensus. Card & Krueger’s research design was simple. To quote their abstract: “On April 1, 1992, New Jersey's minimum wage rose from $4.25 to $5.05 per hour. To evaluate the impact of the law we surveyed 410 fast-food restaurants in New Jersey and [neighbouring] eastern Pennsylvania before and after the rise. Comparisons of employment growth at stores in New Jersey and Pennsylvania (where the minimum wage was constant) provide simple estimates of the effect of the higher minimum wage”.

The graph (made by me based on the original data) shows the results. Since Pennsylvania did not change its minimum wage, it serves as the control group. If New Jersey and Pennsylvania are comparable, then had New Jersey not increased its minimum wage, we would have expected Pennsylvania’s downward-sloping trend to be replicated there. But we don’t see this; in fact, the average number of employees in New Jersey restaurants increased slightly. The authors interpret this as showing that the rise in the minimum wage did not reduce employment. Further studies on this topic are reviewed here by Schmitt (2013).
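The comparison in the graph is a difference-in-differences calculation. A minimal sketch, using hypothetical restaurant averages rather than the paper’s published figures:

```python
def diff_in_diff(treat_before, treat_after, control_before, control_after):
    """Difference-in-differences estimate.

    The control group's change over time stands in for what would have
    happened to the treated group without the policy; the estimate is the
    treated group's change minus that counterfactual change.
    """
    return (treat_after - treat_before) - (control_after - control_before)


# Hypothetical average employees per restaurant (illustrative only).
nj_before, nj_after = 20.0, 20.5   # New Jersey: minimum wage rose
pa_before, pa_after = 23.0, 21.0   # Pennsylvania: minimum wage unchanged

counterfactual_nj = nj_before + (pa_after - pa_before)  # NJ if it had followed PA's trend
print(counterfactual_nj)                                       # 18.0
print(diff_in_diff(nj_before, nj_after, pa_before, pa_after))  # 2.5
```

A positive estimate means New Jersey employment ended up above its Pennsylvania-based counterfactual, which is the pattern Card & Krueger report. The whole calculation rests on the assumption that, absent the wage rise, New Jersey would have followed Pennsylvania’s trend.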

4. Almond (2006), Is the 1918 Influenza Pandemic Over? Long Term Effects of In Utero Influenza Exposure in the Post-1940 U.S. Population, Journal of Political Economy

The 1918 flu pandemic (wiki) began towards the end of World War 1 and ended up killing more people around the world (an estimated 50-100 million) than the entirety of that four-year conflict. The economist Douglas Almond examined the arrival of the pandemic in America to test the Barker hypothesis of fetal development (discussed here by Almond & Currie, 2011), which posits that conditions in the prenatal environment may affect a person’s health for decades afterwards. The flu arrived unexpectedly in America in October 1918 and had mostly subsided by January 1919, during which time one-third of women of childbearing age contracted it. Children born only several months apart around this period may therefore have experienced very different prenatal environments.

Using U.S. Census data from 1960, 1970 and 1980, which identify individuals’ place and time of birth, Almond found that individuals who were in utero during the pandemic had, on average, higher rates of physical disability (pictured below), lower educational attainment, and lower income and socioeconomic status.

5. Almond, Mazumder, & van Ewijk (2014), In Utero Ramadan Exposure and Children's Academic Performance, Economic Journal.

During Ramadan, the ninth month of the Islamic calendar, many adult Muslims fast during daylight hours as a religious observance. Compliance is also high among pregnant women, even though some Muslim scholars argue that fasting is not obligatory for this group. The authors of this study used this naturally occurring variation to examine whether children who were in utero during Ramadan differed in cognitive performance at age 7 from children of Muslim mothers whose pregnancy did not overlap with Ramadan.

The authors examined data on over 200,000 English children of Pakistani and Bangladeshi background who took the Key Stage 1 assessments (wiki) between 1998 and 2007. Although the authors do not directly observe religion in their data, 92% of all British Pakistanis and Bangladeshis report being Muslim. Nor do they have data on whether the mothers actually fasted during pregnancy, so instead they used children’s dates of birth to infer whether Ramadan and pregnancy had overlapped.

They found that 7-year-old Muslim children who were in utero during Ramadan performed worse in maths, reading and writing than comparable Muslim children born to mothers for whom Ramadan fell soon after birth. These effects were largest (0.08 standard deviations) when Ramadan overlapped with the first three months of gestation, possibly reflecting the importance of biological processes during that period, and/or greater fasting compliance among pregnant Muslim women who may have been unaware that they were pregnant.

6. The Dutch Famine Study

The Dutch “hunger winter” (wiki) took place towards the end of World War 2 in the German-occupied Netherlands, from late 1944 until the liberation of the area by the Allies in May 1945. During this time official food rations dropped to as low as 500 calories per day and around 20,000 people died of starvation. A short, sharp famine in a modern, advanced country is quite unusual, and in this case the west of the country was strongly affected while the north and south were not. This has allowed many researchers to examine the effects of the famine on population health by comparing the outcomes of people from different regions. Of particular interest are the outcomes of children who were in utero at the time of the famine, as they may have suffered developmental impairments with potentially lifelong repercussions.

The famine affected the area containing Amsterdam, Rotterdam and Utrecht to the west of the black borders. Source: p3 of Lumey & van Poppel (2013).

Lumey & van Poppel (2013) provide an overview of the many studies that have examined the famine’s long-run effects by linking birth records with later military examination records, psychiatric hospital records, population surveys, medical examinations and DNA analysis. They find that prenatal exposure to the famine predicted higher adult rates of obesity, diabetes and schizophrenia. The same patterns have also been identified in studies examining the long-run effects of the Chinese famine that occurred during the Great Leap Forward in 1959-61 (wiki), although the mechanisms causing these specific outcomes are not precisely understood.

7. Card (1990), The Impact of the Mariel Boatlift on the Miami Labor Market, Industrial and Labor Relations Review.

While a majority of economists agree that the average American citizen would be better off if more low-skilled and high-skilled immigrants were allowed to enter the US each year, there is considerably more opposition to immigration among the general public in Britain, America and continental Europe. One frequently raised concern, aside from issues of social cohesion, is that the arrival of immigrants might disrupt the economy and make it more difficult for local workers to secure good jobs.

David Card examined this issue by looking at the labor market effects of the “Mariel boatlift” (wiki), the name given to the arrival of 125,000 Cubans in Florida between April and September 1980 (this is the impetus for Tony Montana’s move to America at the start of Scarface). This sudden arrival of relatively unskilled young men increased the size of the Miami labor force by 7%. By comparing Miami with other cities that were not affected by the Mariel boatlift, Card showed that the arrival of the Mariel workers did not notably affect the wages and employment rates of the existing unskilled workers living in Miami.

Two similar studies examined the labor market effects of the arrival of 900,000 French Algerians in France when Algeria declared independence in 1962 (Hunt, 1992), and of 600,000 Russians in Israel over 1989-95 as the Soviet Union collapsed (Friedberg, 2001). Like the Mariel study, neither of these studies found harmful effects of immigration on local labor markets. The results of these studies, along with a large number of reader comments objecting to the generalizability of their conclusions, are discussed further in this New York Times article (2015).

8. Rich et al (2015), Differences in Birth Weight Associated with the 2008 Beijing Olympic Air Pollution Reduction: Results from a Natural Experiment, Environmental Health Perspectives.

In preparation for the Beijing Olympics in 2008, the Chinese government introduced several measures to improve the city’s notoriously poor air quality, including restrictions on vehicle use, the temporary closure of factories and construction projects, and seeding clouds to induce rain. These changes effectively reduced air pollution during the Olympic period, after which things returned to normal, creating a sharply defined window of relatively good air quality. The image below shows an example of the possible variation in Beijing's air quality by comparing two days in August 2005 (source).

The authors used this sudden reduction in pollution to compare its effects on the weight of 83,672 babies born in four districts of Beijing during the Olympics (August 8 – September 24) against the weight of babies born during the same period in 2007 and 2009. They found that the babies born in 2008 were 23 grams (0.05 pounds) heavier on average, suggesting that air pollution does interfere with fetal development.

9. Angrist & Lavy (1999), Using Maimonides' Rule to Estimate the Effect of Class Size on Scholastic Achievement, QJE.

Angrist & Lavy exploited an 800-year-old rule governing the size of Israeli classrooms to examine whether smaller classes improved student performance. The rule is derived from the teachings of the 12th-century scholar Maimonides (wiki), who said: “Twenty-five children may be put in charge of one teacher. If the number in the class exceeds twenty-five but is not more than forty, he should have an assistant to help with the instruction. If there are more than forty, two teachers must be appointed” (source). In Israeli public schools the rule is applied as a maximum class size of 40. A strict application of this rule means that if 80 students were enrolled in a grade, they would be placed in two classes of 40 pupils each; if 81 students were enrolled, they would be placed in three classes of 27 pupils each.

This rule generated sharp discontinuities in class sizes in the schools examined by the authors, as shown in the graph. They found that reductions in class size improved maths and reading scores for fifth graders, improved reading scores for fourth graders and had no effect on third graders.
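Angrist & Lavy summarize the rule with a formula for predicted class size: divide a grade’s enrollment by the smallest number of classes that keeps every class at or below 40 pupils. A Python sketch of that formula follows; the function name is hypothetical, and the cap of 40 is exposed as a parameter.

```python
def maimonides_class_size(enrollment: int, cap: int = 40) -> float:
    """Predicted average class size when a grade is split into the fewest
    classes that keep every class at or below `cap` pupils."""
    if enrollment <= 0:
        raise ValueError("enrollment must be a positive number of pupils")
    n_classes = (enrollment - 1) // cap + 1   # one extra class each time `cap` is exceeded
    return enrollment / n_classes


for enrollment in (40, 41, 80, 81, 120, 121):
    print(enrollment, maimonides_class_size(enrollment))
```

Predicted class size drops sharply each time enrollment crosses a multiple of 40 (from 40 to 20.5 at 41 pupils, and from 40 to 27 at 81). These are the discontinuities the design exploits: schools just either side of a cutoff are otherwise similar, but their class sizes are not.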

10. (i) Kearney & Levine (2014), Media Influences on Social Outcomes: The Impact of MTV’s 16 and Pregnant on Teen Childbearing, NBER Working Paper. (ii) Kearney & Levine (2015), Early Childhood Education by MOOC: Lessons from Sesame Street, NBER Working Paper.

The economists Phillip Levine and Melissa Kearney recently published two similar papers examining the effect of television programs on the behaviour of young people.

Their first paper examined whether teenagers who watched MTV’s program “16 and Pregnant” (YouTube) were less likely to have children themselves. Drawing on data from Google Trends and Twitter, the authors found that searches and tweets about birth control and abortion spiked when the show was on and in places where it was popular. They estimated that the introduction of the show, and of the similar program “Teen Mom”, led to a 5.7 percent reduction in teen births that would otherwise have been conceived between June 2009, when the show began, and the end of 2010. This accounts for around one-third of the total decline in teen births over that period.

Their second study examined whether the introduction of the popular children’s show “Sesame Street” in 1969 improved children’s school performance through its explicit emphasis on educational content. One-third of young American children watched the program in the early 1970s, but despite its popularity, another one-third of America was unable to watch it even if they wanted to. At the time, TV broadcasts were transmitted on either VHF or the weaker UHF signal, and many households’ televisions were unable to receive the latter (only 54% of households had TVs that could receive UHF in 1970). This means that whether the local public station broadcast “Sesame Street” on VHF or UHF had a large effect on how many people were able to receive it. Fortunately for the researchers, this is where the randomization element came in: because of federal licensing rules, “Sesame Street” happened to air on VHF in cities such as New York and Boston, but on UHF in comparable places like Los Angeles and Washington.

By comparing academic outcomes across places with different levels of “Sesame Street” reception, the authors found that after the program was introduced, children living in places where its broadcast could be more easily received were 14 percent less likely to be behind in school, with stronger effects for boys, African Americans and children growing up in disadvantaged areas.

These WSJ (2008) and Economist (2009) articles discuss similar studies examining television’s effect on children’s cognitive ability in America (QJE, 2008), attitudes towards domestic violence in India (QJE, 2008), and fertility decisions in Brazil (AEJ, 2012).

11. Costello et al. (2003), Relationships Between Poverty and Psychopathology: A Natural Experiment, JAMA.

This article examined the relationship between poverty and childhood mental health. Specifically, the authors used data from the North Carolina-based Great Smoky Mountains Study to test whether the environmental stresses associated with poverty caused poor mental health in children. From 1993 to 2000 this study conducted annual mental health assessments of 1,420 rural children, 350 of whom were American Indian. Over the 8 years of the study, children living below the federal poverty line were 60% more likely to have a psychiatric disorder.

In 1996, a casino opened on the local Indian reservation and began paying the American Indians in the study a percentage of its profits every 6 months. This payment increased each year and had reached $6000 by 2001. Payments due to children were held in a trust fund until they turned 18. This cash injection reduced the poverty rate of Indian families while non-Indian families remained relatively unaffected (see graph).

By tracking the Indian families who moved out of poverty as a result of this income boost, the authors found that these children had a large decrease in behavioural problems, although there was no effect on the rate of depression or anxiety problems. Improved parental supervision explained over three quarters of the effect of decreased poverty on the number of child psychiatric symptoms in the years after the casino opened, and additional analysis suggested that this was due to a reduction in time constraints within the family; e.g. in the families that moved out of poverty, the number of single-parent households decreased and the number of households with 2 working parents increased.

12. Beckett et al. (2006), Do the Effects of Early Severe Deprivation on Cognition Persist Into Early Adolescence? Findings From the English and Romanian Adoptees Study, Child Development.

The opening of Romania to foreigners after the fall of the Ceausescu government in 1989 revealed to the world the horrific living conditions of the thousands of children living in state-run orphanages (described in these BBC and Guardian articles). Physical and sexual abuse was common, and many of the children did not receive sufficient care or adequate nutrition. As a result, children raised in these orphanages typically had severely impaired social and cognitive development and high sensitivity to stress.

Beckett et al. examined the effect of this early-life deprivation on the children’s IQ scores at age 11. They compared two groups: 128 Romanian children who had been adopted by UK families and 50 UK-born children who were adopted before the age of 6 months. Because the Romanian children had been adopted at different ages, the authors could also examine whether children with longer exposure to the orphanage environment had suffered greater cognitive damage.

Their results are shown in the graph, made by me based on the original data (vertical lines indicate +/- 1 standard deviation). Romanian children who had been adopted before the age of 6 months had IQ scores similar to those of the UK adoptees, whereas children adopted after this age scored much worse. These strong effects persisted even though the adopted children had spent over 7 years in their adoptive homes.

13. Bouchard et al. (1990), Sources of Human Psychological Differences: The Minnesota Study of Twins Reared Apart, Science.

In his TED talk on ‘The Blank Slate’, the psychologist Steven Pinker argues that “most studies of parenting ...[are] useless because they don't control for heritability. They measure some correlation between what the parents do, how the children turn out and assume a causal relation: that the parenting shaped the child... And very few of them control for the possibility that parents pass on genes... Until the studies are redone with adoptive children, who provide an environment but not genes to their kids, we have no way of knowing whether these conclusions are valid.”

The world-renowned Minnesota Study of Twins Reared Apart (MISTRA) addresses Pinker’s concern about separating the effects of genetics and the environment. MISTRA was started by the psychologist Tom Bouchard in 1979 and ran until 2000, during which time it collected data on 137 twins who had been separated by age four (on average they were separated at 7 months) and who had spent their formative years apart. Some of the similarities between the separated twin pairs, discussed in this book review of “Born Together – Reared Apart”, are truly incredible:

“[The ‘Jim twins’], separated at 4 weeks of age and not meeting until after 39 years of separation, had both married women named Linda first and then Betty. One had a son named James Allen while the other named his son James Alan. Both had dogs named Toy and both smoked Salem cigarettes, both had carpentry shops in their garage and both had math as their favorite subject and spelling as their worst.

Another interesting set of MZA [identical] twins was Oskar and Jack. One was raised as a Catholic in a Nazi home in Germany while the other was raised as a Jew in Trinidad. They met briefly in Germany at the age of 21, but only at the age of 47 did they meet for any considerable length of time ... Jack was raised by his father and his grandmother raised Oskar. When the twins appeared at the Minneapolis International Airport they were both wearing blue shirts with epaulettes and wire-rimmed glasses. They wore similarly shaped moustaches. Testing of their personality by the MMPI, the Minnesota Multiphasic Personality Inventory, ‘revealed personality profiles that matched almost perfectly’”.

Bouchard et al. used these data to examine the relative contributions of genetic and environmental factors to IQ by comparing the IQ scores of monozygotic (identical) twins who had been raised apart. The results are shown in the graph. Each circle represents a pair of twins, and the y-axis shows the difference in IQ score between the two twins. The x-axis shows how long the reunited twins had been in contact with each other; it is included to show that twins who had more contact did not grow to resemble one another more closely in IQ. The graph contains three horizontal lines. The top line (“two random individuals”) meets the y-axis at 18 points: this is the average difference in IQ score between two randomly selected, unrelated individuals, and serves as one baseline. The bottom line (“Two testings of the same individual”) shows the average difference in IQ scores when the same person takes the same test twice, about 5 points; this is the other baseline. The middle line (“Mean MZA difference”) shows the average difference in IQ scores between the pairs of monozygotic twins reared apart, around 6 points. In other words, the scores of two separated identical twins are almost as close as two testings of the same person, and far closer than the scores of two random strangers.

Since the twins were genetically identical but raised apart, the differences between them must be due to the environment (or to measurement error, which the test-retest baseline suggests accounts for about 5 points on its own). Given that these differences are small, the authors interpret this graph as confirming a strong genetic component to IQ. Relatedly, a recent meta-analysis by Polderman et al. (2015) of 2,748 publications drawn from 50 years of twin studies concluded that the average heritability across all human traits is 49%, further suggesting a large genetic component to individual differences in IQ, personality and so on.
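The ‘two random individuals’ baseline in the graph can be sanity-checked with a short calculation. Assuming IQ scores are normally distributed with mean 100 and standard deviation 15 (the conventional scaling; an assumption here, not a claim about the MISTRA data), the difference between two unrelated people has standard deviation 15√2, and its expected absolute value is 2σ/√π, about 17 points, close to the 18-point line. The sketch checks that formula by simulation; the helper `mean_pair_gap` is hypothetical and could equally be applied to a list of twin pairs.

```python
import math
import random


def expected_gap_random_pair(sd: float) -> float:
    """Expected |X - Y| for two independent draws from a normal distribution
    with standard deviation `sd` (X - Y has standard deviation sd * sqrt(2))."""
    return 2 * sd / math.sqrt(math.pi)


def mean_pair_gap(pairs):
    """Average absolute score difference across (score_a, score_b) pairs."""
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Simulate 100,000 pairs of unrelated people, assuming IQ ~ Normal(100, 15).
rng = random.Random(1)
random_pairs = [(rng.gauss(100, 15), rng.gauss(100, 15)) for _ in range(100_000)]

print(round(expected_gap_random_pair(15), 1))   # 16.9
print(round(mean_pair_gap(random_pairs), 1))    # close to the analytic value
```

Against this roughly 17-point benchmark, the 6-point average gap for separated identical twins, and the 5-point gap between two testings of the same person, show how much closer the twins are than strangers.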

14. Acemoglu, Johnson, & Robinson (2001), The Colonial Origins of Comparative Development: An Empirical Investigation, AER.

Does a country need good institutions (e.g. market economy, strong property rights, independent judiciary, checks and balances on government power) in order to promote economic growth, or does economic growth lead to strong institutions? This is not just an idle historical question – a strong answer either way would have implications today for how institutions like the IMF work with developing countries.

Acemoglu et al. examined this in their very famous 2001 paper (already cited over 8,000 times on Google Scholar) by comparing the outcomes of colonized countries. Their argument runs as follows:

1. There was considerable variation in colonization policies among the nations colonized by Europeans, and this variation led to different kinds of institutions being created in the colonized states. At one end of the spectrum were purely extractive states such as the Belgian Congo, which did not protect private property or limit government expropriation; at the other end were states such as Australia and the United States, which received many European immigrants and tried to replicate European institutions.

2. The different colonization strategies were influenced by the feasibility of settlement. In areas with high rates of disease and mortality among European settlers, colonizers were more likely to set up extractive institutions, since they could not settle there in the long term.

3. The kind of institutions set up by the European colonists persisted for many years afterwards, even after the countries became independent.

The authors created a measure of point 2 by drawing on historical data on the mortality rates of sailors, bishops and soldiers in the colonies between the 17th and 19th centuries. The figure plots the log of these settler mortality rates against the log of GDP per capita in 1995 for some of the 75 countries the authors examined. The trend is clear: colonies in which Europeans died at higher rates (e.g. Gambia, Mali, Nigeria) are much poorer than colonies that provided safer environments (e.g. Australia, Canada, Fiji). The trend also holds when excluding Africa and the relative outliers of Australia, Canada, New Zealand and the US. The authors interpret this as evidence that good institutions causally improve economic growth.
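The figure is essentially a regression of log GDP per capita on log settler mortality. A minimal ordinary least squares sketch of that reduced-form relationship follows; the six ‘colonies’ are invented numbers, not the authors’ data. (The paper’s headline estimates go a step further, using settler mortality as an instrument for a measure of institutional quality in a two-stage least squares regression.)

```python
import math


def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (b, a)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sxy / sxx
    return b, my - b * mx


# Hypothetical colonies: settler deaths per 1,000 per year and GDP per capita.
# These numbers are made up for illustration; they are NOT the paper's data.
mortality = [10, 30, 80, 200, 500, 1500]
gdp_pc = [20000, 9000, 5000, 2500, 1200, 700]

log_m = [math.log(m) for m in mortality]
log_g = [math.log(g) for g in gdp_pc]
slope, intercept = ols_slope_intercept(log_m, log_g)
print(f"slope = {slope:.2f}")  # negative: deadlier colonies are poorer
```

A negative slope says that places where settlers died at higher rates are poorer. Because both variables are in logs, the slope reads as an elasticity: a slope of -0.6, for instance, would mean that a 1% higher mortality rate is associated with roughly 0.6% lower GDP per capita.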

A critique of this paper by the economist David Albouy, and Acemoglu et al.’s response to it, is available here.

15. Acconcia, Corsetti & Simonelli (2014), Mafia and Public Spending: Evidence on the Fiscal Multiplier from a Quasi-experiment, AER.

The effect of government spending on the economy has been a contentious issue since the financial crisis began to affect the world economy in 2008. While most American economists agree that the benefits of America's 2009 stimulus spending bill exceeded its costs, many European countries have enacted cuts in public spending in response to the recession (BBC, 2012). One important factor in this debate is the size of the multiplier (wiki); if the government spends an extra $100 and GDP increases by $150, this would indicate a multiplier of 1.5. If the government cuts spending by $100 and GDP decreases by $200, this would be a multiplier of 2. Although the size of the multiplier has important implications for government spending decisions, it can be very difficult to estimate. For example, to estimate the multiplier from the 2009 stimulus bill in America, ideally we would like to compare the current US economy against an alternative America which never passed the stimulus, an obvious impossibility.

A 1991 Italian law aimed at combating political corruption and Mafia infiltration of city councils provided an opportunity to isolate the multiplier effect of government spending on the local economy. The law replaced local city councillors with external commissioners, who typically made large cuts to public spending because they suspected the Mafia had been using it, through contracts for things like road maintenance, to divert money to itself. By examining these sudden episodes of fiscal contraction in Italian provinces between 1990 and 1999, the authors estimated the multiplier of spending cuts to be 1.5, meaning that every $1 cut from public spending was associated with a $1.50 drop in local economic activity.

16. Lacetera, Macis & Slonim (2012), Will There Be Blood? Incentives and Displacement Effects in Pro-Social Behaviour, American Economic Journal.

This paper, previously featured on the blog as one of the 15 Best Behavioural Science Graphs 2010-13, investigated whether incentivizing people to donate blood harms the quantity or safety of the blood supply. The authors used data from 500,000 donations at 14,000 American Red Cross (ARC) blood drives held over two years. Of these drives, 37% offered some kind of incentive for donating (such as a blanket, T-shirt or coupon). Because the ARC has limited funds, it tries to allocate these incentives across host centres in a non-systematic way in order to treat all its hosts fairly, so whether a given drive offered an incentive should be largely unrelated to the characteristics of that drive.

The graph shows the results. On average, offering incentives led to 15-20% more donors showing up at a drive, and more valuable incentives elicited more donations. Offering incentives did not increase the proportion of donors deferred as ineligible (shown by the purple line), so rewarding people for giving blood did not lead to adverse selection, in which people with unsafe blood become more likely to donate.
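The two headline comparisons, extra donors per drive and the share of donors deferred, can be sketched as below. The totals are invented for illustration and the `DriveGroup` class is hypothetical. The published estimates come from regression models rather than a raw comparison like this one, which is only a fair test because incentives were assigned non-systematically.

```python
from dataclasses import dataclass

@dataclass
class DriveGroup:
    drives: int    # number of blood drives in the group
    donors: int    # total donors who showed up
    deferred: int  # donors turned away as ineligible

    @property
    def donors_per_drive(self) -> float:
        return self.donors / self.drives

    @property
    def deferral_rate(self) -> float:
        return self.deferred / self.donors

# Hypothetical totals, invented for illustration; NOT the paper's data.
no_incentive = DriveGroup(drives=100, donors=3000, deferred=300)
incentive = DriveGroup(drives=100, donors=3540, deferred=354)

# Relative increase in turnout per drive when an incentive is offered.
lift = incentive.donors_per_drive / no_incentive.donors_per_drive - 1
print(f"extra donors per drive: {lift:.0%}")
# Adverse selection would show up as a higher deferral rate with incentives.
print(f"deferral rate without/with incentive: "
      f"{no_incentive.deferral_rate:.1%} / {incentive.deferral_rate:.1%}")
```

In this made-up example turnout rises by 18% while the deferral rate stays at 10.0%, the pattern described above.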

17. Almond, Edlund & Palme (2009), Chernobyl's Subclinical Legacy: Prenatal Exposure to Radioactive Fallout and School Outcomes in Sweden, QJE.

The third paper on this list to feature Douglas Almond as an author, this study examined the effect of radioactive fallout from the Chernobyl accident in April 1986 on the cognitive ability of Swedish children.

During the ten days it took to control the fire at the Chernobyl nuclear plant, large quantities of radioactive material were released across Europe. To measure the damage, the Swedish Geological Company conducted aerial measurements of the ground deposition of Caesium-137 (wiki) fallout across Sweden. The graph shows the results, where darker colours indicate greater levels of radioactive material. Deposition varied considerably across the country, mainly because of differences in rainfall at the time of the accident. Two regions stand out at the extremes: R0, which was essentially unaffected, and R3, which was strongly affected. To give a sense of the magnitude, areas with deposition above 37 kBq/m² were classified as “contaminated”, and ground deposition in the worst-affected areas of Sweden matched levels found just outside Chernobyl’s 30 km exclusion zone.
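The 37 kBq/m² cut-off looks arbitrary, but it is exactly 1 curie per square kilometre expressed in SI units, since 1 Ci = 3.7 × 10^10 Bq. A short sketch of the conversion and the classification rule; the function and constant names are hypothetical.

```python
BQ_PER_CURIE = 3.7e10  # 1 curie = 3.7 x 10^10 becquerels (by definition)
M2_PER_KM2 = 1e6       # 1 km^2 = 10^6 m^2

CONTAMINATED_THRESHOLD_KBQ_M2 = 37.0  # threshold quoted in the text

def kbq_per_m2_to_ci_per_km2(kbq_per_m2: float) -> float:
    """Convert ground deposition from kBq/m^2 to Ci/km^2."""
    bq_per_m2 = kbq_per_m2 * 1e3
    bq_per_km2 = bq_per_m2 * M2_PER_KM2
    return bq_per_km2 / BQ_PER_CURIE

def is_contaminated(kbq_per_m2: float) -> bool:
    """Apply the 'contaminated' classification rule from the text."""
    return kbq_per_m2 > CONTAMINATED_THRESHOLD_KBQ_M2

print(kbq_per_m2_to_ci_per_km2(37.0))  # 1.0
```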

Evidence from the atomic bombings of Hiroshima and Nagasaki in 1945 suggests that children whose mothers were 8-25 weeks pregnant at the time of the bombings suffered the most severe developmental impairments, with large reductions in IQ and school performance. Since the radioactive cloud affected Sweden from April 27 to May 10, the authors treated children born between August and December 1986 (who were therefore 8-25 weeks in utero at the time of the accident) as the cohort most likely to be harmed by the radiation.
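The August to December window follows from simple date arithmetic. The sketch below assumes a full-term pregnancy of 40 weeks, counted on the same basis as the 8-25 week window, and a single exposure date of April 27, 1986, the day the cloud reached Sweden. Both are simplifying assumptions, and real births scatter around term; `birth_window` is a hypothetical helper.

```python
from datetime import date, timedelta

# Assumption: full-term pregnancy of 40 weeks, measured on the same basis
# as the "8-25 weeks" exposure window.
FULL_TERM_WEEKS = 40

def birth_window(exposure, min_weeks, max_weeks):
    """Full-term birth dates for fetuses that were between min_weeks and
    max_weeks of gestation on the exposure date."""
    # The most advanced pregnancies (max_weeks) are born first.
    earliest = exposure + timedelta(weeks=FULL_TERM_WEEKS - max_weeks)
    latest = exposure + timedelta(weeks=FULL_TERM_WEEKS - min_weeks)
    return earliest, latest

start, end = birth_window(date(1986, 4, 27), 8, 25)
print(start, end)  # 1986-08-10 1986-12-07
```

Using May 10 as the exposure date instead shifts both ends 13 days later, to August 23 and December 20, so the exposed births fall roughly in August to December either way.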

By matching the geographic radiation data to the school records of 551,630 Swedish children born in 1983-88, the authors found that while the autumn 1986 cohort did not appear to suffer health damage, its members performed substantially worse in the final year of middle school and were less likely to graduate from high school. The graph below shows the difference in academic performance between children born in R3 (the most affected region) and R0 (the least affected) by month of birth. This difference dips sharply for children who were 8-25 weeks in utero at the time of the accident, while those born outside this window were not strongly affected.
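Comparing the R3 minus R0 gap for the exposed cohort against the same gap for neighbouring cohorts is a difference-in-differences design. A minimal sketch with invented grade averages (not the paper's data); the published estimates are regression-based and draw on the full geographic radiation data rather than just two regions.

```python
# Hypothetical mean school grades by birth cohort, invented for illustration.
# cohort: (mean grade in R0, mean grade in R3)
grades = {
    "1986 Jan-Jul": (200.0, 199.0),
    "1986 Aug-Dec": (200.0, 195.0),
    "1987": (201.0, 200.0),
}

def gap(cohort):
    """First difference: most affected region minus least affected region."""
    r0, r3 = grades[cohort]
    return r3 - r0

exposed = "1986 Aug-Dec"
controls = [c for c in grades if c != exposed]

# Second difference: exposed cohort's gap minus the average gap of unexposed cohorts.
baseline_gap = sum(gap(c) for c in controls) / len(controls)
did = gap(exposed) - baseline_gap
print(f"difference-in-differences estimate: {did:+.1f} grade points")
```

Subtracting the usual R3-R0 gap removes any permanent difference between the two regions, leaving only the extra dip for the exposed cohort (here -4.0 points).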

Projecting into the future, the authors estimated that Chernobyl will cause a 3-percent reduction in annual earnings for the most exposed Swedes.

18. Jensen (2007), The Digital Provide: Information (Technology), Market Performance, and Welfare in the South Indian Fisheries Sector, QJE.

The “law of one price” (wiki) in economics says that identical goods should sell for the same price in all locations, once transport and transaction costs are accounted for. This seems logical enough; if Aldi sells milk for £1 and neighbouring Lidl tries to sell the same product for £3, milk-loving customers will flock to Aldi until Lidl drops its price. In the real world, however, this mechanism may break down if customers don’t have information on the prices at different shops. Despite the centrality of this theory in economics, there are surprisingly few empirical studies examining how improvements in information affect markets.

The economist Robert Jensen tested this theory by examining the effect of the introduction of mobile phones in the Indian state of Kerala (wiki), where over one million people work in the fisheries sector. A significant problem for fishermen in Kerala is that while they are at sea, they cannot observe fish prices at the many markets spread along the coast, and time and cost constraints mean they can only visit one market per day. Since some markets end up with too many buyers and others with too many sellers, the prices fishermen receive for their catch vary widely.

In 1997, mobile phone service began to be introduced across Kerala. The figure below shows the staggered roll-out of service across the three regions examined by the author, each of which contained five fish markets. Because most of the major cities are coastal, the phone towers were close enough to the shore for service to reach 20-25 km out to sea, where most fishing was done. By 2001, over 60 percent of fishing boats and most wholesale and retail traders were using mobile phones to coordinate sales.

To measure fish prices, the author conducted a weekly survey of 300 sardine fishing boats from September 1996 to May 2001. The results are shown below. The y-axis shows the average 7:30-8:00 AM market price of sardines in 2001 rupees. As soon as mobile phones are adopted in each region, sardine prices across its markets converge and stabilize, providing strong support for the law of one price.
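The law of one price predicts that the spread of prices across markets should shrink once sellers can compare them. One standard way to quantify that spread is the coefficient of variation (standard deviation divided by mean); the paper's exact dispersion measures may differ. A minimal sketch with invented prices for the five markets of one region:

```python
from statistics import mean, pstdev

def coefficient_of_variation(prices):
    """Population standard deviation of prices across markets, divided by the mean price."""
    return pstdev(prices) / mean(prices)

# Hypothetical morning sardine prices (rupees per kg) at five markets on one day,
# invented for illustration; NOT data from the paper.
before_phones = [4.0, 10.0, 1.0, 8.0, 2.0]
after_phones = [6.0, 6.5, 5.5, 6.0, 6.0]

print(f"CV before phones: {coefficient_of_variation(before_phones):.2f}")
print(f"CV after phones:  {coefficient_of_variation(after_phones):.2f}")
```

Because the measure is scaled by the mean, it can be compared across periods even if the overall price level changes.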

19. Kirk (2015), A natural experiment of the consequences of concentrating former prisoners in the same neighborhoods, PNAS.

Hurricane Katrina was the costliest natural disaster in American history, causing over $100 billion in property damage and displacing 800,000 people when it struck the Gulf Coast in 2005 (wiki). Its impact on New Orleans (wiki) in Louisiana was particularly devastating; the population fell by more than half in one year (from 455,000 in 2005 to 209,000 in 2006), employment rates dropped precipitously and entire neighbourhoods were destroyed. As you might expect, Katrina has generated hundreds of studies across public health, sociology, economics and other disciplines: Google Scholar counts 869 results for “Natural experiment” and “Hurricane Katrina”, and some academics in Louisiana are even getting sick of fielding research requests.

The sociologist David Kirk examined how Katrina affected the chances of ex-prisoners ending up back in jail. More than 600,000 prisoners are released from incarceration every year in the United States, and many of them settle in urban areas, often clustered within a few particular neighbourhoods. As a result, many ex-prisoners end up living near one another, which may raise rates of recidivism through old social ties and through institutional and structural factors such as parole policies and housing market dynamics. One consequence of Katrina was that many prisoners from New Orleans had no homes to return to on release, because the flooding had destroyed them. Instead, many moved to other communities in Louisiana, such as Baton Rouge and Lafayette.

Kirk tracked over 5,000 parolees, around half of whom were released in the post-Katrina months of September to December 2005, and half during the same months of 2006. By comparing reincarceration rates with the concentration of new parolees across 493 Louisiana communities, Kirk found that each additional new parolee in a neighbourhood raised the reincarceration rate of the other new parolees there by around 2.5 percentage points (an 11% relative increase).
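The two figures in the last sentence measure different things: 2.5 percentage points is an absolute change in the reincarceration rate, while 11% is the same change relative to the baseline rate. Dividing one by the other recovers the implied baseline. This is an inference from the rounded figures above, not a number quoted from the paper, and `implied_baseline` is a hypothetical helper.

```python
def implied_baseline(pp_change: float, relative_change: float) -> float:
    """Baseline rate (in percent) implied by an absolute change in percentage
    points and the same change expressed as a fraction of the baseline."""
    return pp_change / relative_change

baseline = implied_baseline(2.5, 0.11)
print(f"implied baseline reincarceration rate: about {baseline:.0f}%")
```

Since both inputs are rounded, the result (roughly 23%) is only a ballpark figure.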

Tuesday, June 30, 2015
