
Printed from Oxford Research Encyclopedias, Politics. Under the terms of the licence agreement, an individual user may print out a single article for personal use (for details see Privacy Policy and Legal Notice).


Evaluating Success and Failure in Crisis Management

  • Allan McConnell, Department of Political Science, The University of Sydney


Crises and disasters come in many shapes and sizes. They range from global pandemics and global financial crises to tsunamis, hurricanes, volcanic ash clouds, bushfires, terrorist attacks, critical infrastructure failures, and food contamination episodes. Threats may be locally isolated, such as an explosion at a local fireworks factory, or they may cascade across multiple countries and sectors, such as pandemics. No country is immune from the challenge of managing extraordinary threats, and of doing so outside the comfort zone of routine policy making. The crisis management challenge involves managing threats ‘on the ground’, as well as the political fallout and societal fears.

Populist and journalistic commentary frequently labels crisis management initiatives as having either succeeded or failed. The realities are much more complex. Evaluators confront numerous methodological challenges. These include the careful consideration of multiple and often competing outcomes, differing perceptions, issues of success for whom, and gray areas stemming from shortfalls and lack of evidence, as well as variations over time. Despite such complexity, some key themes appear continually across evaluations, from internal reviews to royal commissions and accident inquiries. These pertain to the ways in which evaluations can be shaped heavily or lightly by political agendas, the degree to which evaluating organizations are able to open up, the degree to which gray areas and shortfalls are stumbling blocks in producing findings, and the challenge of producing coherent investigative narratives when many storylines are possible. Ultimately, evaluating crisis initiatives is “political” in nature because it seeks to provide authoritative evaluations that reconcile multiple views, from experts and lawyers to victims and their families.


All societies face “crisis” situations, from terrorist attacks, tsunamis, and hurricanes to economic recessions, riots, food contamination episodes, and pandemics. During the acute phase and in the aftermath, a vast range of actors and institutions (citizens, media, political parties, formal inquiries, think tanks, academics) routinely label the crisis response as a “success” or “failure.” Some episodes widely considered successful are the 2018 rescue of 12 young boys and their coach trapped in a Thai cave and Australia’s response to the 2002 and 2005 Bali bombings. By contrast, cases such as the government of Iceland’s response to the 2008 banking crisis, and the US response to Hurricane Maria, which devastated Puerto Rico in 2017, have been widely condemned as failures.

Multiple success and failure narratives have practical consequences. A crisis response can make or break the reputation of policy and political elites (Boin, McConnell, & ‘t Hart, 2008). US President George W. Bush’s approval ratings soared to historic levels after 9/11, and German Chancellor Angela Merkel became increasingly popular in Germany due to her tough response to the Euro crisis (Kornelius, 2014). By contrast, some leaders can fall down on the “wrong side” of history and find their careers highly damaged or effectively ended by perceptions of crisis mismanagement, such as Spanish Prime Minister José María Aznar after the 2004 Madrid bombings, and Japanese Prime Minister Naoto Kan, who resigned several months after the 2011 Fukushima nuclear disaster. Institutions can be subject to similar reputational trajectories after crisis. Reputation and even funding can be boosted in the aftermath of crisis, or by contrast the trajectory can be very bleak. The United Kingdom’s Ministry of Agriculture, Fisheries and Food was abolished after its poor performance in managing the UK’s 2001 foot-and-mouth crisis. The trajectory of policy sectors in the aftermath of crisis can also vary, although whether policy initiatives have succeeded or failed depends in part on whether we support the underlying values and how we conceive of the best way to put them into practice. In the wake of crisis, some policies and sectors can go through substantial transformation. For example, state-level flood mitigation policies in the United States have been transformed in the wake of focusing events (O’Donovan, 2017), as has animal health security after the 2001 foot-and-mouth crisis in the United Kingdom (Connolly, 2014).

The precise reasons for crisis as a catalyst for revival and rejuvenation are always contingent on individual circumstances, although at the most basic of levels we can say that it involves a process of “never again” learning (typically via the findings of a post-crisis inquiry or commission) to help reduce the risks of similar episodes in the future. The opposite trajectory can also be manifest. After the 2011 London riots, for example, little changed, with initial momentum for reform dissipating and being replaced by a new agenda of budget cuts and seeking to curb youth radicalization and its links to terrorism (Lodge, 2019).

Given the significance of crisis evaluation as an issue with considerable political and policy implications, the focus of this article is on unpacking key issues and processes around the evaluation of crisis management initiatives. It is divided into two main sections. First, we look at a series of methodological challenges in ascertaining whether crisis management initiatives have succeeded or failed. Second, we turn our attention to how evaluation operates in practice, focusing particularly on a series of recurring themes around political agenda setting, the scope of evaluations, how ambiguities/lack of evidence may be addressed, and which narrative to construct that reconciles all the complexities around the trajectory of crisis management episodes.

Evaluation Challenges for Crisis Management

There is no critical mass of academic literature that deals directly with evaluating crisis management initiatives. It is scattered across research and analyses of crisis inquiries and royal commissions (e.g., Boin et al., 2008; Chapman, 1973; Elliott & McGuinness, 2002; Kitts, 2006; Prasser & Tracey, 2014; Stanley & Manthorpe, 2004; Stark, 2018), as well as numerous ad hoc examinations of aspects of crises and their outcomes (e.g., Birkland, 2007; Boin & Fischbacher-Smith, 2011; Boudes & Laroche, 2009; Brown, 2004; Carayannopoulos, 2018; O’Donovan, 2017; Prasser, 2006; Snider, 2004). The first work to focus specifically on evaluation and crisis management was by McConnell (2011), drawing on issues of public policy evaluation/success/failure and adapting them for the domain of “crisis.” What follows in this article is an outline and original development of some of the ideas therein, while drawing on numerous cases and insights that are dispersed across a variety of research and analyses.

Figure 1. The crisis management cycle.

Crisis evaluation is a stage in the idealized crisis management cycle (see Figure 1), which in itself is a variation on the quasi-rationalist policy cycle, where societies reflect on evidence-based problems and adapt and learn in the “public interest” (see also Pursiainen, 2018). Evaluation is crucial, therefore, because it is the bridge between an undesirable past and a potentially brighter future. Put simply, crisis evaluation seeks to ascertain if, and in what way, a particular crisis management initiative and its associated outcomes were successful and/or unsuccessful. While it is certainly possible to have evaluations that are meaningful in policy-relevant ways (addressing issues of how to better mitigate risks and be better prepared for future contingencies), we should be alert to the multiple and genuine challenges of evaluating crisis management (see McConnell, 2011, for an overview). A number of challenges can be identified.

Where Do We Set the Boundaries of Evaluation?

Many crises, in terms of potential causes, responses, and recovery, involve multiple levels of government such as municipal, local, state, regional, and national. They may also involve the private sector (such as energy suppliers and transport providers) as well as non-governmental organizations and citizen-led groups (both formal and informal). Placing certain actors, institutions, and phenomena outside the scope of evaluation limits (or even blocks) the possibility that they are included in the examination of issues that produces a narrative of successful or unsuccessful crisis management (or somewhere in between). This means that they are insulated from questions around their actions and inactions, responsibility, blame, and influence. When an inquiry was set up to investigate the role of the privatized Australian Wheat Board (AWB) in making corrupt payments to Iraq in contravention of U.N. sanctions, the terms of reference were focused only on the role of the AWB, and not on issues of ministerial oversight or whether the rationale for the original privatization (and light-touch regulation) had been a good one (McConnell, Gauja, & Botterill, 2008). The converse is that what is included within the evaluation boundaries makes it possible to scrutinize and assess certain issues, actors, institutions, and events.

The real challenge in terms of evaluation, therefore, is where to draw the line. Excluding some potentially relevant factors can be beneficial, but it can also bring risks. In the AWB case, shining the spotlight only on the AWB insulated ministerial and broader policy trajectories from scrutiny, but it also incurred the risk that the evaluation would be seen as partial and politically driven. There is no scientific way of setting demarcation lines. Wildavsky’s (1987) insight into policy analysis applies perfectly here: setting the boundaries of evaluation involves art, craft, and judgment.

What Success Benchmarks Do We Use?

It may be tempting to see the benchmark for crisis management success as being a “resolution to the crisis.” In reality, unraveling what success looks like is a tough issue because there are multiple potential benchmarks for success. Adapting McConnell (2011, p. 65), these include:

Stated objectives of crisis managers;

Benefit to individuals/groups/localities under threat;

Level and speed of improvement;

Adherence to industry standards (e.g., risk management standards, crisis management protocols);

Adherence to appropriate laws;

Adherence to contingency plans and/or existing rules and procedures;

Efficient coordination and communication;

Comparison with the crisis experience of another jurisdiction;

Comparison historically with a similar crisis experience in the same jurisdiction;

Level of expert/political/public support for the initiatives;

Benefits outweighing costs;

Degree of innovation adopted; and

Preservation or enhancement of moral/ethical principles.

Many crises can be evaluated against most if not all of these, but the evaluation would be more inclined to produce a success or failure judgment, depending on which benchmarks are selected. A few examples help illustrate.

First, a crisis management agency may stubbornly and with good intention stick to its crisis management plans and protocols, even though doing so does not (in hindsight) provide enough flexibility to manage events as they unfold (Eriksson & McConnell, 2011). During the 2014 Sewol ferry disaster off the shores of South Korea, the Korean Coast Guard (KCG) adhered vigorously to its preset rules, in effect being successful in terms of hierarchical legal and political accountability. But it did so at the expense of allowing discretion to innovate in the face of the specific contingencies faced (a failure of professional accountability) (Jin & Song, 2017). Is the crisis response a success because the KCG did exactly as it was set up to do, or a failure because 295 died and 8 were missing? Most of us would be inclined to the latter judgment, but this does illustrate the fact that evaluation requires critical judgment in dismissing some successes when weighed against failures.

Second, crisis managers can be relatively successful when compared to those in another jurisdiction handling the same or similar issues, but less successful when it comes to another benchmark. For example, transboundary threats such as pandemics and global financial crises present nation states with common challenges. Yet the fact that states respond differently, and that some may be more successful than others, does not mean that the more successful states succeed against all other benchmarks. Bell and Hindmoor (2015), for example, address Canada and Australia’s relatively successful handling of the 2008–2009 global financial crisis in comparison to their UK and US counterparts, attributing the differing outcomes partly to different regulatory approaches in these countries. However, in some respects the response was less successful in protecting against some threats. In Australia there was a large decline in equity prices, reducing household wealth by almost 10% (Australian Bureau of Statistics, 2009–2010). Generally, those promoting a success narrative tend to focus on the benchmark that best fits their values. For example, crisis management coordination of the response to the 2013 floods affecting Germany was widely praised as an improvement when compared to the previous floods in 2002 (Jann, Jantz, Kühne, & Schulze-Gabrechten, 2018).

Third, a response may be successful from the point of view of an industry group that supported a particular response, but it may be more of a failure from the perspective of other groups under threat. The 2001 foot-and-mouth crisis in the United Kingdom produced recurring tensions between the National Farmers Union (supportive of no vaccination, cattle culls, and exclusion zones) and the English Tourism Council, and the tourist industry more generally (supportive of sending out signals that Britain was a safe place to visit). The Blair government persistently struggled to balance these competing visions of what a successful response should look like (McConnell & Stark, 2002; Taylor, 2003).

Fourth, different organizations have different roles in crisis, and so we should not expect them all to be evaluated in the same way. Boin and ‘t Hart (2010, p. 364), citing Dynes, point to four types of organizations in the field of emergency management. Some are more established and embedded in the disaster response than others (such as police and ambulance services), while others are emerging and not part of the regular response (such as a bushfire recovery authority and a disaster victims’ organization). The implication is that they perform different roles and should be evaluated differently. One-size-fits-all benchmarks for evaluation are not appropriate (see Table 1).

Table 1. Types of Organizations in Disaster Response Processes

Classification dimensions: Tasks and Structure

Type 1: Established, e.g., police, fire, ambulance services

Type 2: Extending, e.g., housing, family and social services, tax, schools

Type 3: Expanding, e.g., Red Cross, Salvation Army

Type 4: Emerging, e.g., Bushfire Recovery Authority, disaster victims’ organizations

Source: Boin and ‘t Hart (2010, p. 364), originally from Dynes (1970).

How Do We Weigh Different Outcomes?

How we weigh outcomes overlaps with the identification of success benchmarks, but it is sufficiently different because it addresses multiple crisis management outcomes, and which ones we consider more important than others (and by how much). On occasion, the assessment of different outcomes can be relatively straightforward. For example, in the case of the Thai cave rescue of 12 boys and their coach in 2018, the initial assessment worldwide was that the crisis management initiative had been an unmitigated success. Not only were the boys rescued before the monsoon conditions arrived and flooded the cave, but days afterward they seemed to be in good shape under the circumstances. In the months that followed, however, news emerged that the rescue mission was highly chaotic, there were rivalries between the UK specialist divers and the Thai authorities and divers, and the boys had been drugged with ketamine (a horse tranquilizer) rather than having been taught to swim, as we had been informed. In this case, most if not all of us would still label the mission a success, despite these numerous pathologies that could have produced disastrous consequences. It is quite easy, therefore, to sweep aside “failures” when one overriding goal is paramount (see McConnell, 2011).

In many other cases, however, weighing a variety of conflicting outcomes is not quite so easy. When Swedish authorities addressed the refugee crisis that spread across parts of Europe in 2015, some regional and local authorities struggled to cope, while others dealt adeptly with the issues, to the point that it may not have been appropriate even to talk of a crisis at all (Myreberg, 2018). Indeed, Myreberg (2018) argues that, several years on, it is difficult to judge whether the response was a success or failure.

How Much Weight Do We Give to Shortfalls?

Whatever the benchmark(s) we use, success is rarely absolute. There are always circumstances which are less than ideal. Some people can die, there can be a 12-hour delay in restoring power after a power outage, budgets may be overspent, and so on. There is no science of crisis management that would allow us to draw a line and say that X shortfall is acceptable, but X + 1 is not. Yet in reality, such judgments are made regularly. In the aforementioned example of the 2018 Thai cave rescue being widely reported and viewed as a success, we should remember that one Thai diver, Saman Kunan, died during the rescue mission.

How Do We Address Lack of Evidence?

Ideally, an evaluation needs evidence. Certainly, the crisis management cycle implies a valuable role for evidence because it amounts to “hard facts” on which judgments can be made. In reality, however, crises do not always produce all the evidence we may like, and evaluators can do little but attempt to “fill in the blanks” and produce estimates or projections.

We know, for example, that 96 people were killed in the 1989 Hillsborough Stadium football disaster, but we do not know the numbers of families and friends traumatized by the disaster, or the specific problems they face on an ongoing basis.

Success for Whom?

This is a critical issue. Inevitably, some people can win while others lose out from a crisis response. Deen (2015) notes the ongoing vulnerability of poor marginalized rural communities in Pakistan, and the continual suffering of this group in times of annual flooding, despite the creation of disaster agencies and substantial improvements in response capacities. Crisis management success is often unevenly distributed (McConnell, 2011). Rights and rewards flow more to some individuals, institutions, and groups than to others. When Hurricane Maria devastated Puerto Rico in 2017, US President Donald Trump praised the efforts of the United States as an “incredible, unsung success” (BBC News, 2018), much to the dismay of the community and of international organizations, which recognized slow and inadequate US reports, with devastation significantly exceeding anything resembling “success.” One report cited in excess of 400,000 homes destroyed, 500–4,500 deaths, 1 million without power, and roughly half of the population without safe drinking water (Farber, 2018).

What Period of Time to Evaluate?

The standard assumption of a crisis response is that it has three main phases: acute response, recovery, and learning. Many episodes do conform to such apparent norms, but it would be wrong to assume that only the acute phase constitutes “crisis management.” As indicated in multiple textbooks on crisis management, effective crisis management begins with prevention, risk mitigation, and planning rather than residing simply in the acute phase (Drennan, McConnell, & Stark, 2015; Handmer & Dovers, 2013). Similarly, a crisis may terminate or come to a natural end when the acute stage subsides, but the repercussions can last for many years. Perhaps the highest profile are what ‘t Hart and Boin (2001) call “long shadow crises,” which are a form of “crisis after the crisis,” where the near incomprehensible, mismanaged aspects of a crisis can shape the agenda for many years to come, such as the mishandling of Hurricane Katrina and the ongoing issue of vulnerable black communities (Dyson, 2006). Also high profile are crises where there are ongoing repercussions for groups and communities, weeks, months, and years after the main episode (Drabek, 2010). Not only do evaluators have to identify which time period or periods to consider, but they also have to identify how to assess and prioritize the different judgments about success and failure in each.

Different Methodological Approaches and Assumptions

A meta-evaluation issue is one typical of the social sciences more generally, that is, a plurality of approaches and assumptions about the nature of reality (Hay, 2002; Lowndes, Marsh, & Stoker, 2018). When we think about the issue of crisis management success, there are potentially many different ontological and methodological assumptions (Drennan et al., 2015). An objective/rationalist position would assume that success and failure in crisis management are already “out there,” and therefore the role of evaluators is simply to gather, compile, and present relevant data, facts, and figures in a neutral, value-free manner. An interpretivist or constructivist view would consider success and failure to be a matter of interpretation, depending on who is doing the evaluating, their worldview, and the degree to which they are impacted. A critical realist or pragmatic position is a form of middle ground, accepting the existence of some objective conditions that can be identified and verified, while also recognizing that their significance may be subject to multiple interpretations.

In some senses, the broad issue emerging from all of the foregoing is that evaluating crisis management has much in common with evaluation of public policies (Bovens, ‘t Hart, & Kuipers, 2006). Evaluators juggle multiple goals, ambiguities, shortfalls, differing perceptions, and variations in the degree and quality of appropriate evidence. Evaluating crisis management is perhaps even tougher. A particularly important reason is that formal evaluations are usually more in the public spotlight, often with families and support groups (not to mention the media) waiting on the final report, such as the 2013–2017 Royal Commission in Australia in regard to Institutional Responses to Child Sexual Abuse, and the 2017–2019 public inquiry in the United Kingdom into the Grenfell Tower fire in which 72 people died.

Evaluations in Practice: Recurring Themes

There is no single, universally authoritative format for evaluating crisis management episodes (Drennan et al., 2015; Prasser, 2006; Prasser & Tracey, 2014; Stanley & Manthorpe, 2004). Evaluation methods include

“Blue Ribbon” presidential commissions;

Royal commissions;

Executive ad hoc inquiries;

Legislative inquiries;

Internal department/agency inquiries; and

Accident board investigations.

Even similar types of inquiries may be run along very different lines. Stark (2018), in a major examination of four post-crisis reviews on UK floods, SARS in Canada, Australian bushfires, and New Zealand earthquakes, found that they operated in very different ways. Respectively, they were the approaches of public managers using policy tools, accident investigation boards using disaster methodologies, participatory governance focused on gathering the views of all those involved/affected, and questions of legal-judicial compliance. Yet such differences can at times mask commonalities in terms of issues of power and authority (see Chapman, 1973, on commissions in the United Kingdom). Here, four main themes can be identified that continually appear when we consider such matters. They are not exhaustive, and they do not apply to every single inquiry, but they do give a sense of important issues that help us understand some of the challenges, strengths, and tensions at the heart of crisis evaluations.

Politics as Agenda Setter: Heavy versus Light Touch

Many evaluations operate in a political context, in the sense that the findings of a crisis investigation can have major repercussions (positive and negative) for governments and political leaders (Boin et al., 2008; Stark, 2018). As outlined previously, investigations and their findings, as well as the ways in which they are received by the media, citizens, and affected groups, can impact on reputations and existing policy and institutional trajectories, as well as the promotion of broader ideological values.

When establishing inquiries and investigations, political leaders are typically highly mindful that the format, terms of reference, ability to call witnesses, funding, and time scale for the investigation are key agenda-shapers of the broad trajectory for its inquiries and deliberations (Drennan et al., 2015; Elliott & McGuinness, 2002; Prasser, 2006; Stanley & Manthorpe, 2004). Thus, when we think of the purpose of inquiries, one is certainly an idealized version of the crisis management cycle, that is, to find out what went right and wrong and to make recommendations to ensure that we are better placed to mitigate the risks of such events in the future, and to address them effectively should a similar event occur again (Birkland, 2007; Stark, 2018). Yet this goal can often compete with other goals (often hidden) such as to insulate political leaders, institutions, and policies from overly detailed scrutiny. There is no obvious link in research to date between the ideological affiliations of governments and political parties and inclinations to heavily steer (or not) crisis evaluations. Patterns tend to be pragmatic ones, rooted in issues such as regime protection, agenda protection, and blame avoidance. Kitts (2006), in his major study of US presidential commissions into episodes such as 9/11 and the Iran-Contra affair, found an overwhelming pattern of seeking to limit damage to the president, rather than a pattern of objective fact-finding and analysis. This hidden trajectory was enacted by means such as screening of commission members, carefully prescribed mandates, and providing limited time and resources. Often, the consequences of this subterranean goal are that others take the blame. Ellis (1994) uses the term “lightning rods” to refer to those officials who intentionally (or by circumstances) attract criticism for failures that otherwise might be directed toward the president.

Blue Ribbon commissions aside, many crisis investigations have been criticized because their scope and “teeth” have been blunted from the start by, in effect, politically led restrictions. Thomas Kean and Lee Hamilton, co-chairs of the 9/11 Commission, wrote subsequently that it was initially “set up to fail” because of its overly broad mandate and lack of access to some vital documents (Kean & Hamilton, 2007).

Political interference, however, is not omnipresent. At times, inquiries can be very expansive with few limits placed on their investigations. The 2009 Victorian Bushfire Royal Commission (VBR) was exceptionally wide-ranging in its remit and highlighted significant weaknesses in the crisis response (not least the controversial “stay or go” policy), and it ultimately led to extensive findings that paved the way for substantial reform (Carayannopoulos, 2018; Stark, 2018). Such investigations do lean toward ideal-type crisis evaluations, leaving no stone unturned, and making tough recommendations without fear or favor. Often for policymakers, the political calculus is gauging the risk of tightly steering the inquiry against the risk of adopting a light touch and providing the inquiry with substantial autonomy (Drennan & McConnell, 2007). Tight steering of the evaluation agenda and trajectory may well achieve the goal of limited recommendations and minimal damage to political reputations and policy regimes, but the risk is that the evaluation is criticized as a “fix” or “cover-up,” as happened with the Warren Commission’s investigation into the assassination of John F. Kennedy and the Hutton Inquiry in the United Kingdom into the circumstances surrounding the death of Dr. David Kelly, a scientist in the Ministry of Defence who was at the center of a controversy surrounding the existence of weapons of mass destruction in Iraq. Often, a backlash against a heavily steered inquiry is an acceptable risk for politicians, with critique being the least bad of the possible outcomes (as well as being a phenomenon that is a part of daily political life and the cut and thrust of politics).

Institutional Willingness to Open Up: Closed versus Expansive Mindset

Organizations are sensemakers (Weick, 1995) seeking to interpret and assess the significance of actual and potential events detrimental to the organization. Many organizations/institutions affected by crisis, such as police and health authorities, conduct their own internal investigations. All public organizations are different, not least in the crisis management capacities that are shaped by factors such as degrees of centralization/decentralization, degrees of fragmentation/integration, and coordination/specialization (Lægreid & Rykkja, 2019). The implication is that we should not think, for example, that health authorities, defense departments, intelligence agencies, school authorities, and farming ministries will behave in the same way, quite aside from localized aspects of leadership and culture. An organization’s willingness and capability to seek an unvarnished examination of its crisis response and general capacity to manage threats need to be balanced against consideration of organizational reputation and core goals such as public service delivery and budgetary management.

In some respects, therefore, it is easy to see why organizations may seek to contain the evaluation and its potential findings, for example, by limiting its scope or allowing it to be led by a chair who is sympathetic to the view that the evaluation should be minimally disruptive to the organization. There are several reasons why an internal evaluation might be heavily contained. It may have the potential to damage organizational reputation and produce findings which if implemented would impede the core business of the organization—either through being too costly to implement and/or being detrimental to core organizational goals.

By contrast, there are many good reasons why organizations will allow more thorough evaluations. There may be a genuine desire to learn from mistakes and to promote greater capacities for the management of future crises (see Stark, 2018, on the significant learning that took place after four major crises in the United Kingdom, Canada, Australia, and New Zealand). The important caveat remains that any particular strategic approach toward internal evaluation is not guaranteed to work. Strategic choices involve weighing risks, and sometimes the judgment of internal leaders can be poor, leading to the inquiry and its findings backfiring at some time in the future.

Shortfalls and Gray Areas: Caution versus Boldness

Investigations, regardless of the degree of freedom to shape their own trajectories, will never confront absolute perfection in the way a crisis is handled. Amid multiple goals of crisis management (ranging from saving lives and restoring power supplies to commemorating victims), there will always be some that are only partially fulfilled, even to small degrees. Furthermore, there will usually be some goals that were achieved more than others. The response of the Dutch authorities in 2011 to a fire at a chemical plant in the municipality of Moerdijk was successful if we consider that both damage and casualties were limited due to a successful operational-level response, but the transboundary nature of the crisis (toxic smoke crossing both local and geographical boundaries) turned into a public relations disaster as authorities struggled to provide a clear and coordinated message (Boin, Kuipers, & de Jongh, 2018). In addition, availability of evidence is always partial to some degree and so even the most authoritative assessment requires judgment and projection, as was the case with the “definitive” multi-organizational report led by the International Atomic Energy Agency into the long-term impact of radiation exposure as a consequence of the 1986 accident at the Chernobyl nuclear power plant (Chernobyl Forum, 2006). All evaluations need to find a way of navigating the lack of complete, simple outcomes, and to decide whether to exercise caution accordingly or be bold regardless.

Narratives: Which of Many Potential Stories to Tell?

Evaluation reports do not write themselves. They need to be written. Beneath this simple fact is immense complexity, to be distilled into something coherent and readable. There is evidence to be gathered, views to be recognized (from scientific experts to multiple response agencies and affected families), and complex factors to be woven together into an authoritative narrative or story. There is no single, off-the-shelf narrative that evaluators can use. One reason is that the type of narrative depends to some degree on the format of the investigation. Accident investigation boards, for example, typically focus on narratives around the causes of failure and the extent to which it could have been avoided. More expansive evaluations, such as royal commissions into bushfires, will generally focus not only on causes but also on the effectiveness of the response. Yet the “story” is not absolutely determined by the specific format. Much also depends on the leadership and the personalities involved. Vaughan (2006), in her examination of US Blue Ribbon Commission narratives, demonstrates different frames that were used: Space Shuttle Challenger (accident investigation frame), Space Shuttle Columbia (organizational-system failure frame), and 9/11 Commission (historical/war frame with a regulatory failure frame as the causal model).

Importantly, all evaluators confront, even unwittingly, the potential for hindsight bias. Generally, it is far more effective to evaluate prior decisions in the context of the time and what they were intended to achieve (including acceptance of some risk) than to evaluate from the comfort of the “future” and write of matters such as “inevitable failures” and “accidents waiting to happen” (Boin & Fischbacher-Smith, 2011). Broadly speaking, there are a number of potential subnarratives that must be woven together into a coherent story (while limiting hindsight bias). They can include causes, motivations, accountability, blame, response, and lessons to be learned, as well as issues of luck, fate, and inevitability (Boudes & Laroche, 2009; Bovens & ‘t Hart, 1996; McConnell, 2016).

We can adapt Boudes and Laroche (2009), who illustrate different potential plots around who or what is to blame, and reorient the narratives toward issues of success and failure. Therefore:

Fate plot: what happened was going to happen anyway (e.g., the response was bound to be a failure because the crisis was overwhelming).

Human factors plot: the response was successful because of the actions of specific individuals (e.g., the head of the fire service, the leader of the crisis team).

Bureaucratic hydra plot: the response was a failure because of bureaucratic weaknesses (e.g., poor communication systems, distorted organizational priorities).

System plot: the combination of individual agents and bureaucracy is the main explanation for the success/failure.

Brown (2004) goes further and argues that inquiry narratives impose a particular version of reality, which depoliticizes disaster events and reinforces the legitimacy of social institutions. Snider (2004) reinforces this argument by suggesting that the two main forms of knowledge/power at the heart of inquiries and their narratives—law and expertise—convey the impression of being able to dispassionately resolve ambiguities and uncertainties by cultivating the impression that they are “above” politics.

Beyond the story per se, there is also the issue of whether the report is available to the public. Often, we read of evaluation reports that are for “internal” purposes and not available for wider dissemination. One reason may be security, such as the need to maintain confidentiality in regard to intelligence-gathering methods or police operational procedures. Other reasons may be more political, in the sense that the evaluations are kept away from the public eye in order to minimize the potential for controversy and reputational damage. After the shooting of young Brazilian Jean Charles de Menezes (who was misidentified as a terrorist) at London’s Stockwell tube station on July 22, 2005, the event was described by the Metropolitan Police as a “tragedy.” The years that followed produced ongoing conflict between the Independent Police Complaints Commission, the Metropolitan Police, and the family of de Menezes. A key issue was the unwillingness of the Metropolitan Police to release an internal review. Such withholding strategies may provide some protection to the organizations involved, but they are not risk-free. In the London Stockwell station case, the ongoing story of the case was a public relations disaster (Greer & McLaughlin, 2011).

Conclusion

Evaluating the success and/or failure of a crisis management initiative is at heart a political activity. It is large P politics, often because the political agenda can be shaped for better or worse by political leaders prepared to dampen criticisms that may emerge in order to protect reputations, policy and institutional trajectories, and core governing values. What may constitute political success for the evaluation does not necessarily constitute success in evaluating how the crisis was managed locally (McConnell, 2011). Yet evaluations are also small p politics, because they are a means of seeking to resolve—through inquiry and narrative—complex and conflicting evidence/views over key aspects of the crisis and the way it was addressed. Crisis evaluation can have huge consequences, as was stated at the outset of this article, and we should remember that potential consequences can shape the means, modes, and direction of inquiry.

References

  • Australian Bureau of Statistics. (2009–2010). The global financial crisis and its impact on Australia. Canberra: Australian Bureau of Statistics.
  • BBC News. (2018, September 12). Trump’s claim of success in Puerto Rico hurricane response derided. London, UK: BBC News.
  • Bell, S., & Hindmoor, A. (2015). Masters of the universe, slaves of the market. Cambridge, MA: Harvard University Press.
  • Birkland, T. A. (2007). Lessons of disaster: Policy change after catastrophic events. Washington, DC: Georgetown University Press.
  • Boin, A., & Fischbacher-Smith, D. (2011). The importance of failure theories in assessing crisis management: The Columbia Space Shuttle disaster revisited. Policy and Society, 30(2), 77–87.
  • Boin, A., Kuipers, S., & de Jongh, T. (2018). A toxic cloud of smoke: Communication and coordination in a transboundary crisis. In P. Lægreid & L. Rykkja (Eds.), Societal security and crisis management: Governance capacity and legitimacy (pp. 133–150). London, UK: Palgrave Macmillan.
  • Boin, A., McConnell, A., & ‘t Hart, P. (Eds.). (2008). Governing after crisis: The politics of investigation, accountability and learning. Cambridge, UK: Cambridge University Press.
  • Boin, A., & ‘t Hart, P. (2010). Organising for effective emergency management: Lessons from research. Australian Journal of Public Administration, 69(4), 357–371.
  • Boudes, T., & Laroche, H. (2009). Taking off the heat: Narrative sensemaking in post-crisis inquiry reports. Organization Studies, 30(4), 377–396.
  • Bovens, M., & ‘t Hart, P. (1996). Understanding policy fiascoes. New Brunswick, NJ: Transaction.
  • Bovens, M., ‘t Hart, P., & Kuipers, S. (2006). The politics of policy evaluation. In M. Moran, M. Rein, & R. E. Goodin (Eds.), The Oxford handbook of public policy (pp. 319–335). Oxford, UK: Oxford University Press.
  • Brown, A. D. (2004). Authoritative sensemaking in a public inquiry report. Organization Studies, 25(1), 95–112.
  • Carayannopoulos, G. (2018). Disaster management in Australia: Government coordination in a time of crisis. London, UK: Routledge.
  • Chapman, R. (1973). Commissions in policy-making. In R. Chapman (Ed.), The role of commissions in policy-making (pp. 174–188). London, UK: Allen & Unwin.
  • Chernobyl Forum. (2006). Chernobyl’s legacy: Health, environmental and socio-economic impacts and recommendations to the governments of Belarus, the Russian Federation and Ukraine (2nd rev. ed.). Vienna, Austria: International Atomic Energy Agency.
  • Connolly, J. (2014). Dynamics of change in the aftermath of the 2001 UK foot and mouth crisis: Were lessons learned? Journal of Contingencies and Crisis Management, 22(4), 209–222.
  • Deen, S. (2015). Pakistan 2010 floods: Policy gaps in disaster preparedness and response. International Journal of Disaster Risk Reduction, 12, 341–349.
  • Drabek, T. E. (2010). The human side of disaster. Boca Raton, FL: CRC Press.
  • Drennan, L. T., & McConnell, A. (2007). Risk and crisis management in the public sector. London, UK: Routledge.
  • Drennan, L. T., McConnell, A., & Stark, A. (2015). Risk and crisis management in the public sector (2nd ed.). London, UK: Routledge.
  • Dynes, R. (1970). Organized behavior in disaster. Lexington, MA: D. C. Heath.
  • Dyson, M. E. (2006). Come hell or high water: Hurricane Katrina and the color of disaster. New York, NY: Basic Books.
  • Elliott, D., & McGuinness, M. (2002). Public inquiry: Panacea or placebo? Journal of Contingencies and Crisis Management, 10(1), 14–25.
  • Ellis, R. J. (1994). Presidential lightning rods: The politics of blame avoidance. Lawrence: University Press of Kansas.
  • Eriksson, K., & McConnell, A. (2011). Contingency planning for crisis management: Recipe for success or political fantasy? Policy and Society, 30(2), 89–99.
  • Farber, D. (2018). Response and recovery after Maria: Lessons for disaster law and policy. Berkeley: University of California, Public Law Research.
  • Greer, C., & McLaughlin, E. (2011). “Trial by media”: Policing, the 24–7 news mediasphere and the “politics of outrage.” Theoretical Criminology, 15(1), 23–46.
  • Handmer, J., & Dovers, S. (2013). Handbook of disaster policies and institutions: Improving emergency management and climate change adaptation (2nd ed.). London, UK: Routledge.
  • Hay, C. (2002). Political analysis: A critical introduction. London, UK: Palgrave Macmillan.
  • Jann, W., Jantz, B., Kühne, A., & Schulze-Gabrechten, L. (2018). The flood crisis in Germany 2013. In P. Lægreid & L. Rykkja (Eds.), Societal security and crisis management: Governance capacity and legitimacy (pp. 75–93). London, UK: Palgrave Macmillan.
  • Jin, J., & Song, G. (2017). Bureaucratic accountability and disaster response: Why did the Korea Coast Guard fail in its rescue mission during the Sewol ferry accident? Risk, Hazards, & Crisis in Public Policy, 8(3), 220–243.
  • Kean, T. H., & Hamilton, L. H. (2007). Without precedent: The inside story of the 9/11 commission. New York, NY: Vintage.
  • Kitts, K. (2006). Presidential commissions & national security: The politics of damage control. Boulder, CO: Lynne Rienner.
  • Kornelius, S. (2014). Angela Merkel: The chancellor and her world. London, UK: Alma Books.
  • Lægreid, P., & Rykkja, L. H. (Eds.). (2019). Societal security and crisis management: Governance capacity and legitimacy. London, UK: Palgrave Macmillan.
  • Lodge, M. (2019). The 2011 London riots: Civil disorder and government non-responses. In P. Lægreid & L. Rykkja (Eds.), Societal security and crisis management: Governance capacity and legitimacy (pp. 187–203). London, UK: Palgrave Macmillan.
  • Lowndes, V., Marsh, D., & Stoker, G. (Eds.). (2018). Theories and methods in political science (4th ed.). London, UK: Palgrave Macmillan.
  • McConnell, A. (2011). Success? Failure? Something in-between? A framework for evaluating crisis management. Policy and Society, 30(2), 63–76.
  • McConnell, A. (2016). A public policy approach to understanding the nature and causes of foreign policy failure. Journal of European Public Policy, 23(5), 667–684.
  • McConnell, A., Gauja, A., & Botterill, L. C. (2008). Policy fiascos, blame management and AWB limited: The Howard government’s escape from the Iraq wheat scandal. Australian Journal of Political Science, 43(4), 599–616.
  • McConnell, A., & Stark, A. (2002). Foot and mouth 2001: The politics of crisis management. Parliamentary Affairs, 55(4), 664–681.
  • Myreberg, G. (2018). The 2015 refugee crisis in Sweden: A coordination challenge. In P. Lægreid & L. Rykkja (Eds.), Societal security and crisis management: Governance capacity and legitimacy (pp. 151–168). London, UK: Palgrave Macmillan.
  • O’Donovan, K. (2017). An assessment of aggregate focusing events, disaster experience, and policy change. Risk, Hazards, & Crisis in Public Policy, 8(3), 201–219.
  • Prasser, S. (2006). Royal commissions and public inquiries in Australia. Chatswood, New South Wales: LexisNexis Butterworths.
  • Prasser, S., & Tracey, H. (Eds.). (2014). Royal commissions and public inquiries: Practice and potential. Brisbane, Australia: Connor Court.
  • Pursiainen, C. (2018). The crisis management cycle. London, UK: Routledge.
  • Snider, L. (2004). Resisting neo-liberalism: The poisoned water disaster in Walkerton, Ontario. Social & Legal Studies, 13(2), 265–289.
  • Stanley, N., & Manthorpe, J. (Eds.). (2004). The age of the inquiry: Learning and blaming in health and social care. London, UK: Routledge.
  • Stark, A. (2018). Public inquiries, policy learning, and the threat of future crises. Oxford, UK: Oxford University Press.
  • ‘t Hart, P., & Boin, R. A. (2001). Between crisis and normalcy: The long shadow of post-crisis politics. In U. Rosenthal, R. A. Boin, & L. K. Comfort (Eds.), Managing crises: Threats, dilemmas, opportunities (pp. 28–46). Springfield, IL: Charles C. Thomas.
  • Taylor, I. (2003). Policy on the hoof: The handling of the foot and mouth disease outbreak in the UK 2001. Policy & Politics, 31(4), 535–546.
  • Vaughan, D. (2006). The social shaping of commission reports. Sociological Forum, 21(2), 291–306.
  • Weick, K. E. (1995). Sensemaking in organizations. Thousand Oaks, CA: SAGE.
  • Wildavsky, A. (1987). Speaking truth to power: The art and craft of policy analysis (2nd ed.). New Brunswick, NJ: Transaction.