Alto42
Understanding Organisational Failure
Grenfell Inquiry - Review of Part 1
My interest in the public inquiry, under the Chairmanship of The Rt Hon Sir Martin Moore-Bick, into the fire at Grenfell Tower on 14 June 2017 is the lessons it suggests we, as a society, should learn from these events. My concern is that, while such inquiries may be the right way to judge the lawfulness of the decisions made, my previous research suggests that they are not an effective way of learning how to manage fast-moving, complex and dynamic situations. In fact, in some cases such reports may do more harm than good.
When I read such reports, what I am trying to identify is the mechanism of failure and the justification for the action proposed in the report's recommendations. From this I try to assess the probability that the action proposed will address the causal mechanism and will not generate unintended adverse consequences. Finally, I consider the cost effectiveness of the proposed action.
As I criticise the blinkered thinking of others, let me declare my own bias. As a former operational commander, my sympathies lie with the operational teams. This is not to say that they can do no wrong; it does, however, make me try to understand why they erred. In turn this means that I am looking for solutions that work not only in theory but also in practice. In technical terms I focus on the Practical rather than the Scientific utility of learning. This theme is at the heart of all of my work. As for this case, I have no connections with any parties to the inquiry and so do not have a stake in the outcome.
With these criteria in mind, I summarise below my assessment of the report produced by The Rt Hon Sir Martin Moore-Bick into the Grenfell Tower fire. My full paper can be read here.
Executive Summary
The report states that in the early hours of 14 June 2017 a fire broke out in a domestic appliance and this caused a "perfectly foreseeable" kitchen fire. The London Fire Brigade (LFB) responded promptly and yet the fire escalated rapidly. In just over half an hour the domestic fire had become a major incident that stretched the LFB to breaking point. While 227 residents managed to escape the building, 71 did not. In order to learn from these events, the UK government set up a public inquiry under the Chairmanship of The Rt Hon Sir Martin Moore-Bick to examine the circumstances surrounding the fire. My paper looks to examine the conduct of part 1 of that Inquiry as a vehicle for social learning. In particular, this paper focusses on how or whether such inquiries assist fire and rescue services to learn from such events.
The approach taken to this research positions it within the concepts of Scholarship of Application and that of Engaged Scholarship. The issue this paper addresses is how we make sense of our experiences in order to learn from them. The paper describes the use of two alternative analytical approaches; the first of these is in common usage (labelled the perfect world paradigm) and the second has been labelled normal chaos. The paper uses the Grenfell Inquiry Report (part 1) as a case study.
The basic proposition that underlies the perfect world paradigm is that if organisations recruit the perfect people, produce perfect plans, train them perfectly, supply them with exactly the right resources (including perfect unambiguous information) and execute the plan flawlessly (eliminating all slips and lapses) then the desired outcome will be delivered. Within this paradigm is the belief that individuals should be able to learn, retain and use the knowledge they require perfectly. Embedded within this construct is the desire to remove uncertainty and to control the world around us. The label perfect world paradigm is used to reflect the phrase often heard when discussing failure; that is “but in a perfect world …”. The paper then substantiates its assertion that the Inquiry team’s worldview is rooted within this paradigm. In terms of lay theory, the perfect world paradigm can be seen as being a normative theory. The key and pertinent criticism of this paradigm is that it is an inadequate precept when used to grasp and understand the true complexity of everyday actions.
The paper offers an alternative analytical precept. This precept has been labelled the normal chaos paradigm. It is based on the key facets of complexity and chaos theory. These ideas have been assembled in a framework which gives analysts a different lens through which to view an issue and thereby make sense of it. In terms of lay theory, the normal chaos paradigm is a descriptive theory that enables the analyst to describe an issue in terms of a standardised metaphor provided by complexity and chaos theory. These lenses are grouped into structures, patterns and energy. Under structures analysts should consider scale, interdependencies and self-organisation. Under patterns, the analysts should consider the fitness landscape, fractals and the role of illusions. Under energy, the analysts should consider energy flow, attractors and the edge of chaos. Theory suggests that the triangulation achieved by looking at a single issue through multiple lenses, helps us to make sense of complex issues. This is at the heart of the normal chaos approach.
The paper examines the dynamics of the Inquiry, the substance of the Report, the way it structures its recommendations, the recommendations themselves and the Report's perception of what failed. The paper goes on to look at the implications for learning of some features of the perfect world paradigm, namely its reliance on rules, the clash of cultures that this created between the Inquiry team and the practitioners and how this affected keeping residents safe, and decision support systems including the decision to revoke the stay put policy on that occasion.
Finally, the paper provides a summary of its findings, a description of the perceived limitations of this research and its conclusions and recommendations for future research. In summary these are as follows:
As human populations move to more high-rise living, the questions raised by the Grenfell fire become an important topic for research. The desire to learn is paramount amongst fire-fighters. If the learning process was easy, then the fire-service would have perfected it by now. It is not easy and so they must continue their efforts. An important part of this process is the inquiry process.
The report produced by Moore-Bick into the events on the night of the fire is probably (within an acceptable margin of error) an accurate narrative of those events. The evidence the Inquiry collected is likely to provide data for a wide range of future research. The value of this Inquiry is that it recognised the need to let the victims be heard, as this process helps them to understand what happened on the night and so might set them on the path of coming to terms with their loss. Where this paper diverges from the findings of that report is in its analysis of the events and the recommendations produced.
The Inquiry Report shows clear evidence that the Inquiry team see the world as they would like it to be (a perfect world) as opposed to how it actually is (complex and messy). By way of example, this paradigm is encapsulated in the statement “It should be a simple matter for the owners or managers of high-rise buildings to provide their local fire and rescue services with current versions of such plans” [emphasis added]. In itself, as a single action, the Report is right that such an action should be relatively simple. However, it is a mistake to see any single action as an isolated event. Every action takes place within a context. The context adds complexity and complexity adds dilemmas (often in the form of conflicting priorities). In short, no action should be seen as being “a simple matter”: every action needs to be seen as being part of a chain of events that interact with other such chains. This is a clear divergence in perspectives between the Inquiry Report and this paper.
The reason for this divergence is the different paradigms used. The perfect world paradigm used by the Inquiry team is, at its simplest, looking towards delivering a perfect system; this worldview predominates in management thinking. From studying the pertinent academic literature on organisational failure, it is reasonable to form the opinion that such perfect systems are fantasies (illusions). They are impossible to produce and equally impossible to operate perfectly due to the innate complexity of life. An alternative paradigm is therefore required.
This paper offers an alternative normal chaos paradigm through which to examine these issues. This paradigm centres around the complexity of everyday interactions. It also focuses on the limited human ability and capacity to make sense of their world and then to act in the most appropriate manner. Normal chaos emphasises non-linearity in contrast to the linearity of the perfect world paradigm. Normal chaos recognises that systems will never be perfect and that humans are prone to slips, lapses and errors, and that misjudgements and mistakes are an unavoidable part of everyday life. The question that is at the heart of the normal chaos paradigm is, how can we best cope in such circumstances? To this end, normal chaos research looks to the work being undertaken to develop robust and resilient systems as being the way forward.
In order to enhance the learning from the activities of the emergency services in the long term, the international body of firefighters should consider developing an accident investigation body (modelled on the aircraft industry) to learn from the body of experience from all major incidents and to develop robust procedures to improve the way they are handled in the future.
This way forward should also be supported by a suitable programme of research.
Last update: 09 Oct 20
The result of my Part 1 analysis of the Moore-Bick recommendations shows the areas of the system on which he focused.
Last update: 12 Nov 21
HMICFRS Response
In February 2021 the HMICFRS produced its report on the London Fire Brigade, “Inspection of the London Fire Brigade’s progress to implement the recommendations from the Grenfell Tower Inquiry’s Phase 1 report”. I considered that it might be useful to examine this report in order to see how the recommendations were being turned into practice. In this paper I reconfigure Turner's Disaster Incubation Theory (DIT) to optimise it for learning. I then use the reconfigured DIT to give structure to my analysis of the HMICFRS report.
The aim of this paper is to show how, by reconfiguring DIT, it can be used to examine the learning contained within a formal report.
My detailed thoughts can be found here:
It must be noted here that my paper was shared with the HMICFRS; it did not receive a favourable response! The HMICFRS were given several opportunities to correct factual errors; they did not offer any. I can only presume that, while I was factually correct, my comments strayed outside of their Overton Window.
Summary
The use of the Grenfell Tower fire case provides a graphic illustration of what would otherwise be a rather dry theoretical model. The first issue we see is tension between doing what would be ideal (in order to produce a perfect system) and doing what is pragmatic. In my use of the Grenfell fire, I have identified some additional concerns in the way the process can be conducted. I feel that it is important to emphasise that the criticisms offered here are seen as being generic to the processes that surround Public Inquiries and it just happens that the Grenfell case illustrates the points nicely. As I have stated previously, but it is worth repeating, I picked this case for no other reason than it was current at the time of writing. The fact that, by coincidence, it illustrates my wider points should simply be seen as reinforcing the prevalence of these issues.
In the case of the Grenfell fire, the Overton Window (the limits of acceptable political discourse) is clear: the acceptable narrative is that blame lay with the LFB and that it was imperative to restore the public trust in this Service. To this end, the authorities must be seen to be implementing Moore-Bick’s recommendations whether they are flawed or not: no questioning of the recommendations was acceptable. Here we see that the HMICFRS was faced with a dilemma. The Terms of Reference from the Home Office for this specific piece of work tasked the HMICFRS only to ensure “the brigade w(as) effective at ensuring progress against the action plan for delivering the recommendations”. They did not require the HMICFRS to assess the validity of the recommendations. However, on their website (see the page “What we do”), it is stated that they use “experienced officers and other subject-matter experts to identify the best practice from which all forces and FRSs can learn to improve their performance”. This suggests that they should be aware of the need for double-loop learning and therefore be aware of the need to conduct a re-set as per Stage 1a of DIT. It is clear in this case which direction they followed.
The HMICFRS report reads as a careful navigation between responding to Moore-Bick’s criticism and being supportive of the LFB at this difficult time in their history. The report is very pragmatic and can be seen to have given the force a way forward. Another important function of the report is to try to win back the public’s trust in the LFB. While these two issues are both pragmatic and understandable in the circumstances, I have to question whether this approach is optimal in ensuring that such events “never happen again” either within the LFB or elsewhere within the country. Some may consider it to be unfair to set the standard at optimal learning: this standard is taken from the HMICFRS report that critiqued the LFB for failing to optimise the project management of their implementation programme where, to me, project management issues are secondary to those of service delivery. If the HMICFRS expects optimisation of project management, they should also expect optimisation of the learning of operational lessons.
In my analysis of the Moore-Bick Inquiry I emphasised how the paradigm used (that is the view taken as to how the world works) affects the conclusion reached. In essence the issue is whether the system can be perfected (perfect world paradigm) or whether the system can never be perfected and so needs to be ready and able to adapt dynamically to the circumstances encountered. It can be seen from its report that the Moore-Bick inquiry formulated its recommendations using the perfect world paradigm and this was not questioned by the HMICFRS. I have noted that these actions are representative of managers and executives within the UK and are consistent with advice I received during my doctoral studies. When discussing how to conduct research within industry we were warned of stereotypes. We were told that if we suggested a novel approach to an American, they would be likely to respond positively and seek to see if the idea gave them a commercial advantage: if you offered the same idea to the British, the likely response would be more defensive and your suggestion would be taken as a criticism of what they currently do. Since shifting the focus of my research from the Low Countries to the UK over a year ago, I have found this stereotype to be true in my own experience. I have now to question whether this is a significant factor in why the UK fails to learn from the past.
Within the theoretical literature there are pointers to why British practitioners take this approach. To begin with, the British pride themselves on being pragmatic problem solvers. As such they have been shown to have a tendency to “rush to do”, as can be seen in the project management world noted above. Linked to “rush to do” is ETTO (the efficiency-thoroughness trade-off). The pragmatist trades thoroughness for efficiency. This is consistent with the concept of satisficing (where decisions are only required to be “good enough”). This is then coupled with the concept of ‘muddling through’ (this is a technical term rather than just a caustic aside: see my Glossary page for its definition) where the next decision is built on the last one rather than on a fundamental re-evaluation of the issue as should be carried out at DIT Stage 1a. In the pragmatic world of management, it is understandable why managers and executives are too busy to think about how they think. However, in the case of learning from crisis, this approach leaves gaps through which the next crisis will emerge. Therefore, it is an illusion to think that the use of the perfect world paradigm will go any way towards ensuring “such problems will never happen again”.
It is clear that managers and executives are content to live with the mistaken belief that their current approach, “good enough”, is consistent with their use of the perfect world paradigm, even though many clearly understand that the world is far more complex than the mental model they use to manage it. This approach can immediately be seen to be at odds with Ashby’s ‘Law of requisite variety’, which suggests that it takes a complex system to manage a complex system. This approach therefore creates gaps within which future crises can incubate. The question therefore becomes one of why they are prepared to accept this illusion. Is this driven by the temporal nature of the risk (the risk is long-term, while their tenure is short-term) or by some other factors? This is a question for a separate essay.
In his report Moore-Bick criticised the LFB for not planning for high impact, low probability events. This criticism is consistent with his use of the Perfect World paradigm. The sacking of Dany Cotton would seem to put this failure of risk management down to poor leadership. From my point of view, however, the LFB failure in this area fits a wider pattern of behaviour. Many of the inquiry reports that I have read state that the organisation in question failed to manage some highly unlikely risk. I believe that this represents a more systemic problem, and that it is important to understand the reasons at the heart of it.
- The first issue is that the standard risk management processes encourage the failure to consider high impact, very low probability events. Where this failure occurs it is labelled within the theoretical literature as a ‘failure of imagination’. In the case of the Grenfell inquiry they used the premise of the “Shuttle hitting the Shard”. The probability of this happening is infinitesimal and should rightly be discounted as a scenario; if it is not, then we have to think of what other obscure scenarios should also be included. There is therefore a need to think more generically. A more generic idea would be any aircraft approaching a London airport over the city. The number of aircraft that land at Heathrow is around 480,000 per year. The number of aircraft that land at London City is around 80,000 per year. If we say half of these approach and take off using routing that goes over the city, then this would amount to 280,000 per year. A general standard for a risk being acceptable is Six Sigma. This equates to approximately 1 event in 1 million which, at 280,000 flights per year, equates to roughly 1 crash every 4 years. One would seem to be due! It is therefore fortunate that such failures are in fact far rarer than this, but this takes them far outside those risks considered by the risk management process. While such failures of imagination are easy to see with hindsight, the question for the future is where will the line now be drawn by the LFB? Is it at Six Sigma or at events far rarer than that? This should have been a discussion at Stage 1a. There is nothing in the HMICFRS report to suggest that this debate happened.
- Secondly, and linked, there is the problem of induction. In brief, this concept considers how the repeated absence of evidence leads decision-makers to believe that the phenomenon does not exist or, at a minimum, is not worthy of further consideration. As the Shuttle had not crashed into the Shard, and considering how rarely a plane has crashed into a built-up area, this may have induced the LFB to think that the risk was not worth further consideration. The specifics of the case are not the issue here. The issue is to prompt all organisations to reconsider which risks they have discounted and to distil them from a specific to a generic (in this case the specific, the shuttle hitting the Shard, becomes the generic of mass casualty search and rescue in a high threat environment) so that a generic response can be devised.
- The problem of induction then leads to the limitations imposed by the cage of expectations. Prior to the events at Grenfell the leader of the LFB and the members of their funding body would have limited their expectations to what could be achieved within the anticipated resource limits. These constraints, probably subconscious, will have limited the scope of their thinking about what they should be prepared to handle. In this case, the HMICFRS report states that £7.7m of additional resources have been allocated to the LFB subsequent to the events at Grenfell.
- The cage of expectations also effected a second failure of imagination. This is where the scope of the scenarios envisaged and the related practices are limited to those with which the organisation thinks it could cope. This seems to be the case with the UK Government limiting its pandemic preparation to flu rather than “Virus X”. It may or may not have been a factor at Grenfell but, as this idea was not explored, this remains unclear. This behaviour is consistent with a human tendency to be for the sake of the psychological comfort of individuals within the organisation. This is where the use of a model, such as DIT Stage 1a, can help organisations to see the flaws and gaps within their analytical processes so that they, at a minimum, might ask themselves why they are limiting their analysis in the way that they are.
- Finally, while poor leadership may have been the cause (and whether this occurred is outside the scope of this essay), bureaucratic drift, as an important pressure on decision-making, cannot be ruled out. A common pattern of activity that develops during Stage 2 (the incubation period) is that initial good intentions are systematically undermined by everyday resource pressures. As issues slip from the short-term organisational memory, the issues identified receive lower priority for the allocation of resources (such as time). There is nothing in the HMICFRS report to suggest that debate occurred on how this drift might be avoided in the future.
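The aircraft arithmetic in the first point above can be checked with a minimal sketch. It uses only the approximate figures quoted in the text and the text's own rounding of the acceptability threshold to about 1 event in 1 million; the function and variable names are hypothetical and chosen purely for illustration.

```python
# Approximate annual landings, as quoted in the text.
HEATHROW_LANDINGS = 480_000
LONDON_CITY_LANDINGS = 80_000

# Assumption taken from the text: half of these flights route over the city.
SHARE_OVER_CITY = 0.5

# The text's rounding of the acceptability threshold: about 1 event per million.
ACCEPTABLE_RATE_PER_FLIGHT = 1 / 1_000_000


def flights_over_city(heathrow, london_city, share):
    """Annual number of flights assumed to pass over the city."""
    return (heathrow + london_city) * share


def years_between_events(flights_per_year, rate_per_flight):
    """Expected interval, in years, between events at a given per-flight rate."""
    return 1 / (flights_per_year * rate_per_flight)


flights = flights_over_city(HEATHROW_LANDINGS, LONDON_CITY_LANDINGS, SHARE_OVER_CITY)
interval = years_between_events(flights, ACCEPTABLE_RATE_PER_FLIGHT)

print(f"Flights over the city per year: {flights:,.0f}")
print(f"Years between events at the threshold rate: {interval:.1f}")
```

On these figures the interval comes out at a little over three and a half years, which is the "1 crash every 4 years" quoted in the text. Changing the share of flights assumed to pass over the city, or the threshold rate, shows how sensitive the conclusion is to where the line is drawn.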
Within the Perfect World Paradigm, it is common to see all organisational failures as a failure of leadership. I would see the decision to remove the LFB’s Chief Fire Officer as being rooted in the “omnipotent leader” component within the Perfect World Paradigm. Changing the leader is a common “go to” fix employed in these circumstances. The fact that Dany Cotton was only confirmed in her post on the day of the Grenfell fire would suggest another serious organisational failure. If she was not fit for the role, why was she selected and so recently confirmed in it? This would suggest a major failure in the organisation’s selection process; this issue was raised in neither the Moore-Bick nor the HMICFRS report. Again, the specifics of this case are only illustrative of the point that concerns the ‘seat of understanding’ of those conducting these investigations, as they did not conduct a multi-level analysis of the issues raised. Here again the proper use of analytical models might have prompted these questions. The need to conduct multi-level analysis is best described by Leveson in her STAMP model: Scott Snook describes the implementation and implication of such analysis in his very readable book “Friendly Fire”. Within the Normal Chaos model, the same issue is embraced by the concept of scale. Whatever method the analyst uses to prompt their thinking, it is unlikely that they will get to the root cause of an issue if they do not conduct such a multi-level analysis.
So, what was the nature of the crisis? The precipitating event (DIT Stage 3) was the fire at Grenfell Tower. This resulted in a high tempo operation that stretched the organisation to the limits of (and maybe even beyond) its capability (Stages 4 and 5). Within the definition of an organisational crisis (defined as events that threaten the existence of the organisation or the tenure of the executive team), we can see that the Grenfell fire did create an organisational crisis for the London Fire Brigade. While it did not threaten the existence of the organisation, it did lead to the removal of the Chief Fire Officer. In my view, the LFB failed to recognise the true character of the organisational crisis and so failed to manage it effectively; for example, there was a complete lack of effective crisis communications during Stages 6, 1 and 2. For the HMICFRS, the nature of the crisis appears to have been clearer. Their efforts to reassure the public with both the report and interviews on the day it was published seem to have provided small steps in their crisis communications plan. However, neither seems to have bridged the gap between them and the public perception of the crisis; in terms of the public, the crisis was the fire. The gap between these two worldviews offers an example of why consideration of this issue is important in crisis management.
Experience shows that in the immediate aftermath of a crisis, organisations are alert to the issues raised and are active in preventing them. The point highlighted by DIT Stage 2 (the incubation period) is that, over time, the measures taken become diluted and so opportunities for the next crisis emerge. The organisation’s guards against this drift are its internal and external quality assurance systems. It is therefore important, if such events are never to happen again, to ensure the appropriate assurance systems are in place, properly focused and heeded. It is not possible to know from these reports whether the LFB is typical of fire services across the UK. If it is not, then the question becomes one of why the HMICFRS did not prevent the LFB's operational standards from falling below the expected standard. This case highlights the role of quality assurance during DIT Stage 2. In crisis prevention, the generic question raised for organisations is how they determine where they have gaps in the quality assurance systems that may lead them to future failures. In an imperfect system, the question becomes what efforts are being made to spot these potential points of failure and how to cope should they occur.
Last updated 10 Nov 21