Bench-Marking Exercise

My work on 'how to rate recommendations' should be seen as just one part of the wider systemic issue of organisational learning. Within this field, I concentrate on learning from crises. I therefore have to acknowledge that learning how to improve the quality of recommendations can only contribute to solving the problem of why we fail to learn. Improving recommendations does not, therefore, provide a 'silver bullet'.


To appreciate fully the complexity of the learning process that derives from public inquiries, I have looked for a system against which I can benchmark that process. I have chosen the air safety process. Few would dispute that, over the history of air travel, this mode of public transport has become one of the safest. (Again, I have to accept that this statement is a great simplification. What I am talking about here are systems that affect the public in general. Therefore, as far as air travel is concerned, I am focusing on mass public air travel through commercial airlines.)


To set my benchmark, I have chosen one aircrash report at random: the Bureau d’Enquêtes et d’Analyses pour la sécurité de l’aviation civile (referred to as the BEA) report into the crash of Air France flight 447. The accident occurred on 1st June 2009 and the report was published in July 2012. The BEA is the French Civil Aviation Safety Investigation Authority. To me, this report is typical of those produced by the air accident investigation community. The BEA produced 41 recommendations, which I have put through my analytical process: see here for details of the process.

In the table below I have set out my analysis. Following the process, I have categorised each recommendation in the first column and rated each one in the final column.

Pt 1 - Benchmark

Part 2b

In the diagram below (Pt 2b), I have depicted the areas on which the inquiry team focused. As might be expected, the focus is on technical issues, including standards, capabilities and performance management. It should be noted that over 36% of the recommendations fall into the Further Review category; that is, they call for further work to close an information gap.

Pt 2b - Benchmark

Part 3

The Part 3 chart shows that all the recommendations fall within three ratings: Good (17.1%), Fair (65.9%) and Weak (17.1%), that is, 7, 27 and 7 of the 41 recommendations respectively. This suggests a consistency in their quality. It should be noted that, because of the way the rating system has been set up, the 36% of recommendations categorised as Further Review can normally be rated no higher than 'Fair'. This means that over half of the recommendations rated Fair could not have been rated any higher. It is also worth noting that none of the recommendations were rated Poor, Failed, Non or Bad. This speaks to the general quality of the process.
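The rating shares quoted above can be turned back into whole recommendations as a quick arithmetic check. The following is a minimal Python sketch, not part of the analytical process itself. It assumes the published percentages are rounded to one decimal place, takes "over 36%" to mean the smallest count that exceeds 36% of 41, and assumes (as the claim about the Fair band implies) that every Further Review recommendation was rated Fair.

```python
import math

TOTAL = 41  # recommendations in the BEA report on Air France flight 447

# Rating shares quoted in the text, as percentages rounded to one decimal place.
rating_pct = {"Good": 17.1, "Fair": 65.9, "Weak": 17.1}

# Convert each rounded share back to the nearest whole number of recommendations.
counts = {name: round(pct * TOTAL / 100) for name, pct in rating_pct.items()}
assert sum(counts.values()) == TOTAL, "rounded shares should account for every recommendation"

# Assumption: "over 36%" means the smallest count whose share exceeds 36% of 41.
further_review = math.floor(0.36 * TOTAL) + 1

# Assumption: every Further Review recommendation was rated Fair, because the
# rating scheme normally caps that category at Fair.
share_of_fair = further_review / counts["Fair"]

# The modal rating is the one with the most recommendations.
modal_rating = max(counts, key=counts.get)

print(counts)
print(f"Further Review: {further_review} ({further_review / TOTAL:.1%} of all)")
print(f"Further Review as a share of Fair: {share_of_fair:.1%}")
print(f"Modal rating: {modal_rating}")
```

On these assumptions the shares correspond to 7 Good, 27 Fair and 7 Weak recommendations, and the 15 Further Review recommendations make up about 56% of the Fair band, which is consistent with the 'over half' statement; Fair is also the modal rating.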

This Pt3 chart can be compared with the similar charts already produced: see here. The main differences are [1] the absence of Poor, Failed, Non or Bad ratings and [2] a shift in the mode from Poor to Fair.


This examination of the BEA report suggests several characteristics of the community concerned with air safety.


  • The approach used by this industry looks to solve problems that are revealed by a single event (that is, the accident being investigated) but that have a wider application.


  • After many years of trial and error, the community has separated the investigation of cause from the investigation of criminal liability. Those investigating the cause focus on why the event occurred, and their work is enhanced by a 'no blame' culture.


  • Just as important as the 'no blame' approach, the investigators seek to find the root cause of an issue. This approach is epitomised by the 'Seven whys'.


  • These events are scrutinised by subject matter experts who understand the issues they are examining. These experts also have a fundamental understanding of the systems involved and of the complex interactions that produce their dynamics.


  • These experts understand what is already known, or expected to be known, within the industry. Their work seems to concentrate on building on the sound basis that already exists.


  • If we think of these recommendations in terms of a complex systems model (see here for an example), the failure can be seen to centre on a single node. As a result, the distance between each recommendation and its practical outcome (the number of steps required: see non-equidistance) is likely to be shorter. This means that the recommendations are less likely to be thwarted by the processes and politics needed to implement them.


  • The focus of the aircrash inquiry is on practical issues. Whatever category a recommendation falls into, it always comes back to addressing the operational implications for air safety.


  • Aircrash investigation teams have the advantage of being able to work within a clearly defined analytical framework. The system they are investigating is well structured and has clearly adopted a consistent rules-based approach. This consistency helps to align the many diverse factors within the overall system.


  • Finally, learning about technological issues is more readily done than learning about management or governance issues, because technological issues are less emotionally charged. Management and governance issues are often more personal. The lesson here seems to be that the effectiveness of such investigations is inversely proportional to the emotion injected into them.
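The idea of distance in the bullet on complex systems models can be illustrated by treating a system as a directed graph and counting the fewest steps from a recommendation to a practical outcome. This is a hypothetical Python sketch: the node names and links are invented for illustration, and plain breadth-first search stands in for the non-equidistance measure, which is described elsewhere and may well be defined differently.

```python
from collections import deque

# Hypothetical system map (illustrative names only): each edge is one step a
# recommendation must pass through before it changes practice.
SYSTEM = {
    "technical_rec": ["operator"],
    "operator": ["practical_outcome"],
    "governance_rec": ["department"],
    "department": ["regulator", "funding_body"],
    "regulator": ["provider"],
    "funding_body": ["provider"],
    "provider": ["practical_outcome"],
    "practical_outcome": [],
}


def steps_to_outcome(graph, start, outcome):
    """Return the fewest edges from start to outcome (breadth-first search), or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == outcome:
            return dist
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # outcome not reachable from start


for rec in ("technical_rec", "governance_rec"):
    print(rec, steps_to_outcome(SYSTEM, rec, "practical_outcome"))
```

In this invented map the recommendation aimed at a single technical node reaches the outcome in 2 steps, while the governance recommendation needs 4; each extra step is another point at which process or politics could stall implementation.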


In summary, the strength of the air safety learning process is that it seems to have achieved its overall success by building small gains into an already coherent (if not perfect) system. This is done by experts looking for practical solutions.

Pt 3 - Benchmark


Having examined the BEA recommendations, we should consider the difficulties we might have when comparing two systems: in this case, air safety inquiries and public inquiries. As public inquiries can be used to examine any aspect of public life, here I will focus the discussion on the forthcoming COVID inquiry, which I would characterise as looking at the public health system. The issue is whether such a comparison can be fair. I use the Normal Chaos framework to facilitate the comparison. What I present here is an extremely abbreviated comparison, intended only to illustrate my point.


The first step in this comparison is to consider their structures. Using the lens of scale, we can compare the two systems at the macro and micro scales within the boundaries of the UK. That is, while I accept that both systems are influenced by (have interdependencies with) factors that originate outside the UK, I will only consider the system as it is enacted within the UK. At the macro scale there are the systems referred to as the national transport system and the national health service. Each has a government department and a Secretary of State responsible for it. So far, the two seem comparable. In practice, however, the Secretary of State for Health is held far more accountable for individual failures than the Secretary of State responsible for transport. At the micro scale we might compare a surgeon with a pilot. Here the difference lies in the jeopardy each faces. Pilots face the same jeopardy as their passengers for any mistakes they make; surgeons and other doctors do not face the same jeopardy as their patients. This differential in jeopardy gives pilots a more direct and personal incentive to learn than medical professionals have.


A second step might be to look at patterns. In this case, we might look at the patterns of behaviour exhibited by doctors and pilots. One common perception of doctors is that they can be very arrogant and individualistic. This was also the perception of pilots before the Tenerife air crash in 1977. After this accident, the industry spent considerable effort changing its culture. The new approach became known as Crew Resource Management. It focused on getting the best out of the team rather than relying solely on the leadership of the aircraft captain. While the health service has moved some way towards this approach, there is still a considerable gap to close.


A second pattern that we might consider is the nature of their failures. When a plane crashes, there are multiple deaths, but these are rarely personalised for emotional effect. The event is treated as a singular tragedy and is rarely politicised. Deaths within the health service, by contrast, are normal. However, when deaths within the health service are highlighted, they are personalised for emotional effect and politicised. This difference means that the two systems have completely different dynamics. While air accidents can be debated in measured terms, this is more difficult with health matters, where the argument quickly becomes emotion driven (energy).


Even this brief comparison shows that there are significant differences between these two systems. Having said that, for the purpose of this work, which is about developing an analytical process, I feel that the air safety system provides, if not a benchmark, then a suitable comparator.


Elsewhere I have described some of the weaknesses that I have identified in the public inquiry process. I will take just two examples to illustrate my point. First, when it comes to explaining the cause of some public failure, one weakness I have identified in one of the public inquiries is a tendency to resort to one-reason decision-making. In contrast, air accident investigations use versions of the 'Seven whys' approach to find the root cause of an issue. Secondly, air accident inquiries are led by subject matter experts who have the relevant seat of understanding. Public inquiries, on the other hand, are led by non-subject matter experts who have been characterised as 'pigs looking at watches'; that is, intelligent creatures who do not understand what they are observing. To me, therefore, this comparison does, despite the differences, raise questions over the efficacy of each approach. This suggests that, even at this level, comparing the two systems is a useful exercise.


I will now look to develop a proposition that describes the characteristics necessary for an inquiry to increase its probability of producing useful recommendations.

Bench-Marking Discussion

Last updated: 16 Jan 22
