Friday, June 1, 2018

Root Cause Failure Analysis: Unknown knowns

Understanding Failure: From Root Causes to the Unconscious Dimension of Engineering Judgment

In a fragile and increasingly complex world, engineers must understand not only how systems perform, but how and why they fail. Failures may arise from structural deficiencies, material degradation, design errors, operational lapses, or a combination of these factors. Regardless of type, category, or discipline, every failure demands a systematic and disciplined investigation aimed at identifying its true root causes.

Failure Cause Characterization

In principle, failure triggers can be categorized into three fundamental domains, collectively referred to as Failure Cause Characterization (Márquez, 2007):

  1. Human causes
    Errors of omission or commission originating from human action or inaction, which ultimately manifest as physical failures.

  2. Physical causes
    The direct technical reasons an asset failed—why components broke, systems malfunctioned, or performance limits were exceeded.

  3. Latent causes
    Deficiencies embedded within management systems, organizational structures, procedures, or governance frameworks that permit human errors to persist unchecked. These are systemic flaws rather than isolated mistakes.

Among these, latent causes are often both the most critical and the most difficult to identify, because they reside upstream of observable failures.

Asset Management Context and Standards

Within the discipline of asset management, particularly under the ISO 55000 series, failure analysis is inseparable from performance evaluation. Clause 2.5.3.7 of ISO 55000 states:

“Asset management performance should be evaluated against whether the asset management objectives have been achieved, and if not, why not. Where applicable, any opportunities that arose from having exceeded the asset management objectives should also be examined, as well as any failure to realize them. The adequacy of the decision-making processes should be examined carefully.”

This clause explicitly calls for examining the adequacy of decision-making processes, which is where latent causes reside. Consequently, Root Cause Failure Analysis (RCFA) becomes a core process within asset management systems, supported by standardized investigation procedures. Comparable requirements appear in ISO 9000 frameworks, where failure reporting, investigation methodologies, and lessons-learned mechanisms are foundational elements of quality management.

Minimum Contents of a Failure Investigation Report

A failure investigation report, at minimum, should include the following components:

  1. Definition of the investigation team (e.g., subject matter experts, consultants, engineers, operators, technicians).

  2. Collection and documentation of failure data (problem description, date and time, location, GIS data, operational context).

  3. Evaluation of impacts and immediate corrective actions taken.

  4. Date of the incident report.

  5. Date of the investigation report.

  6. Detailed description of the failed asset.

  7. Identified root cause(s).

  8. Recommendations for corrective and preventive actions.

  9. Supporting attachments (photographs, inspection records, correspondence, early warning indicators, historical data).

  10. Final inspection or witnessing dates for repairs or testing, where applicable.

  11. Lessons learned.
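The eleven items above can be captured as a simple record with a completeness check. The following is a minimal sketch in Python, not a prescribed schema: the class name, the field names, and the choice to treat only item 10 as optional (because the list marks it "where applicable") are all assumptions made for illustration.

```python
from dataclasses import dataclass, field, fields
from datetime import date
from typing import List, Optional


@dataclass
class FailureInvestigationReport:
    """Hypothetical record mirroring the eleven minimum report components."""
    team: List[str] = field(default_factory=list)                 # item 1
    failure_data: str = ""                                        # item 2
    impacts_and_immediate_actions: str = ""                       # item 3
    incident_report_date: Optional[date] = None                   # item 4
    investigation_report_date: Optional[date] = None              # item 5
    asset_description: str = ""                                   # item 6
    root_causes: List[str] = field(default_factory=list)          # item 7
    recommendations: List[str] = field(default_factory=list)      # item 8
    attachments: List[str] = field(default_factory=list)          # item 9
    final_inspection_dates: List[date] = field(default_factory=list)  # item 10
    lessons_learned: List[str] = field(default_factory=list)      # item 11

    def missing_sections(self) -> List[str]:
        """Return the names of mandatory sections that are still empty."""
        optional = {"final_inspection_dates"}  # assumed optional: "where applicable"
        return [
            f.name
            for f in fields(self)
            if f.name not in optional and not getattr(self, f.name)
        ]


# Usage with invented example values: a draft that is only partly filled in.
draft = FailureInvestigationReport(
    team=["reliability engineer", "operator"],
    incident_report_date=date(2018, 5, 14),
    asset_description="Hypothetical pump P-101",
)
missing = draft.missing_sections()
print(missing)
```

A check of this kind only confirms that each section has been filled in; it says nothing about whether the identified root cause is correct.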

Root Cause Analysis and Validation

Root cause analysis should be conducted in accordance with established methodologies, such as those outlined in BS EN 62740, which defines current best practices for RCA.

Within this process, validation is the most critical and sensitive stage. Validation ensures that the identified root cause genuinely explains the failure mechanism and is suitable for guiding corrective actions. Several validation approaches may be employed:

  • Independent third-party review, providing an objective assessment of the RCA’s rigor and conclusions.

  • Experimental or testing-based validation, demonstrating that the proposed root cause can reproducibly trigger the observed failure.

  • Statistical and probabilistic approaches, including numerical modeling, Monte Carlo simulations, or reliability analyses, particularly in complex or stochastic failure scenarios.

When using simulation-based validation, extreme care must be taken to ensure that models realistically and representatively capture the underlying failure mechanisms.
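As an illustration of simulation-based validation, the sketch below uses a basic stress-strength model: a component fails whenever the applied load exceeds its strength. This is one possible setup, not the method of any cited standard. The normal distributions, the parameter values, and the function names are all assumptions chosen for the example. Because this simple case has a closed-form answer, the Monte Carlo estimate can be checked against it, which is one way to gain confidence that a model behaves as intended before applying it to cases with no analytic solution.

```python
import random
from statistics import NormalDist


def simulate_failure_probability(load_mean, load_sd, strength_mean, strength_sd,
                                 trials=100_000, seed=42):
    """Estimate P(load > strength) by sampling both as independent normals."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    failures = 0
    for _ in range(trials):
        load = rng.gauss(load_mean, load_sd)
        strength = rng.gauss(strength_mean, strength_sd)
        if load > strength:
            failures += 1
    return failures / trials


def analytic_failure_probability(load_mean, load_sd, strength_mean, strength_sd):
    """Closed-form check: the margin (strength - load) is itself normal."""
    margin_mean = strength_mean - load_mean
    margin_sd = (load_sd ** 2 + strength_sd ** 2) ** 0.5
    # Failure occurs when the margin is negative.
    return NormalDist().cdf(-margin_mean / margin_sd)


# Hypothetical parameters (arbitrary consistent units), for illustration only.
estimate = simulate_failure_probability(300.0, 30.0, 400.0, 40.0)
exact = analytic_failure_probability(300.0, 30.0, 400.0, 40.0)
print(f"simulated: {estimate:.4f}, analytic: {exact:.4f}")
```

With these assumed parameters the margin has mean 100 and standard deviation 50, so the analytic failure probability is the standard normal CDF at -2, roughly 0.0228. If the failure frequency observed in the field were far from what the model predicts, that would suggest the proposed root cause, or the model of it, does not explain the failure.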

The Limits of Knowledge and the Role of Intuition

Beyond technical rigor, all investigations operate within the inherent limits of human knowledge. A useful conceptual framework is the following classification:

  • Known knowns
    Established theories, evidence, observations, and facts consciously available to the investigator.

  • Known unknowns
    Recognized limitations: assumptions, approximations, uncertainties, and modeling errors.

  • Unknown unknowns
    Completely unforeseen factors—undocumented third-party interventions, missing evidence, or events leaving no detectable trace.

  • Unknown knowns (Žižek, 2012)
    Knowledge sedimented within the unconscious: accumulated experience, pattern recognition, and intuition that guide judgment without conscious recall.

It is this final category—the unknown knowns—that often plays a decisive role in expert judgment. What appears as “intuition” is frequently the result of years of accumulated experience, silently processed and stored beyond conscious awareness.

The Unconscious in Engineering Judgment

Beyond conscious analytical capacity, the unconscious continuously supports decision-making processes. Initial intuitions are not irrational impulses; they are often rapid, automated syntheses of deeply internalized knowledge. In psychoanalytic terms, the unconscious exerts an agency that operates without our direct control, yet profoundly influences our actions.

Maintaining a well-informed unconscious therefore becomes an epistemic responsibility. Reading, observing, listening, practicing, reflecting, and even idle contemplation feed this hidden reservoir. In psychoanalytic discourse, dreams represent one mode of unconscious communication; moments of insight often emerge when attention is diverted from deliberate problem-solving—during rest, daydreaming, or disengagement.

The domain of unknown knowns—knowledge that we possess without consciously knowing that we possess it—frequently governs expert decisions, particularly when empirical evidence is incomplete or inaccessible. Many expert judgments cannot be conclusively proven, yet they remain valid because they arise from this deeply internalized structure of experience.

In this sense, intuition is not the opposite of rigor; it is its long-term byproduct.


References

Márquez, A. C. (2007). Root Cause Failure Analysis (RCFA) for High Impact Weak Points. In The Maintenance Management Framework: Models and Methods for Complex Systems Maintenance (pp. 127–132). London: Springer.

Žižek, S. (2012). The Limits of Hegel. In Less Than Nothing: Hegel and the Shadow of Dialectical Materialism (p. 358). London: Verso.
