Method for visualizing risk factors of system failures and its application to ICT systems
Keywords: risk management, system failure model, normal accident theory (NAT), Interaction and Coupling Chart (IC chart), Information and Communication Technology (ICT)
This paper proposes a method for visualizing risk factors of system failures. The method makes it possible to visualize risk factors, monitor them over time, and compare them among systems, which is valuable for promoting system safety and reliability. First, we introduce a methodology for defining system failure holistically; then we introduce our method for quantifying the risk factors of system failures with an interaction and coupling (IC) chart based on normal accident theory (Perrow, 1999). System failure is defined using system of system failure (SOSF) (Nakamura and Kijima, 2007, 2008b, 2009a) with a meta-system frame called system of system methodology (SOSM) (Jackson, 2003, 2006). SOSF enables us to understand system failure holistically. The IC chart is used to classify object systems (nuclear power plants, chemical plants, aircraft and air traffic control, ships, dams, nuclear weapons, space missions, and genetic engineering) according to the interaction (i.e., linear or complex) and the coupling (i.e., tight or loose) between the components that constitute such systems. The IC chart (Perrow, 1999) is limited by the subjectivity involved in classifying target systems. To complement this shortcoming, we propose a method for measuring risk factors quantitatively, and therefore objectively, from incidents that have occurred over time. This enables us to quantitatively understand system features and the effectiveness of countermeasures introduced to object systems. The methodology has yielded several findings. It enables us to quantify risk factors in terms of the IC chart. Stock exchange, meteorological, and healthcare systems are located, in that order, from linear to complex interaction and from tight to loose coupling. Quality control measures for Intel Architecture (IA) servers (i.e., educating engineers to become hybrid engineers and altering the server design goal) shift these servers toward linear interaction and tight coupling, with lower incident rates. Healthcare systems are migrating toward complex interaction and loose coupling, with deteriorating system quality. The Electronic Data Interchange (EDI) policy in the healthcare sector is one of the reasons for this migration. Application examples in information and communication technology (ICT) engineering demonstrate that using the proposed method to monitor risk factors quantitatively will help improve the safety and quality of various object systems.