Exploiting causal inter-dependencies in process manufacturing data

Advances in data science are revolutionising several aspects of life, including the operational management of manufacturing plants. Industrial analytics offers scope for improving product quality, plant productivity and efficiency by adopting data science developments, including machine learning, to mine plant data and extract information and insights from it. It has long been recognised that there is ample scope for improving the operational management of process manufacturing plants, and the technology for realising this has matured rapidly in recent years. Important requirements for achieving operational excellence in the process manufacturing industry include continuous and proactive monitoring, diagnosing the root causes of alarms, preventive maintenance of degraded units, troubleshooting the causes of production bottlenecks to improve productivity and yield, abnormal situation management, identifying inefficient control loops and their causes, pinpointing the root causes of plant-wide oscillations, and predicting the evolution of important plant KPIs to enable a proactive approach to mitigating undesirable events.

The process manufacturing industry is one of the forerunners in adopting control and automation technologies. These industries are highly suited to automation due to the nature of their continuous or batch operation, and introducing automation can yield a good return on investment. Several univariate and multivariate statistical methods were applied to process industry data for purposes such as monitoring, root cause diagnosis, process improvement, process control and data reconciliation even before data science became a catchphrase.

During the early years of artificial intelligence, there were attempts to design expert systems for decision support in process operations management. Even though these approaches have fallen out of favour, the human factor remains essential for teasing out actionable insights from the results of data algorithms. No amount of data, and none of the latest algorithmic developments, can make expert knowledge about the process obsolete or serve as a magic wand that gives us all the answers. The human factor is not trivial, and we still have to rely on subject matter experts to provide the context for interpreting data analysis results.

One of the crucial elements of expert knowledge is information on causality. Establishing causality between two variables A and B involves three factors, especially in an engineering context: establishing correlation between the variables, establishing their time sequence (A occurred before B or vice versa), and establishing an explanatory model that answers why and how one variable influenced the other. In the context of manufacturing processes, causality information can be derived from piping and instrumentation diagrams, operator experience, physical and chemical principles, or even model simulations. Causal knowledge combined with other data analysis algorithms provides a means to develop an automated decision support system for the process plant. Causal models allow interpretation of the data and provide the explanatory models that are so important for human intuitive understanding. Causal knowledge is also key to achieving process improvement, since it reveals the causes of issues and can be used to predict the consequences of disturbances and their propagation throughout the plant. The advantages of capturing and utilising this qualitative knowledge include the ability to automate diagnostic reasoning and present analysis results in an intuitive, human-relatable form, to reduce the time spent on troubleshooting and diagnosis, and to reduce dependency on human experts who may not always be available.
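As an illustrative sketch of such automated diagnostic reasoning, a causal graph can be stored as a simple cause-to-effect edge list and walked backwards from an observed symptom to enumerate candidate root causes. The tag names and connections below are entirely hypothetical, chosen only for the example:

```python
from collections import defaultdict

# Hypothetical causal graph for a small plant section: edges point from
# cause to effect (e.g. a pump fault raises the tank level, which in
# turn raises the outlet flow).
EDGES = [
    ("pump_fault", "tank_level_high"),
    ("valve_stiction", "tank_level_high"),
    ("tank_level_high", "outlet_flow_high"),
    ("outlet_flow_high", "downstream_pressure_high"),
]

def root_causes(symptom, edges=EDGES):
    """Walk the causal graph backwards from an observed symptom and
    return the candidate root causes (nodes with no parents)."""
    parents = defaultdict(set)
    for cause, effect in edges:
        parents[effect].add(cause)
    roots, stack, seen = set(), [symptom], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if parents[node]:
            stack.extend(parents[node])
        else:
            roots.add(node)
    return roots
```

Given an alarm on `downstream_pressure_high`, the traversal would surface both `pump_fault` and `valve_stiction` as candidate root causes, which a subject matter expert or further data analysis can then discriminate between.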

Traditionally, alarms based on threshold violations have been the means of indicating the status of various units and systems to operators. As plant complexity and the number of alarms increased, it was quickly realised that this approach leads to other complications: operators were flooded with alarms and failed to give timely attention to the root causes of abnormal events. This gave rise to the need to rationalise alarm floods and formulate strategies for alarm management, generally through offline analysis of historical alarm data. While alarm rationalisation provided some relief to operators, expert input was still required to drill down to the exact root causes, consuming additional time and resources.
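A minimal sketch of such offline analysis of historical alarm data is flagging flood periods with a sliding window. The ten-alarms-per-ten-minutes threshold used as the default here is a commonly cited rule of thumb for alarm floods, not a universal standard, and the function is an assumption-laden illustration rather than a production implementation:

```python
def flood_starts(timestamps, window=600.0, threshold=10):
    """Given alarm timestamps in seconds, return the timestamps that open
    a window of `window` seconds containing at least `threshold` alarms,
    i.e. candidate alarm-flood onsets (e.g. 10+ alarms in 10 minutes)."""
    ts = sorted(timestamps)
    starts, j = [], 0
    for i, t in enumerate(ts):
        if j < i:
            j = i
        # advance j to one past the last alarm inside [t, t + window]
        while j < len(ts) and ts[j] <= t + window:
            j += 1
        if j - i >= threshold:
            starts.append(t)
    return starts
```

The flood onsets found this way can then be correlated with process events to focus rationalisation effort where it matters most.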

Attempts to automate this root cause determination using event-based fault tree analysis have been incorporated in some decision support systems. This requires establishing causalities and tracing events that are observable symptoms right up to the known failure causes. While fault trees can capture some causal knowledge, they have drawbacks: they are often constructed using ad-hoc methods rather than fundamental principles, they cannot reason about faults not represented in the tree, they are rigid and lack robustness, and their construction requires excessive time and effort. For these reasons, automated root cause reasoning using fault trees is not very popular, and root cause analysis is generally performed manually at considerable cost in time and effort.
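At its core, a fault tree is nested AND/OR logic over basic events, which makes the rigidity easy to see: only faults encoded in the tree can ever be inferred. A toy evaluation, with hypothetical event names, might look like this:

```python
# A hypothetical fault tree as nested tuples: ("AND"/"OR", child, ...),
# where leaves are basic-event names. The top event "loss of flow" fires
# on a power failure, or if the pump trips AND the backup is unavailable.
TREE = ("OR",
        ("AND", "pump_trip", "backup_pump_unavailable"),
        "power_failure")

def evaluate(node, events):
    """Recursively evaluate a fault-tree node against the set of observed
    basic events; returns True if the (sub)tree's top event is implied."""
    if isinstance(node, str):          # leaf: a basic event
        return node in events
    gate, *children = node             # internal node: a logic gate
    results = (evaluate(child, events) for child in children)
    return all(results) if gate == "AND" else any(results)
```

Any combination of events not enumerated in the tree simply evaluates to False, illustrating why fault trees cannot reason about unrepresented faults.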

Recent developments in data science, such as advanced supervised and unsupervised machine learning algorithms, have enhanced our abilities in pattern recognition, classification, regression and clustering. These developments provide more tools that can be exploited for operational decision support in the process industries. For example, we can build regression models that generate residuals for monitoring, use powerful algorithms for fault classification, apply pattern recognition to fault identification, build predictive models for predictive analytics, and much more. While there is no doubt that these developments can greatly improve operational excellence in process industries, exploiting the causal interconnections in the data remains crucial for providing context to the analyses and enabling automation of root cause reasoning.

The knowledge of causal inter-dependencies in the data is best captured in the form of graphs. Graphs are a great aid in intuitively visualising connections in many fields of human endeavour. Causal directed graphs as a means of capturing the qualitative interconnections in a plant received much attention in early years, but have recently been largely ignored in favour of more ad-hoc approaches, owing primarily to the rigour and effort required for their construction.

The major pain point in capturing causal dependencies in the form of a graph is the construction itself. Although graphs are constructed from expert knowledge, it is greatly beneficial to validate interconnections and explore other causal possibilities using available historical data, especially since establishing causality is not easy for complex interactions. Methods like cross-correlation analysis and transfer entropy can help in this process of validation and causal exploration, aided by user-friendly interfaces for investigating specific causalities. Since graphs for large plants can be huge and cumbersome to work with, modular ways of working should be enabled; an object-oriented approach allows the creation and reuse of blocks for commonly repeating units. Another disadvantage of causal graphs is the rigidity of their structure (the inability to change structure according to changes in operation). This drawback can be tackled by event- or condition-based dynamic triggering to effect changes to the graph structure. Event-based dynamic triggers can include logic to mimic the functionality of fault trees, further extending the capability of causal graphs and incorporating operator experience. I believe these enhancements can make the causal graph method user friendly and extend its utility. Combined with other data analysis algorithms, causal knowledge based reasoning can not only lead to truly automated reasoning, but also provide interpretable and intuitive results that can be easily actioned by operators.
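As a sketch of how cross-correlation analysis supports such validation, the lag at which two tag series correlate most strongly gives evidence for the time sequence between them: a positive lag is consistent with x leading y, and hence with a proposed edge x to y (with the usual caveat that correlation alone never proves causation). This is a plain, unoptimised implementation for illustration:

```python
def xcorr_lag(x, y, max_lag):
    """Scan lags in [-max_lag, max_lag] and return (best_lag, best_r),
    where best_lag is the shift of y relative to x that maximises the
    Pearson correlation. A positive lag suggests x leads y."""
    def corr(a, b):
        n = len(a)
        ma, mb = sum(a) / n, sum(b) / n
        num = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
        da = sum((ai - ma) ** 2 for ai in a) ** 0.5
        db = sum((bi - mb) ** 2 for bi in b) ** 0.5
        return num / (da * db) if da and db else 0.0

    best_lag, best_r = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:                       # pair x[t] with y[t + lag]
            a, b = x[:len(x) - lag], y[lag:]
        else:                              # pair x[t] with y[t + lag], lag < 0
            a, b = x[-lag:], y[:len(y) + lag]
        r = corr(a, b)
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag, best_r
```

In a decision support tool, a strong peak at a physically plausible lag would support keeping a proposed causal edge, while the absence of one would prompt the expert to re-examine it. Transfer entropy extends this idea to nonlinear couplings.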

A number of methods (including data-based machine learning, qualitative and quantitative model-based, and statistics-based approaches) can be employed for process manufacturing operational decision support, and it is widely acknowledged that hybrid schemes offer better solutions than any single method, since each method has its own advantages while suffering from some drawbacks. A hybrid strategy tailored by picking and choosing methods to form customised dashboards that can be deployed online will provide vastly superior accuracy, precision, robustness and explanation facility. Causal reasoning would be a key component in such a hybrid strategy.

The hybrid scheme can draw on options ranging from univariate and multivariate statistical methods, statistical process control methods, and machine-learning-based regression and classification, to causal graph based reasoning combined with methods like cross-correlation analysis and transfer entropy. As an example, units that are critical and must be continuously monitored for their health status are good candidates for machine learning models trained on ideal operation data; their residuals can then be monitored for deviations, catching incipient changes in health status.
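The residual-monitoring idea can be sketched in a few lines. Here a least-squares line stands in for whatever regression model the unit warrants, and the three-sigma band is one conventional (not mandatory) choice of deviation limit; both are illustrative assumptions:

```python
def fit_line(x, y):
    """Least-squares fit y = a*x + b, trained on healthy-operation data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    a = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return a, my - a * mx

def residual_monitor(x, y, model, sigma):
    """Return the indices of samples whose residual (measured minus
    predicted) falls outside a 3-sigma band around zero, flagging
    incipient deviations in the monitored unit's health."""
    a, b = model
    return [i for i, (xi, yi) in enumerate(zip(x, y))
            if abs(yi - (a * xi + b)) > 3 * sigma]
```

In practice the model would be richer (nonlinear, multivariate) and sigma estimated from the training residuals, but the monitoring logic remains the same: persistent excursions of the residual signal a change in the unit, often well before an alarm threshold on the raw measurement is reached.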

While online monitoring and diagnostics are important, offline processing of data and the derivation of process insights from it are also crucial elements of a good decision support system. Data preparation for model building and monitoring purposes would be another important function of the offline system. Exhaustive searching and exploration, manipulation, aggregation, cause and consequence analysis, frequency analysis, and data extraction are some of the capabilities such an offline system should provide.

Specific focus is also required on control loops in process plants, which can under-perform and cause productivity losses. Control loops are often the root causes of oscillations that can propagate throughout the plant. Therefore, monitoring control loop performance using various metrics is essential to the effectiveness of the overall operations management strategy. Methods for drilling down into control loop issues and pinpointing the loops that originate plant-wide oscillations will be important. Further, monitoring of final control elements such as control valves, which can be affected by issues like stiction, is a necessity. Control loop diagnosis can involve methods based on higher-order statistics for accurate determination of failure causes.
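One family of oscillation-detection methods checks how regularly a mean-centred loop signal crosses zero: sustained oscillations produce evenly spaced crossings, while noise does not. The sketch below is a much-simplified version of that idea, not a complete detector (real implementations typically work on the autocorrelation function to suppress noise):

```python
def oscillation_period(signal):
    """Estimate the oscillation period (in samples) of a control-loop
    signal from its upward zero crossings after mean-centring. Returns
    (mean_period, regularity), where regularity near 1.0 indicates
    evenly spaced crossings, i.e. a sustained oscillation."""
    m = sum(signal) / len(signal)
    s = [v - m for v in signal]
    crossings = [i for i in range(1, len(s)) if s[i - 1] < 0 <= s[i]]
    if len(crossings) < 3:
        return None, 0.0   # too few crossings to judge periodicity
    gaps = [b - a for a, b in zip(crossings, crossings[1:])]
    mean_gap = sum(gaps) / len(gaps)
    var = sum((g - mean_gap) ** 2 for g in gaps) / len(gaps)
    regularity = 1.0 - (var ** 0.5) / mean_gap
    return mean_gap, regularity
```

Applied to each loop's controller error, such a detector helps shortlist the loops oscillating at a common period, which is a useful first step towards pinpointing the loop that originates a plant-wide oscillation.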

In conclusion, capturing the knowledge of causal inter-dependencies in process manufacturing data and utilising it for online diagnosis and root cause analysis can offer immense benefits to the process industry in terms of savings in time, manpower and cost by enabling proactive monitoring, automating root cause diagnostics, scheduling timely preventive maintenance, identifying production bottlenecks and enhancing plant safety. Process industries should embrace these data analytics possibilities (through products and services offered by our company DV-IndAn), enabled by the latest research and developments in data science, to realise these benefits. Since these technologies only involve analysing the large amounts of data already logged in modern plants, implementation incurs relatively small costs compared to the benefits over extended periods, resulting in a good return on investment.