Process mining generates data that can be used for machine learning
The first step in building ML models for processes is to extract the relevant process information from the available data. Process mining methods serve this purpose: they support data-driven process analysis and modeling. The information and insights gained in this way then become input variables for ML models, which uncover the relationships between process steps and the other recorded process parameters and variables.
The overall goal is to explore methods that ideally produce directly interpretable analyses and forecasts while still handling complex input variables and the process information to be integrated. The focus is not only on white-box models but also on communicating the forecasts of black-box models, such as deep neural networks, in a comprehensible way. A large part of the research is also devoted to formalizing the information extracted from the process data in a meaningful and compact form.
From process data to machine learning to anticipatory support systems
Process analysis is based on data in a special form, so-called event logs. These differ from conventional data structures such as cross-sectional or time series data in that the data points, each representing an executed activity, are usually irregularly distributed over time. This makes it difficult to apply classic analysis and forecasting algorithms. On the other hand, it offers the opportunity to use methods such as process mining to extract the process knowledge contained in the data.
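As a minimal sketch of what such an event log can look like, the following Python example groups events into one activity sequence per case and measures the gaps between consecutive events. The case identifiers, activity names and timestamps are invented for illustration, as are the helper names build_traces and gaps_in_minutes.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event log: one row per executed activity
# (case id, activity name, ISO timestamp). All values are invented.
EVENTS = [
    ("order-1", "receive order", "2024-03-01T08:00:00"),
    ("order-2", "receive order", "2024-03-01T08:07:00"),
    ("order-1", "check stock", "2024-03-01T08:45:00"),
    ("order-1", "ship goods", "2024-03-02T14:10:00"),
    ("order-2", "check stock", "2024-03-04T09:30:00"),
    ("order-2", "reorder item", "2024-03-04T09:31:00"),
    ("order-2", "ship goods", "2024-03-09T16:00:00"),
]


def group_by_case(events):
    """Collect (timestamp, activity) pairs per case, sorted by timestamp."""
    by_case = defaultdict(list)
    for case_id, activity, ts in events:
        by_case[case_id].append((datetime.fromisoformat(ts), activity))
    return {case: sorted(evs) for case, evs in by_case.items()}


def build_traces(events):
    """Return the ordered activity sequence (trace) of every case."""
    return {case: [a for _, a in evs] for case, evs in group_by_case(events).items()}


def gaps_in_minutes(events):
    """Return the time gaps between consecutive events of every case."""
    gaps = {}
    for case, evs in group_by_case(events).items():
        times = [t for t, _ in evs]
        gaps[case] = [(b - a).total_seconds() / 60 for a, b in zip(times, times[1:])]
    return gaps


traces = build_traces(EVENTS)
gaps = gaps_in_minutes(EVENTS)
print(traces["order-1"])  # activity sequence of one case
print(gaps["order-1"])    # irregular spacing between its events
```

The gaps within a single case range from minutes to days, which is the irregular spacing that makes fixed-interval time series methods hard to apply directly.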
This process knowledge traditionally covers the chronology of the process steps, recurring patterns in the (sub-)process flow, and the resources used in the process, such as equipment or personnel, all of which affect the target variables and key figures of the process under analysis. The extracted information is then integrated into explainable ML models so that the resulting method remains comprehensible to users. Methods that meet these criteria, and are therefore increasingly used in process-aware learning, include Bayesian networks, Markov models and decision trees.
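To illustrate how the chronology of process steps can feed an explainable model, the following sketch estimates a first-order Markov chain from a handful of traces. The traces, the artificial "<start>" and "<end>" states and the function name fit_markov_chain are assumptions made for this example; it is not the specific method used by the group.

```python
from collections import Counter, defaultdict


def fit_markov_chain(traces):
    """Estimate first-order transition probabilities between activities.

    Each trace is padded with artificial "<start>" and "<end>" states so
    that the model also captures how cases begin and finish.
    """
    counts = defaultdict(Counter)
    for trace in traces:
        padded = ["<start>"] + list(trace) + ["<end>"]
        for current, nxt in zip(padded, padded[1:]):
            counts[current][nxt] += 1
    # Normalize the counts of each state into probabilities.
    return {
        state: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for state, followers in counts.items()
    }


# Hypothetical traces, one list of activities per case.
TRACES = [
    ["receive", "check", "ship"],
    ["receive", "check", "reorder", "ship"],
    ["receive", "check", "ship"],
    ["receive", "check", "reorder", "ship"],
]

model = fit_markov_chain(TRACES)
for state, followers in model.items():
    print(state, "->", followers)
```

Every probability in the resulting table can be read directly as "after activity X, activity Y follows in this share of cases", which is what makes such a model easy to explain to users.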
Process-aware learning in applications
Process-aware learning can analyze key performance indicators of production processes, such as throughput times or defect rates, together with the factors that influence them. It can also predict complete processes and their components ("activities"), as well as anomalies and bottlenecks. This makes uncertainties in processes tangible, for example for driver assistance systems in rail transport. The data-driven forecasts and suggestions generated by ML models can serve users as a support system for process planning and control.
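The two key figures named above can be derived from an event log with very little code. The following sketch assumes a hypothetical log in which a case counts as defective if it contains the activity "quality check failed"; the data and the function names throughput_hours and defect_rate are invented for illustration.

```python
from datetime import datetime
from statistics import mean

# Hypothetical production event log (case id, activity, ISO timestamp).
EVENTS = [
    ("job-1", "start milling", "2024-05-06T06:00:00"),
    ("job-1", "quality check passed", "2024-05-06T09:30:00"),
    ("job-2", "start milling", "2024-05-06T06:15:00"),
    ("job-2", "quality check failed", "2024-05-06T08:15:00"),
    ("job-3", "start milling", "2024-05-06T07:00:00"),
    ("job-3", "quality check passed", "2024-05-06T11:30:00"),
    ("job-4", "start milling", "2024-05-06T07:30:00"),
    ("job-4", "quality check passed", "2024-05-06T09:30:00"),
]


def throughput_hours(events):
    """Throughput time per case: last event time minus first event time."""
    first, last = {}, {}
    for case_id, _activity, ts in events:
        t = datetime.fromisoformat(ts)
        first[case_id] = min(first.get(case_id, t), t)
        last[case_id] = max(last.get(case_id, t), t)
    return {c: (last[c] - first[c]).total_seconds() / 3600 for c in first}


def defect_rate(events, defect_activity="quality check failed"):
    """Share of cases that contain the (assumed) defect activity."""
    cases = {c for c, _, _ in events}
    defective = {c for c, a, _ in events if a == defect_activity}
    return len(defective) / len(cases)


hours = throughput_hours(EVENTS)
mean_hours = mean(hours.values())
rate = defect_rate(EVENTS)
print(hours, mean_hours, rate)
```

Computed per case, these values can serve as target variables for a forecasting model, with the activities and resources of each case as candidate influencing factors.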
Analysis frameworks for the automated integration of process knowledge into process prediction models
Integrating process information into explainable machine learning models generally requires considerable conceptual effort. The competence pillar is therefore also working on automatically generating interpretable models for process forecasting, which in turn enables predictive support for process planning and control. In cooperation with the Ludwig-Maximilians-University Munich, methods are being developed to automatically learn causal network structures from process data. Various disciplines of process mining, in particular the data-driven creation of process models ("process discovery"), are being examined for their ability to extract causal relationships between process steps and other process parameters from the process data. The added value of such a procedure is a substantial reduction in the manual effort required to convert process data into usable analysis models, while the forecasts and the model-generated suggestions for process optimization remain comprehensible and interpretable.
Applying process mining and machine learning also requires training data that is sufficient in both quality and quantity. This holds for time series, cross-sectional data, text and image processing as well as for sequential process data. It is therefore important to research methods for extracting or augmenting data sets that are highly granular, diverse and, above all, free of errors and gaps. The Data-Centric AI competence pillar is primarily dedicated to this research area. With regard to available process event logs, there is a clear need for retrofitting and further research, especially in less digitized companies. For production processes, for example, cyber-physical systems (CPS) and simulation could support the digitization of processes and thus data acquisition. The resulting extended event log then allows processes to be analyzed on the basis of data.
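One simple way in which process discovery relates process steps to each other is the order-based "causality" relation of the well-known alpha algorithm: activity a is taken to cause b if a is sometimes directly followed by b but b is never directly followed by a. The sketch below implements only this relation, on invented traces, as an illustration. It describes ordering in the log rather than statistical causation, and it is not the method developed in the cooperation mentioned above.

```python
def directly_follows(traces):
    """All pairs (a, b) where b occurs immediately after a in some trace."""
    return {(a, b) for trace in traces for a, b in zip(trace, trace[1:])}


def causal_pairs(traces):
    """Alpha-algorithm style causality: a > b holds, but b > a does not."""
    df = directly_follows(traces)
    return {(a, b) for (a, b) in df if (b, a) not in df}


def parallel_pairs(traces):
    """Activities observed in both orders are treated as parallel."""
    df = directly_follows(traces)
    return {(a, b) for (a, b) in df if (b, a) in df}


# Hypothetical traces: "check stock" and "pack" occur in either order.
TRACES = [
    ["receive", "check stock", "pack", "ship"],
    ["receive", "pack", "check stock", "ship"],
]

causal = causal_pairs(TRACES)
parallel = parallel_pairs(TRACES)
print(sorted(causal))
print(sorted(parallel))
```

The causal pairs form the edges of a small network structure, while the activities observed in both orders are kept apart as parallel rather than being linked by a causal edge.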
The competence pillar "Process-Aware Learning" is an integral part of the "Process Intelligence" group
The competence pillar "Process-Aware Learning" is an integral part of the "Process Intelligence" group of the Fraunhofer IIS Supply Chain Services workgroup. The group's researchers focus on two main areas: the data-driven investigation of business processes, and machine learning models for forecasting and monitoring processes.