Model Learning

Mining explainable timed models from offline system logs

In recent years, machine learning has been integrated into more and more areas of life. However, the safety of such systems often cannot be verified due to their complexity and unknown internal structure. For such black-box systems, model learning can provide additional information. Model learning typically deduces an executable representation either by monitoring the System Under Learning (SUL) (passive learning) or by actively querying the SUL (active learning). Either approach produces a model consistent with the observations. These models can be used for verification methods like model checking, but often even a graphical illustration of a system's internal workings can increase confidence that it behaves as intended. The approach is especially useful for artificial intelligence (AI) systems, where a function is learned from training data and no human-readable explanation may exist.

Active algorithms like Angluin's L* have shown promising results. However, they can be difficult to apply in practice, as many systems provide no way for a learner to interact with them. An example of such a system is the controller of a smart traffic light, where the inputs are the arrivals of cars in a street lane. Luckily, in the modern era of big data, many systems are monitored throughout their deployment in the real world, producing log files that can span months or years. Passive learning algorithms like (timed) k-tails take large numbers of such traces, convert them into a tree-like structure, and apply state-merging techniques to collapse it into a cyclic automaton.
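To make the pipeline concrete, here is a minimal sketch of the classic, untimed k-tails idea (hypothetical helper names, not the implementation from our work): build a prefix-tree acceptor from the traces, merge states whose sets of length-at-most-k suffixes (their "k-tails") coincide, and then fold away any nondeterminism the merging introduces.

```python
from collections import defaultdict

def build_prefix_tree(traces):
    """Prefix-tree acceptor: one state per unique trace prefix."""
    states = {(): 0}                  # prefix -> state id
    trans = {}                        # (state, symbol) -> state
    for trace in traces:
        prefix = ()
        for sym in trace:
            nxt = prefix + (sym,)
            if nxt not in states:
                states[nxt] = len(states)
            trans[(states[prefix], sym)] = states[nxt]
            prefix = nxt
    return states, trans

def k_tails(state, trans, k):
    """All suffixes of length <= k observable from `state`."""
    tails = set()
    def walk(s, suffix):
        tails.add(suffix)
        if len(suffix) < k:
            for (src, sym), dst in trans.items():
                if src == s:
                    walk(dst, suffix + (sym,))
    walk(state, ())
    return frozenset(tails)

def find(parent, s):
    """Union-find lookup with path compression."""
    while parent[s] != s:
        parent[s] = parent[parent[s]]
        s = parent[s]
    return s

def learn_k_tails(traces, k=2):
    states, trans = build_prefix_tree(traces)
    parent = list(range(len(states)))
    # Step 1: merge states whose k-tails coincide.
    groups = defaultdict(list)
    for sid in states.values():
        groups[k_tails(sid, trans, k)].append(sid)
    for members in groups.values():
        for s in members[1:]:
            parent[find(parent, members[0])] = find(parent, s)
    # Step 2: merging may create nondeterminism (two transitions on the
    # same symbol from one merged state); fold the targets until stable.
    changed = True
    while changed:
        changed = False
        succ = {}
        for (src, sym), dst in trans.items():
            key = (find(parent, src), sym)
            d = find(parent, dst)
            if key in succ and find(parent, succ[key]) != d:
                parent[find(parent, succ[key])] = d
                changed = True
            else:
                succ[key] = d
    # Collapsed transition relation over merged state representatives.
    return {(find(parent, s), a): find(parent, d) for (s, a), d in trans.items()}
```

On the two traces "ab" and "abab", for example, this collapses the five-state prefix tree into a two-state loop, recovering the cyclic behavior from finite observations. The timed variant additionally tracks clock constraints on transitions, which this sketch omits.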

At CritLab, we develop methods to passively learn explainable timed models from realistic system logs (Dierl et al., 2023). There are several challenges to doing this effectively. First, the formalism chosen for the model must be expressive enough to capture the behavior of the system; otherwise the number of states explodes and the model ceases to make sense to humans. However, more expressive formalisms increase the learning cost. Second, the log data must be transformed into features that the learning algorithm can consume. This process injects prior knowledge into model learning, and doing so automatically requires carefully designed heuristics (Kauffman & Fischmeister, 2017). We are working on methods to automatically transform heterogeneous multivariate logs into explainable timed models without hand-tuning. This work relies on accurate metrics for what makes one model representation better than another, which are not obvious to define.
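To illustrate the feature-transformation step, here is a hedged sketch (the helper `to_timed_trace`, the log format, and the thresholds are all invented for illustration, not taken from either cited paper): discretize each numeric variable against user-chosen thresholds and record inter-event delays, yielding the timed symbolic traces a passive learner consumes.

```python
from datetime import datetime

def to_timed_trace(records, thresholds):
    """Turn raw timestamped log records into a timed symbolic trace.

    records: list of (iso_timestamp, variable_name, numeric_value) tuples.
    thresholds: dict mapping variable_name -> sorted split points used to
      discretize that variable's numeric range into symbolic buckets.
    Returns a list of (symbol, delay_in_seconds) pairs.
    """
    trace = []
    prev = None
    for ts, var, value in records:
        t = datetime.fromisoformat(ts)
        # Discretize: the bucket is the number of thresholds exceeded.
        bucket = sum(value > split for split in thresholds.get(var, []))
        symbol = f"{var}_{bucket}"
        # The timed component is the delay since the previous event.
        delay = (t - prev).total_seconds() if prev else 0.0
        trace.append((symbol, delay))
        prev = t
    return trace

records = [
    ("2023-06-01T10:00:00", "temp", 18.5),
    ("2023-06-01T10:00:05", "temp", 27.0),
    ("2023-06-01T10:00:07", "load", 0.9),
]
trace = to_timed_trace(records, {"temp": [25.0], "load": [0.5]})
# -> [("temp_0", 0.0), ("temp_1", 5.0), ("load_1", 2.0)]
```

Choices like the threshold values encode exactly the prior knowledge mentioned above; picking them automatically is where the carefully designed heuristics come in.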

References

2023

  1. Learning Symbolic Timed Models from Concrete Timed Data
    Simon Dierl, Falk Maria Howar, Sean Kauffman, Martin Kristjansen, Kim Guldstrand Larsen, Florian Lorber, and Malte Mauritz
    In NASA Formal Methods (NFM’23), Jun 2023

2017

  1. Mining Temporal Intervals from Real-time System Traces
    Sean Kauffman and Sebastian Fischmeister
    In International Workshop on Software Mining (SoftwareMining’17), Jun 2017