Model Learning

Mining explainable timed models from offline system logs

In recent years, machine learning has been integrated into more and more areas of life. However, the safety of such systems often cannot be verified due to their complexity and unknown internal structure. For such black-box systems, model learning can provide additional information. Model learning typically deduces an executable representation either by monitoring the System Under Learning (SUL) (passive learning) or by actively querying the SUL (active learning). Either approach produces a model consistent with the observations. These models can be used for verification methods like model checking, but often even a graphical illustration of a system's internal workings can increase confidence that it behaves as intended. The approach is especially useful for artificial intelligence (AI) systems, where a function is learned from training data and no human-readable explanation may exist.

Active algorithms like Angluin's L* have shown promising results. However, they can be difficult to apply in practice, as many systems provide no way for a learner to interact with them. An example of such a system is the controller of a smart traffic light, where the inputs are the arrivals of cars in a street lane. Luckily, in the modern era of big data, many systems are monitored throughout their deployment in the real world, producing log files that can span months or years. Passive learning algorithms like (timed) k-tails take large numbers of such traces, convert them into a tree-like structure, and apply state-merging techniques to collapse it into a cyclic automaton.
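To make the pipeline concrete, here is a minimal sketch of the classic, untimed k-tails idea (hypothetical helper names, not the implementation from our work): build a prefix-tree acceptor from the traces, merge states whose sets of length-at-most-k suffixes (their "k-tails") coincide, and then fold away any nondeterminism the merging introduces.

```python
from collections import defaultdict

def build_prefix_tree(traces):
    """Prefix-tree acceptor: one state per unique trace prefix."""
    states = {(): 0}                  # prefix -> state id
    trans = {}                        # (state, symbol) -> state
    for trace in traces:
        prefix = ()
        for sym in trace:
            nxt = prefix + (sym,)
            if nxt not in states:
                states[nxt] = len(states)
            trans[(states[prefix], sym)] = states[nxt]
            prefix = nxt
    return states, trans

def k_tails(state, trans, k):
    """All suffixes of length <= k observable from `state`."""
    tails = set()
    def walk(s, suffix):
        tails.add(suffix)
        if len(suffix) < k:
            for (src, sym), dst in trans.items():
                if src == s:
                    walk(dst, suffix + (sym,))
    walk(state, ())
    return frozenset(tails)

def find(parent, s):
    """Union-find lookup with path compression."""
    while parent[s] != s:
        parent[s] = parent[parent[s]]
        s = parent[s]
    return s

def learn_k_tails(traces, k=2):
    states, trans = build_prefix_tree(traces)
    parent = list(range(len(states)))
    # Step 1: merge states whose k-tails coincide.
    groups = defaultdict(list)
    for sid in states.values():
        groups[k_tails(sid, trans, k)].append(sid)
    for members in groups.values():
        for s in members[1:]:
            parent[find(parent, members[0])] = find(parent, s)
    # Step 2: merging may create nondeterminism (two transitions on the
    # same symbol from one merged state); fold the targets until stable.
    changed = True
    while changed:
        changed = False
        succ = {}
        for (src, sym), dst in trans.items():
            key = (find(parent, src), sym)
            d = find(parent, dst)
            if key in succ and find(parent, succ[key]) != d:
                parent[find(parent, succ[key])] = d
                changed = True
            else:
                succ[key] = d
    # Collapsed transition relation over merged state representatives.
    return {(find(parent, s), a): find(parent, d) for (s, a), d in trans.items()}
```

On the two traces "ab" and "abab", for example, this collapses the five-state prefix tree into a two-state loop, recovering the cyclic behavior from finite observations. The timed variant additionally tracks clock constraints on transitions, which this sketch omits.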

At CritLab, we develop methods to passively learn explainable timed models from realistic system logs (Dierl et al., 2023). There are several challenges to doing this effectively. First, the formalism chosen for the model must be expressive enough to capture the behavior of the system; otherwise the number of states explodes and the model ceases to make sense to humans. However, more expressive formalisms increase the learning cost. Second, the log data must be transformed into features that the learning algorithm can consume. This process injects prior knowledge into model learning, and doing so automatically requires carefully designed heuristics (Kauffman & Fischmeister, 2017). We are working on methods to automatically transform heterogeneous multivariate logs into explainable timed models without hand-tuning. This work relies on accurate metrics for what makes one model representation better than another, which are not obvious to define.
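To illustrate the feature-transformation step, here is a hedged sketch (the helper `to_timed_trace`, the log format, and the thresholds are all invented for illustration, not taken from either cited paper): discretize each numeric variable against user-chosen thresholds and record inter-event delays, yielding the timed symbolic traces a passive learner consumes.

```python
from datetime import datetime

def to_timed_trace(records, thresholds):
    """Turn raw timestamped log records into a timed symbolic trace.

    records: list of (iso_timestamp, variable_name, numeric_value) tuples.
    thresholds: dict mapping variable_name -> sorted split points used to
      discretize that variable's numeric range into symbolic buckets.
    Returns a list of (symbol, delay_in_seconds) pairs.
    """
    trace = []
    prev = None
    for ts, var, value in records:
        t = datetime.fromisoformat(ts)
        # Discretize: the bucket is the number of thresholds exceeded.
        bucket = sum(value > split for split in thresholds.get(var, []))
        symbol = f"{var}_{bucket}"
        # The timed component is the delay since the previous event.
        delay = (t - prev).total_seconds() if prev else 0.0
        trace.append((symbol, delay))
        prev = t
    return trace

records = [
    ("2023-06-01T10:00:00", "temp", 18.5),
    ("2023-06-01T10:00:05", "temp", 27.0),
    ("2023-06-01T10:00:07", "load", 0.9),
]
trace = to_timed_trace(records, {"temp": [25.0], "load": [0.5]})
# -> [("temp_0", 0.0), ("temp_1", 5.0), ("load_1", 2.0)]
```

Choices like the threshold values encode exactly the prior knowledge mentioned above; picking them automatically is where the carefully designed heuristics come in.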

References

2023

  1. Learning Symbolic Timed Models from Concrete Timed Data
    Simon Dierl, Falk Maria Howar, Sean Kauffman, Martin Kristjansen, Kim Guldstrand Larsen, Florian Lorber, and Malte Mauritz
    In NASA Formal Methods (NFM’23), Jun 2023

2017

  1. Mining Temporal Intervals from Real-time System Traces
    Sean Kauffman and Sebastian Fischmeister
    In International Workshop on Software Mining (SoftwareMining’17), Jun 2017