Sampling What Matters: Relevance-guided Sampling of Event Logs

Martin Kabierski, Hoang Lam Nguyen, Lars Grunske and Matthias Weidlich


The comparison of a process model against the event data recorded during the execution of the process, known as conformance checking, is an important means of process analysis. Yet, common conformance checking techniques are computationally expensive, which makes a complete analysis infeasible for large logs. To mitigate this problem, existing techniques leverage data samples. The quality of the results then depends on how relevant the sample is for the specific analysis task. Existing sampling strategies therefore rely on a static assumption about what constitutes relevant event data, even though relevance is generally unknown a priori.

In this paper, we present relevance-guided sampling of event logs. Instead of employing a fixed relevance hypothesis, our approach learns the characteristics of event data that determine its relevance for conformance checking. To this end, we first explore the correlations between characteristics of the event data and the goal of a conformance checking task, before exploiting these correlations to guide the selection of a data sample. We present different instantiations of this approach and demonstrate that they significantly improve the quality of samples, and hence of conformance checking results, compared to baseline strategies.
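To make the explore-then-exploit idea concrete, the following is a minimal illustrative sketch, not one of the paper's instantiations. It assumes, hypothetically, that a log is a list of traces given as activity tuples, that the characteristic of a trace is simply the set of activities it contains, and that a caller-supplied `is_deviating` predicate stands in for the expensive conformance check. The function name `relevance_guided_sample`, the parameter `explore_share`, and the scoring rule are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def relevance_guided_sample(log, is_deviating, budget, explore_share=0.5, seed=0):
    """Return up to `budget` trace indices from `log` (hypothetical sketch).

    Assumptions: traces are activity tuples, features are the activities a
    trace contains, and `is_deviating` stands in for conformance checking.
    """
    rng = random.Random(seed)
    indices = list(range(len(log)))
    rng.shuffle(indices)
    n_explore = int(budget * explore_share)
    explored, remaining = indices[:n_explore], indices[n_explore:]

    # Explore: check uniformly chosen traces and count, per activity,
    # how many of the checked traces containing it deviate.
    hits, seen = defaultdict(int), defaultdict(int)
    for i in explored:
        deviating = is_deviating(log[i])
        for activity in set(log[i]):
            seen[activity] += 1
            hits[activity] += int(deviating)

    def score(trace):
        # Relevance estimate: highest add-one-smoothed deviation rate among
        # the trace's activities (unseen activities default to 0.5).
        if not trace:
            return 0.5
        return max((hits[a] + 1) / (seen[a] + 2) for a in set(trace))

    # Exploit: spend the rest of the budget on the highest-scoring traces.
    remaining.sort(key=lambda i: score(log[i]), reverse=True)
    return explored + remaining[: max(budget - n_explore, 0)]
```

For example, with a log of 80 traces `("a", "b")` and 20 traces `("a", "x")` and a toy oracle `lambda t: "x" in t`, calling `relevance_guided_sample(log, oracle, budget=10)` checks five random traces and then fills the remaining five slots with traces containing `"x"`, because that activity is the one the exploration phase associates with deviations. A greedy top-k exploit step is used here only for simplicity; a weighted random draw over the scores would preserve more diversity in the sample.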