Process Discovery Contest

Process Discovery is the branch of Process Mining that focuses on extracting process models (mainly business process models) from an event log.

The Process Discovery Contest (PDC) is dedicated to advancing the development of new discovery techniques and improving the performance of existing ones. It also aims to encourage the community to consider new types of event data that remain insufficiently explored. This is the third edition of the contest, following the successful 2016 and 2017 editions. Starting with this edition, the contest is part of the International Conference on Process Mining.

The PDC’s objective is to compare the effectiveness of techniques at discovering process models that provide:

  • a proper balance between “overfitting” and “underfitting”. A process model overfits (the event log) if it is too restrictive, disallowing behavior that is part of the underlying process; this typically occurs when the model allows only the behavior recorded in the event log. Conversely, a model underfits (the reality) if it is not restrictive enough, allowing behavior that is not part of the underlying process; this typically occurs when it overgeneralizes the example behavior in the event log.
  • business value for process owners. The model must certainly be fitting and precise, but it must also provide insight into how processes are executed.

The contest is open to any discovery algorithm, combination of algorithms, or more elaborate procedure, possibly involving some human support (similar to semi-supervised techniques). The contest is independent of the notation in which the discovered models are expressed; no notation is preferred. Procedural (e.g., Petri nets or BPMN) and declarative (e.g., Declare) notations are equally welcome.

Nor is the contest restricted to open-source tools: proprietary tools can also participate.

Both individuals and groups can participate in the contest.

Organization of the Contest

Ten random business process models will be constructed with varying behavioral characteristics (loops, OR-splits, …) and levels of complexity. Five of these models will contain decision points associated with decision tables, expressed in the DMN notation using the FEEL language. Decision-point analysis is new with respect to previous editions of the contest.

For each of the ten processes, one noise-free but incomplete “training” event log will be provided: every event log will contain 20% incomplete traces. Incomplete traces are traces from which a number of events have been removed from the end (to simulate traces that were not yet completed when the data was queried).
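To make the notion of incomplete traces concrete, the following Python sketch truncates a share of the traces in a log by dropping events from their end. It is only an illustration, not the organizers' generator: the function name `make_incomplete`, the representation of a trace as a list of activity labels, and the choice to remove a uniformly random number of events (at least one, always leaving at least one event) are all assumptions.

```python
import random


def make_incomplete(log, fraction=0.2, seed=0):
    """Hypothetical sketch: truncate `fraction` of the traces by cutting their tail.

    A log is a list of traces; a trace is a list of activity labels.
    ASSUMPTION: the number of removed events is drawn uniformly from
    1..len(trace)-1, so every truncated trace keeps at least one event.
    """
    rng = random.Random(seed)
    log = [list(trace) for trace in log]  # copy so the input log is not mutated
    # only traces with at least two events can lose events and stay non-empty
    candidates = [i for i, trace in enumerate(log) if len(trace) >= 2]
    k = min(len(candidates), round(fraction * len(log)))
    for i in rng.sample(candidates, k):
        cut = rng.randint(1, len(log[i]) - 1)  # number of events to remove
        log[i] = log[i][:-cut]
    return log


# Example: 10 complete traces, of which 2 (20%) become incomplete prefixes.
training = [["a", "b", "c", "d"] for _ in range(10)]
print(make_incomplete(training))
```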

Each participant must submit one process model per training event log; each model is evaluated against the underlying process used to generate the event data.

The process models will be kept secret: only event logs showing a portion of the possible behavior will be disclosed. Contestants can use any discovery algorithm, combine algorithms through more elaborate procedures, and extend them with decision-point discovery techniques. Ultimately, the goal is to discover ten models, one for each of the ten “training” event logs. The semantics of the modelling notation used must be clear: authors must either use an existing notation with well-known semantics or provide a clear description of the semantics.

To evaluate the quality of the discovered models, two metrics will be used:

  • Accuracy. A classification approach will be used: for each training log, a test log will be provided that contains both traces that belong to the underlying process and traces that do not. Accuracy is computed as the sum of:
    • the number of traces that represent real process behavior and are classified as allowed by the discovered model, and
    • the number of traces that represent behavior not related to the process and are classified as disallowed by the discovered model,

    divided by the total number of traces in the test log.

  • Business Value. A jury receives a set of questions about the underlying process model, together with the corresponding discovered models. The business value is the percentage of questions about the underlying process that the jury can answer correctly, within a given time, using the discovered model. The jury consists of both practitioners and academics.
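The accuracy metric described above is standard classification accuracy: (correctly allowed real traces + correctly disallowed non-process traces) / number of test traces. A minimal Python sketch, assuming the ground truth and the model's verdicts are given as two parallel lists of booleans (the function name `accuracy` and this input format are hypothetical):

```python
def accuracy(labels, predictions):
    """Hypothetical helper computing the contest's accuracy for one test log.

    labels[i]      -- True if trace i represents real process behavior
    predictions[i] -- True if the discovered model allows (replays) trace i
    """
    if len(labels) != len(predictions):
        raise ValueError("labels and predictions must have the same length")
    if not labels:
        raise ValueError("the test log is empty")
    # real behavior classified as allowed
    allowed_real = sum(1 for y, p in zip(labels, predictions) if y and p)
    # non-process behavior classified as disallowed
    disallowed_fake = sum(1 for y, p in zip(labels, predictions) if not y and not p)
    return (allowed_real + disallowed_fake) / len(labels)


# 20 real and 20 non-process traces; the model allows 18 of the real ones
# and wrongly allows 3 of the others: (18 + 17) / 40
labels = [True] * 20 + [False] * 20
predictions = [True] * 18 + [False] * 2 + [True] * 3 + [False] * 17
print(accuracy(labels, predictions))  # 0.875
```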

The evaluation procedure is as follows:

  1. Submissions are first ranked by accuracy. All submissions whose average accuracy is within 5% of the best average accuracy move on to the next step.
  2. The models of the remaining submissions are ranked by the jury according to their business value. The winner is the contestant (or group) whose submission has the best average position in the ranking across all models under consideration. If two or more contestants have the same average ranking position, the one with the highest accuracy wins. Further ties are broken by the lowest variance in accuracy and, then, by the lowest variance in ranking.
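The two-step procedure above can be sketched in Python as follows. This is an illustrative reading, not the organizers' scoring code: the function names, the input format (per-model accuracies in [0, 1] and per-model jury positions where 1 is best), the interpretation of “within 5%” as 5 percentage points, and the use of population variance are all assumptions.

```python
from statistics import mean, pvariance


def shortlist(accuracies, threshold=0.05):
    """Step 1 (sketch): keep submissions within `threshold` of the best average accuracy.

    accuracies maps a submission name to its per-model accuracies in [0, 1].
    ASSUMPTION: "within 5%" is read as 5 percentage points (absolute difference).
    """
    avg = {name: mean(values) for name, values in accuracies.items()}
    best = max(avg.values())
    # the small epsilon guards against floating-point noise at the boundary
    return [name for name, a in avg.items() if best - a <= threshold + 1e-9]


def winner(accuracies, jury_ranks, threshold=0.05):
    """Step 2 (sketch): order shortlisted submissions, winner first.

    jury_ranks maps a submission name to its per-model jury positions (1 = best).
    Order: best (lowest) average position, then highest average accuracy,
    then lowest accuracy variance, then lowest ranking variance.
    ASSUMPTION: population variance is used for both tie-breakers.
    """
    def key(name):
        acc, ranks = accuracies[name], jury_ranks[name]
        return (mean(ranks), -mean(acc), pvariance(acc), pvariance(ranks))

    return sorted(shortlist(accuracies, threshold), key=key)


# C is excluded in step 1; A and B tie on average position, B has higher accuracy.
accuracies = {"A": [0.90, 0.80], "B": [0.88, 0.86], "C": [0.60, 0.70]}
jury_ranks = {"A": [1, 2], "B": [2, 1]}
print(winner(accuracies, jury_ranks))  # ['B', 'A']
```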

The members of the jury will be determined after the submission deadline to ensure that no participant is also a jury member.

Where and How to Submit

Submissions must be made no later than 15 April 2019 via EasyChair at https://easychair.org/conferences/?conf=icpm2019, selecting the Process Discovery Contest track. Each EasyChair submission consists of:

  1. The title of the submission
  2. A brief abstract (no more than 200-250 words) that discusses the (combination of) techniques employed and the rationale for the choice.
  3. A zip file that includes the following:
    • A document that contains at least the following sections:
      1. One section that discusses the replay semantics of the process modelling notation employed. In other words, the section needs to explain how, given any process trace t and any process model m in that notation, it can be unambiguously determined whether or not trace t can be replayed on model m. As an alternative to this section, the contestant can provide a link to a paper or any other document where the replay semantics is described.
      2. One section that provides a link from which the tool(s) used to discover the process models can be downloaded, as well as a step-by-step guide to generating one of the process models. If the tool is not open-source, a license must be provided that remains valid at least until 30 May 2019. The license will only be used by the organizers to evaluate the submission.
    • The 10 process-model files, one for each of the 10 processes. Many established notations have well-defined formats to store models, such as PNML for Petri nets or the BPMN format for BPMN models. For such notations, participants are expected to provide the process-model files in the corresponding standard format.
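For a Petri net submission, one common way to state replay semantics is the token game: a trace fits if its events can fire in order, starting from the initial marking, and the net then ends exactly in the final marking. The following Python sketch illustrates this under simplifying assumptions (each activity labels exactly one transition, there are no silent transitions, and the net is given as a plain dictionary rather than parsed from PNML); the function `can_replay` and the data layout are hypothetical, not a format required by the contest.

```python
from collections import Counter


def can_replay(net, initial, final, trace):
    """Token-game replay on a labelled Petri net (illustrative sketch).

    net     -- maps an activity label to (input places, output places);
               a place listed twice counts as an arc of weight 2
    initial -- initial marking as a Counter: place -> number of tokens
    final   -- final marking as a Counter
    ASSUMPTION: one visible transition per label and no silent transitions,
    so replay is deterministic and needs no search.
    """
    marking = Counter(initial)
    for activity in trace:
        if activity not in net:
            return False  # the model has no transition for this activity
        inputs, outputs = net[activity]
        need = Counter(inputs)
        if any(marking[p] < n for p, n in need.items()):
            return False  # the transition is not enabled
        marking.subtract(need)  # consume input tokens
        marking.update(outputs)  # produce output tokens
    # unary + drops zero counts, so both sides compare as plain markings
    return +marking == +Counter(final)


# Example net: a, then either b or c, then d.
net = {
    "a": (["start"], ["p1"]),
    "b": (["p1"], ["p2"]),
    "c": (["p1"], ["p2"]),
    "d": (["p2"], ["end"]),
}
initial, final = Counter({"start": 1}), Counter({"end": 1})
print(can_replay(net, initial, final, ["a", "b", "d"]))  # True
print(can_replay(net, initial, final, ["a", "b"]))       # False: final marking not reached
```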

On 16 April 2019, the “test” event logs will be made available on this web site, and the contestants who submitted will be notified. By the end of April 2019, contestants will be asked to provide a classification of the traces in these event logs. For well-defined notations, the organizers can provide support with automated tools during this second phase.

Key Dates

  • 3 January 2019: Submissions open. The training event logs will be made available.
  • 15 April 2019: Deadline for submissions.
  • 16 April 2019: Publication of the “test” event logs and notification of the requests to provide classifications (see Section “Where and How to Submit” above).
  • 30 April 2019: Submission of the classifications for the “test” event logs.
  • 1-15 May 2019: Assessment by the jury of the top-notch submissions (i.e., those that pass to step 2 of the evaluation procedure in Section “Organization of the Contest”).
  • 15 May 2019: Notification of the winner.
  • 23-28 June 2019: Announcement of the winner and awards.

On 1 February 2019 and on 1 March 2019, ten intermediate “test” event logs will be published on this web site (ten in February and ten in March). Each of these event logs will contain 20 traces that can be replayed on the respective process model and 20 traces that cannot; however, no information will be given about which traces can or cannot be replayed. For each set of intermediate “test” event logs, each contestant can submit a classification attempt to the organizers at most twice, via email to discoverycontest@tue.nl (further details on the submission format will follow in due time). The organizers will reply stating how many traces have been correctly classified. These feedback loops can be used as a means to assess the effectiveness of the discovery algorithms.

Prizes and Journal Invitation

The results and the winner will be announced in Aachen during the International Conference on Process Mining. The winning group will be given the opportunity to present its approach during the conference and will also be awarded a plaque.

To ensure that at least one member of the winning team attends, one member of the winning team will be offered:

  • lodging and travel expenses for the trip to Aachen
  • the full registration for the ICPM conference

Each top-notch submission (including the winner) will be invited to submit an article to the International Journal on Software Tools for Technology Transfer (STTT), published by Springer. The invitation applies only to top-notch submissions describing sufficiently novel techniques that have not already been accepted elsewhere and are not under review elsewhere. The organizers will determine whether a submission falls into this category.

Examples of Submissions

To gain further insights into typical submissions, one can refer to some reports:

Note that writing this sort of report is not mandatory. The authors of the above reports chose to prepare them after the notification in order to further disseminate their techniques and results.

Positioning of the Process Discovery Contest

The only other contest related to process mining is the annual Business Process Intelligence Challenge (BPIC), which is also co-located with the International Conference on Process Mining. The BPIC uses real-life data without objective evaluation criteria: it is about the perceived value of the analysis and is not limited to discovery (it also covers conformance checking, performance analysis, etc.). The resulting report is evaluated by a jury. The Process Discovery Contest is different: the focus is on process discovery, and synthetic data are used so that there is an objective “proper” answer. Process discovery is turned into a classification task with a training set and a test set: a process model needs to decide whether or not each trace fits.

Organizers

  • Josep Carmona, Universitat Politècnica de Catalunya (UPC), Spain
  • Massimiliano de Leoni, Eindhoven University of Technology, The Netherlands
  • Benoît Depaire, Hasselt University, Belgium