Process Discovery is the branch of Process Mining that focuses on extracting process models – mainly business process models – from an event log.
The Process Discovery Contest (PDC) is dedicated to advancing the development of new discovery techniques and the performance of existing techniques. It also aims to stimulate the community to consider new types of event data that remain insufficiently explored. This is the third edition of the contest, following the successful editions of 2016 and 2017. Starting with this edition, the contest is part of the International Conference on Process Mining.
The PDC’s objective is to compare the effectiveness of techniques for discovering process models that provide:
- a proper balance between “overfitting” and “underfitting”. A process model is overfitting (the event log) if it is too restrictive, disallowing behavior which is part of the underlying process. This typically occurs when the model only allows for the behavior recorded in the event log. Conversely, it is underfitting (the reality) if it is not restrictive enough, allowing behavior which is not part of the underlying process. This typically occurs if it overgeneralizes the example behavior in the event log.
- business value for process owners. The model certainly needs to be fitting and precise, but it must also provide insights into how processes are executed.
The contest is open to any discovery algorithm, combination of algorithms, or more elaborate procedure, possibly involving some human support (similar to semi-supervised techniques). The contest is independent of the notation in which the discovered models are expressed: no preference is given. Any procedural (e.g., Petri net or BPMN) or declarative (e.g., Declare) notation is equally welcome.
The contest is not restricted to open-source tools; proprietary tools can also participate.
Both single individuals and groups can participate in the contest.
Organization of the Contest
Ten random business process models are constructed with varying behavioral characteristics (loops, OR-splits, …). Five of these models contain decision points that are associated with decision tables, expressed in the DMN notation using the FEEL language. Note that the decision-point analysis is a novelty with respect to previous versions of the contest.
The characteristics present in each of the ten models are described in the document available here. <Updated on 08/04/2019 at 18:55>
For each of the ten processes, one “training” event log is provided. In each training log, 80% of the traces represent complete executions of the process, and 20% are incomplete traces. Incomplete traces are traces for which a number of events are removed from the end of the trace, so as to simulate that these traces were not yet completed when the data was extracted.
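As a minimal illustration only, the Python sketch below shows one way such incomplete traces could be simulated by truncating a suffix of a completed trace; the trace representation and the exact truncation procedure used by the organizers are assumptions, not contest specifications.

```python
import random

def make_incomplete(trace, rng=random.Random(42)):
    """Simulate an incomplete trace by removing a number of events from the
    end of a completed trace (keeping at least the first event).
    Illustrative only; the organizers' actual procedure is not disclosed."""
    cut = rng.randint(1, len(trace) - 1)  # position at which the trace is cut
    return trace[:cut]

# Example with a toy completed trace
completed = ["register", "check", "decide", "notify", "archive"]
print(make_incomplete(completed))  # e.g. ['register', 'check', 'decide']
```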
As a participant, you have to submit one model for each of the ten processes.
The process models are kept secret: only training logs, which show a portion of the possible behavior, are disclosed. Contestants can use any discovery algorithm, combine algorithms into more elaborate procedures, and extend them with decision-point discovery techniques. Ultimately, the goal is to discover ten models, one for each process, on the basis of the respective training log. Any process modelling notation is accepted, as long as its semantics is unambiguous. Authors can use any existing notation with well-known semantics, but they can also use “less standard” notations; in the latter case, authors must provide a clear description of the semantics.
To evaluate the quality of the discovered models, two metrics are used:
- Accuracy. A classification approach is used: a “test” event log is provided for each training log. The test logs contain both traces that do and traces that do not belong to the underlying process. The accuracy is measured as the sum of the following:
- the number of traces that represent real process behavior and are classified as allowed by the discovered model
- the number of traces that represent a behavior not related to the process and are classified as disallowed by the discovered model
This sum is then divided by the total number of traces in the test log (a minimal sketch after this list illustrates the computation).
- Business Value. A set of questions about the underlying process is given to a jury together with the corresponding discovered models. The business value is the percentage of questions that the jury can answer correctly about the underlying process, within a given time, based on the discovered model. The jury consists of both practitioners and academics.
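To make the accuracy computation concrete, here is a minimal Python sketch. The representation of a classified test log as (is-real-behavior, allowed-by-model) pairs is an assumption for illustration only.

```python
def accuracy(classified_test_log):
    """Accuracy as described above: real traces accepted by the model plus
    non-process traces rejected by the model, divided by all test traces.

    `classified_test_log` is a list of (is_real_behavior, allowed_by_model)
    boolean pairs, one per trace of the test log."""
    correct = sum(1 for is_real, allowed in classified_test_log
                  if is_real == allowed)
    return correct / len(classified_test_log)

# Toy example: 4 test traces, 3 classified correctly -> accuracy 0.75
print(accuracy([(True, True), (True, False), (False, False), (True, True)]))
```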
The evaluation procedure to determine the winner is as follows:
- Submissions are first ranked in terms of accuracy. All submissions within a 5% range of the contestant(s) with the best average accuracy move on to the next step.
- The models of the remaining submissions are ranked by the jury for their business value. The winner is the group (or individual) whose submission has the highest position in this ranking, which is computed by ordering the submissions by the number of questions that the jury answers correctly (see the Business Value paragraph above) across all models. If two or more contestants share the same ranking position, the winner is the one with the highest accuracy; further ties are broken by the lowest variance in accuracy. (A sketch of this selection procedure follows below.)
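As a rough illustration, the following Python sketch mirrors the selection procedure just described. The data layout (per-process accuracies and a jury score per contestant) and the reading of the “5% range” as an absolute five-percentage-point band are assumptions, not contest specifications.

```python
from statistics import mean, variance

def select_winner(submissions):
    """Sketch of the selection procedure described above.

    `submissions` maps a contestant name to a dict with illustrative keys:
      - 'accuracies':   the per-process accuracies of the ten models
      - 'jury_correct': number of jury questions answered correctly"""
    best_avg = max(mean(s["accuracies"]) for s in submissions.values())
    # Step 1: keep submissions within a 5% range of the best average accuracy
    # (interpreted here as an absolute five-percentage-point band).
    shortlist = {name: s for name, s in submissions.items()
                 if mean(s["accuracies"]) >= best_avg - 0.05}
    # Step 2: rank by jury score; break ties by higher mean accuracy,
    # then by lower variance in accuracy.
    ranked = sorted(shortlist.items(),
                    key=lambda kv: (-kv[1]["jury_correct"],
                                    -mean(kv[1]["accuracies"]),
                                    variance(kv[1]["accuracies"])))
    return ranked[0][0]
```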
The members of the jury are determined after the submission deadline to ensure that no participant is also a jury member.
Where and How to Submit
Submissions are due no later than 30 April 2019 (extended from 15 April) via EasyChair at https://easychair.org/conferences/?conf=icpm2019, selecting the track related to the Process Discovery Contest. Submissions on EasyChair consist of:
- The title of the submission
- A brief abstract (200-250 words) that discusses the (combination of) techniques employed and the rationale for the choice.
- A zip file that includes the following:
- A document that at least contains the following sections:
- One section that discusses the replaying semantics of the process modelling notation employed. In other words, the section needs to discuss how, given any process trace t and any process model m in that notation, it can be unambiguously determined whether or not trace t can be replayed on model m (a minimal illustration follows after this list). As an alternative to this section, the contestant can provide a link to a paper or any other document where the replaying semantics is described.
- One section that provides a link where one can download the tool(s) used to discover the process models, as well as a step-by-step guide to generate one of the process models. If a tool is not open-source, a license needs to be provided, valid at least until 30 May 2019. The licenses are only used by the organizers to evaluate the submission.
- The 10 process-model files, one for each of the 10 processes. Many established notations have well-defined formats to store models, such as PNML for Petri nets or the BPMN interchange format for BPMN models. For such notations, participants are expected to provide the process-model files in one of the standard formats.
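Since any notation with an unambiguous replaying semantics is accepted, the following minimal Python sketch illustrates what such a semantics can look like for a toy labelled transition system. The model encoding and the function name are purely illustrative and not a format prescribed by the contest.

```python
def replays(trace, model, initial="start", finals=("end",)):
    """Decide whether `trace` (a list of activity labels) can be replayed
    on `model`, encoded here as a toy labelled transition system
    {state: {activity: next_state}}. Illustration only, not a contest format."""
    state = initial
    for activity in trace:
        enabled = model.get(state, {})
        if activity not in enabled:
            return False              # activity not enabled in current state
        state = enabled[activity]
    return state in finals            # trace must end in an accepting state

# Toy model: register, then check or skip, then decide
toy_model = {
    "start": {"register": "s1"},
    "s1":    {"check": "s2", "skip": "s2"},
    "s2":    {"decide": "end"},
}
print(replays(["register", "check", "decide"], toy_model))  # True
print(replays(["register", "decide"], toy_model))           # False
```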
On 1 May 2019 (moved from 16 April), the “test” event logs are made available on this web site, and contestants who submitted are notified of their availability. By 15 May 2019, contestants are asked to provide a classification of the traces of these event logs. For well-defined notations, the organizers can provide support during this second phase through automated tools.
Key Dates
- 13 January 2019: Opening of submissions. The training event logs are made available.
- 30 April 2019 (extended from 15 April): Deadline for submissions.
- 1 May 2019 (moved from 16 April): Publication of the “test” event logs and notification of the request to provide classifications (see section “Where and How to Submit” above).
- 15 May 2019 (moved from 30 April): Submission of the classifications for the “test” event logs.
- 30 May 2019 (moved from 15 May): Conclusion of the jury’s assessment of the shortlisted submissions (i.e., those not excluded at step 2 of the list in section “Organization of the Contest”).
- 31 May 2019 (moved from 15 May): Notification to the contestants (winning or not).
- 23-28 June 2019: Announcement of the winner and awards at ICPM.
On 15 February 2019 and on 15 March 2019, ten intermediate “test” event logs are published on this web site (10 in February and 10 in March). Each of these event logs contains 20 traces that can be replayed on the respective process model and 20 traces that cannot. However, no information is given about which traces can or cannot be replayed.
The intermediate “test” event logs of 15 February are available here.
The intermediate “test” event logs of 15 March are available here. <Updated on 05/04/2019 at 16:55>
Contestants can submit their classification attempts to the organizers at most two times via email to process.discovery.contest@gmail.com, using the Excel file available here. Contestants should write TRUE or FALSE in the table, depending on whether the respective model classifies the corresponding trace as fitting or not fitting. As an example, type TRUE in the cell at row trace_3 and column model_5 if the model of the 5th process classifies trace 3 of the respective test log as fitting; otherwise, type FALSE. In case of questions related to the use of the Excel files or to the “test” logs, please contact the organizers by email.
The organizers reply stating how many traces have been correctly classified. These two feedback loops can be used as a means to assess the effectiveness of the discovery algorithms. Two attempts are possible per contestant for each set of intermediate “test” event logs, and classification attempts can be submitted at any moment until 30 April 2019.
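As a rough sketch of how such a TRUE/FALSE table could be assembled programmatically, the snippet below builds the rows from a replay check (such as the one sketched in the submission section) and writes them to a CSV file. All names are illustrative, and the official Excel template provided by the organizers must be used for the actual submission.

```python
import csv

def classification_rows(models, test_logs, replays):
    """Build TRUE/FALSE rows as in the template: one row per trace, one
    column per model, TRUE when the model classifies the trace as fitting.
    `models` is a list of ten models, `test_logs[j]` the list of traces of
    the test log for the (j+1)-th process, and `replays` a replay check
    such as the one sketched earlier. Names are illustrative only."""
    n_traces = len(test_logs[0])
    rows = []
    for i in range(n_traces):
        row = {"trace": f"trace_{i + 1}"}
        for j, model in enumerate(models):
            fitting = replays(test_logs[j][i], model)
            row[f"model_{j + 1}"] = "TRUE" if fitting else "FALSE"
        rows.append(row)
    return rows

def write_rows(rows, path="classification.csv"):
    # CSV keeps the sketch dependency-free; the contest expects the
    # provided Excel template instead.
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```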
Prizes and Journal Invitation
The results and the winner were announced in Aachen during the International Conference on Process Mining; the winner received a plaque and a monetary prize.
The results for the winner and the runner-up are as follows, where TP, FP, TN and FN indicate the numbers of traces classified as true positives, false positives, true negatives and false negatives by the respective participant:
| Authors | Submission | TP | FP | TN | FN | Accuracy |
|---|---|---|---|---|---|---|
| Eric Verbeek | Discovery using the Log Skeleton Filter and Browser | 452 | 1 | 446 | 1 | 0.997 |
| Christoffer Olling Back, Thomas T. Hildebrandt, Santosh Kumar, Viktorija Nekrasaite, Andrew Tristan Parli, and Tijs Slaats | Process Discovery with Dynamic Condition Response Graphs | 448 | 30 | 417 | 10 | 0.961 |
The jury’s judgement on the usability of the obtained models confirmed the ranking.
Recall that the classification results were based on ten event logs, each of which contains 90 traces. Except for the log related to model four, which had 48 positive and 42 negative traces, the other nine event logs had 45 positive and 45 negative traces.
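For reference, classification accuracy relates to these counts in the usual way, accuracy = (TP + TN) / (TP + FP + TN + FN), consistent with the metric described in the section “Organization of the Contest”.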
Event Logs and Other Documents
- The ten training logs are available here.
- The behavioural characteristics present in the ten processes are described in the document available here. <Updated on 08/04/2019 at 18:55>
- The intermediate “test” event logs of 15 February are available here.
- The intermediate “test” event logs of 15 March are available here. <Updated on 05/04/2019 at 16:55>
- The classification attempts on the intermediate “test” event logs should be submitted using the template available here.
Positioning of the Process Discovery Contest
The only other contest related to process mining is the annual Business Process Intelligence Challenge (BPIC), which is also co-located with the International Conference on Process Mining. The BPIC uses real-life data without objective evaluation criteria: it is about the perceived value of the analysis and is not limited to the discovery task (it also covers conformance checking, performance analysis, etc.), and the reports are evaluated by a jury. The Process Discovery Contest is different: the focus is on process discovery, synthetic data are used so that there is an objective “proper” answer, and process discovery is turned into a classification task with a training set and a test set, in which a process model needs to decide whether traces are fitting or not.
Organizers
- Josep Carmona, Universitat Politècnica de Catalunya (UPC), Spain
- Massimiliano de Leoni, University of Padua, Italy
- Benoît Depaire, Hasselt University, Belgium