Process Discovery Contest

PDC2023 is the 2023 edition of the Process Discovery Contest.

There can be only one…

Do you believe you have implemented a good discovery algorithm?
Then submit it to the PDC 2023 to put it to the test!

The PDC 2023 test contains 384 event logs for which a model needs to be discovered (training logs), and 96 pairs of event logs that are used to evaluate the discovered models (test logs and base logs).

The 96 pairs of test logs and base logs are all generated using the same configurable model, which has the following configurable options:

  • Long-term dependencies: yes/no
  • Loops: no/simple/complex
    • A simple loop has a single point of entry and a single point of exit.
    • A complex loop has multiple points of entry and/or multiple points of exit.
  • OR constructs: yes/no
  • Routing constructs: yes/no
  • Optional tasks: yes/no
  • Duplicate tasks: yes/no

Each pair is matched by four training logs:

  1. A training log without noise.
  2. A training log with noise: with probability 20%, a trace has one random event removed (with probability 40%), moved (20%), or copied (40%).
  3. Training log 0 (the noise-free log) with 20 fitting traces classified as positive (boolean pdc:isPos attribute set to true).
  4. Training log 1 (the log with noise) with 20 fitting traces classified as positive (pdc:isPos set to true) and 20 non-fitting traces classified as negative (pdc:isPos set to false).

Each training log contains 1000 traces that result from random walks through the configured model. A discovery algorithm may assume that a trace in a training log is fitting if the boolean pdc:isPos attribute is set to true, and it may assume that it is non-fitting if the boolean pdc:isPos attribute is set to false.
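The noise model above can be sketched as follows. This is a minimal illustration, not the contest's actual generator; the representation of a trace as a plain list of activity names is an assumption.

```python
import random

def add_noise(trace, p_noise=0.2):
    """Apply the contest-style noise model to one trace (a list of
    activity names): with probability p_noise, one random event is
    removed (40%), moved (20%), or copied (40%)."""
    trace = list(trace)
    if not trace or random.random() >= p_noise:
        return trace
    i = random.randrange(len(trace))
    r = random.random()
    if r < 0.4:                                          # remove the event
        del trace[i]
    elif r < 0.6:                                        # move it elsewhere
        event = trace.pop(i)
        trace.insert(random.randrange(len(trace) + 1), event)
    else:                                                # copy it elsewhere
        trace.insert(random.randrange(len(trace) + 1), trace[i])
    return trace
```

A noisy log differs from its clean counterpart by at most one event per trace, which is why the noise-free and noisy training logs stay closely aligned.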

Each test log and each base log contains 1000 traces. For every pair of a test log and a base log, we determine for every trace from the test log whether it fits the discovered model better than the corresponding trace from the base log. 500 traces from the test log fit the original model better than their corresponding base traces, and 500 do not. For the sake of completeness: two traces from both logs correspond if and only if they are at the same position in their respective logs, that is, the fifth trace of the test log is classified against the fifth trace of the base log, and so on.
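The positional pairing can be expressed in a few lines. This is a sketch under the assumption that some fitness function mapping a trace to a numeric score is available; the contest does not prescribe one.

```python
def classify_pairs(test_log, base_log, fitness):
    """Classify each test trace against the base trace at the same
    index: a trace is positive iff it fits the model strictly better
    than its corresponding base trace. `fitness` is an assumed
    trace -> score function supplied by the miner."""
    return [fitness(t) > fitness(b) for t, b in zip(test_log, base_log)]
```

For example, with a toy fitness that counts the fraction of events drawn from a known alphabet, the first test trace below beats its base trace and the second does not.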

How, what and when to submit?

Please let Eric Verbeek know that you want to submit your implemented discovery algorithm. Eric will then provide you with a link where you can upload your submission.

You should submit a working discovery algorithm that can be called using a Discover.bat Windows batch file, which takes two parameters:

  1. The full path to the training log file, excluding the .xes extension.
  2. The full path to the model file where the discovered model should be stored, excluding any extension like .pnml or .bpmn.

As an example, assuming that the miner discovers a Petri net,

Discover.bat logs\discovery\discovery-log models\discovered-model

will discover a model from the training log file logs\discovery\discovery-log.xes and will export the discovered Petri net to the file models\discovered-model.pnml.
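In practice, Discover.bat often just delegates to a script in another language. The sketch below shows a hypothetical Python entry point that such a batch file could call (e.g. `python discover.py %1 %2`); the key detail is that both paths arrive without extensions, so the script appends .xes and .pnml itself. The miner calls in the comments are placeholders.

```python
import sys

def main(log_path, model_path):
    """Resolve the extension-less paths of the contest interface.
    log_path:   training log, WITHOUT the .xes extension
    model_path: output model, WITHOUT the .pnml/.bpmn extension"""
    log_file = log_path + ".xes"        # training log to read
    model_file = model_path + ".pnml"   # discovered Petri net to write
    # net = my_miner.discover(log_file)          # hypothetical miner call
    # my_miner.export_pnml(net, model_file)      # hypothetical exporter
    return log_file, model_file

if __name__ == "__main__" and len(sys.argv) >= 3:
    main(sys.argv[1], sys.argv[2])
```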

If the result of calling your Discover.bat file as described above is a PNML file (Petri net) or a BPMN file (BPMN diagram), then you’re done. If not, the discovery algorithm needs to come with its own working classifier, that is, a Classify.bat Windows batch file, which takes four parameters:

  1. The full path to the test log file, excluding the .xes extension.
  2. The full path to the base log file, excluding the .xes extension.
  3. The full path to the model file which should be used to classify the test log, excluding any extension like .pnml or .bpmn.
  4. The full path to the log file where the classified test log should be stored, excluding the .xes extension.

As an example, assuming that the miner discovered a Petri net,

Classify.bat logs\test\test-log logs\base\base-log models\discovered-model logs\classified\test-log

will classify whether every trace from the test log logs\test\test-log.xes fits the Petri net from models\discovered-model.pnml better than the corresponding trace from the base log logs\base\base-log.xes, and it will export the classified traces (by setting the pdc:isPos attribute for every trace) as an event log in logs\classified\test-log.xes.

Classification of a trace is done by adding the boolean pdc:isPos attribute to the trace, which should be:

  • true if the trace is classified positive (fits your model better than the corresponding trace in the base model) and
  • false if the trace is classified negative (does not fit your model better than the corresponding trace in the base model).
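Writing the classified log comes down to emitting one boolean pdc:isPos attribute per trace. The snippet below is a bare-bones sketch using the standard XML library; a real submission would preserve the original events and XES metadata, which are omitted here.

```python
import xml.etree.ElementTree as ET

def write_classified_log(labels, out_path):
    """Write a minimal XES-style log in which each trace carries only
    the boolean pdc:isPos attribute (true = the test trace fits the
    model better than its corresponding base trace). out_path arrives
    without the .xes extension, per the contest interface."""
    log = ET.Element("log")
    for is_pos in labels:
        trace = ET.SubElement(log, "trace")
        ET.SubElement(trace, "boolean",
                      key="pdc:isPos", value=str(is_pos).lower())
    ET.ElementTree(log).write(out_path + ".xes",
                              xml_declaration=True, encoding="utf-8")
```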

Score and winners

For each training log, the Discover.bat file is used to discover a model from the training log. Next, the Classify.bat file is used to classify every trace in the test log against the corresponding trace in the base log using the discovered model. This results in a positive accuracy rate P and a negative accuracy rate N for this training log. From these, its F-score F is computed as 2*(P*N)/(P+N). The end score for the discovery algorithm is the average F-score over all 384 training logs.
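The scoring formula above is the harmonic mean of the two accuracy rates, averaged over the training logs; a direct transcription:

```python
def f_score(p, n):
    """Harmonic mean 2*(P*N)/(P+N) of positive accuracy P and
    negative accuracy N for one training log."""
    return 2 * (p * n) / (p + n) if (p + n) > 0 else 0.0

def end_score(rates):
    """Contest end score: the average F-score over all training logs,
    given one (P, N) pair per log."""
    return sum(f_score(p, n) for p, n in rates) / len(rates)
```

Because it is a harmonic mean, a submission that scores 0% on either accuracy gets an F-score of 0% for that log, however high the other rate is.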

The winner is the submission with the best end score.

Key Dates

  • Submission deadline: Friday, 29 September 2023
  • Disclosure of the data set: Saturday, 30 September 2023
  • Winner notification: Friday, 6 October 2023
  • Winner announcement: during ICPM 2023

Example discovery algorithms

We have run the three base miners on the data set: The Flower miner, the Directly Follows miner, and the Trace (or Sequence) miner. The following table shows the results for these miners when using either all traces, only the non-negative traces, or only the positive traces to discover a model.

  Miner                   All   Non-negative  Positive
  Directly Follows miner  59%   59%           64%
  Flower miner             0%    0%            1%
  Trace miner             74%   74%           68%

The 1% score for the Flower miner on only the positive traces is explained by the fact that sometimes the 20 positive traces do not contain all possible activities. As a result, the discovered flower model lacks some activities, and any trace containing such a missing activity is classified as non-fitting.

Submissions

The following seventeen algorithms were submitted:

  1. 1.a.DisCoveRJS-SimpleClassification, by Axel Christfort and Tijs Slaats
  2. 2.a.DisCoveRJS-Filtering-SimpleClassification, by Axel Christfort and Tijs Slaats
  3. UiPath_PIM_PDC2022_configI, by Dennis Brons
  4. UiPath_PIM_PDC2023_configA, by Dennis Brons
  5. UiPath_PIM_PDC2023_configB, by Dennis Brons
  6. UiPath_PIM_PDC2023_configC, by Dennis Brons
  7. 2.a.DisCoveRJS-Filtering-SimpleClassification-22, by Axel Christfort and Tijs Slaats
  8. UiPath_PIM_PDC2023_configD, by Dennis Brons
  9. 2.a.DisCoveRJS-Filtering-SimpleClassification-18, by Axel Christfort and Tijs Slaats
  10. 2.a.DisCoveRJS-Filtering-SimpleClassification-17, by Axel Christfort and Tijs Slaats
  11. 2.a.DisCoveRJS-Filtering-SimpleClassification-19, by Axel Christfort and Tijs Slaats
  12. UiPath_PIM_PDC2023_configE, by Dennis Brons
  13. AIM-Replay, by Niklas van Detten and Sander Leemans
  14. POWLMiner_v3_filteringDynamic, by Humam Kourani, Daniel Schuster and Wil van der Aalst
  15. POWLMiner_v3_filtering05, by Humam Kourani, Daniel Schuster and Wil van der Aalst
  16. POWLMiner_v3_filtering01, by Humam Kourani, Daniel Schuster and Wil van der Aalst
  17. POWLMiner_v3_V2filteringDynamic, by Humam Kourani, Daniel Schuster and Wil van der Aalst

Results

  Submission                                        Score  Positive accuracy  Negative accuracy
  2.a.DisCoveRJS-Filtering-SimpleClassification-18  84.0%  78.2%              91.6%
  2.a.DisCoveRJS-Filtering-SimpleClassification-17  84.0%  78.0%              92.0%
  2.a.DisCoveRJS-Filtering-SimpleClassification-19  84.0%  78.4%              91.2%
  2.a.DisCoveRJS-Filtering-SimpleClassification     83.7%  77.3%              92.7%
  2.a.DisCoveRJS-Filtering-SimpleClassification-22  82.7%  77.5%              89.3%
  UiPath_PIM_PDC2023_v1_configE                     78.4%  74.2%              84.5%
  UiPath_PIM_PDC2023_v1_configD                     78.2%  73.9%              84.6%
  UiPath_PIM_PDC2022_configI                        77.1%  72.7%              84.1%
  UiPath_PIM_PDC2023_v1_configC                     76.6%  73.4%              81.2%
  UiPath_PIM_PDC2023_configB                        74.4%  71.6%              77.7%
  1.a.DisCoveRJS-SimpleClassification               73.4%  64.7%              95.0%
  UiPath_PIM_PDC2023_configA                        68.3%  65.4%              71.7%
  AIM-Replay                                        64.0%  58.1%              73.4%
  POWLMiner_v3_V2filteringDynamic                   49.2%  42.4%              97.0%
  POWLMiner_v3_filteringDynamic                     47.0%  42.4%              56.2%
  POWLMiner_v3_filtering01                          46.2%  37.6%              96.1%
  POWLMiner_v3_filtering05                          44.1%  40.1%              51.5%

The results for the POWLMiner_v3_V2filteringDynamic submission have been estimated, as it did not finish on time for the logs that contain noise. For the noise-free logs only, this submission scores 81.9%, but on the logs that do contain noise the score so far was only 16.5%.

The positive accuracy is the percentage of positive traces correctly classified as positive, and the negative accuracy is the percentage of negative traces correctly classified as negative.

Winner

The best overall submission is the 2.a.DisCoveRJS-Filtering-SimpleClassification-18, by Axel Christfort and Tijs Slaats, which scores 84.0%. Congratulations to Axel and Tijs!
