BPI Challenge 2019

The ninth International Business Process Intelligence Challenge is co-located with ICPM this year. The challenge provides participants with a real-life event log and asks them to analyze these data using whatever techniques are available, focusing on one or more of the process owner’s questions or providing other unique insights into the process(es) captured in the event log.

We strongly encourage participants to use any tools, techniques and methods at their disposal. There is no need to restrict yourself to open-source tools: proprietary tools, as well as techniques developed or implemented specifically for this challenge, are welcome.

Important dates

Publication of the data: 28 January 2019
Abstract submission deadline: 18 May 2019 (No extension possible)
Report submission deadline: 25 May 2019 (No extension possible)
Presentation of the winners: At the ICPM Conference dinner, Aachen, Germany
Conference dates: 24 - 26 June 2019, Aachen, Germany

Note that the submission deadlines are strict, to allow sufficient time for the jury to read and assess the reports. All participants are advised to keep the conference dates free in their agendas so that they can accept an invitation to join the conference if they win the challenge.

The Process

For the BPI Challenge 2019, we collected data from a large multinational company operating from The Netherlands in the area of coatings and paints and we ask participants to investigate the purchase order handling process for some of its 60 subsidiaries. In particular, the process owner has compliance questions.

In the data, each purchase order (or purchase document) contains one or more line items. For each line item, there are roughly four types of flows in the data:

  1. 3-way matching, invoice after goods receipt.
    For these items, the value of the goods receipt message should be matched against the value of an invoice receipt message and the value entered during creation of the item (indicated by both the GR-based IV flag and the Goods Receipt flag set to true).
  2. 3-way matching, invoice before goods receipt.
    These are purchase items that require a goods receipt message but do not require GR-based invoicing (indicated by the GR-based IV flag set to false and the Goods Receipt flag set to true). For such purchase items, invoices can be entered before the goods are received, but they are blocked until the goods are received. This unblocking can be done by a user, or by a batch process at regular intervals. Invoices should only be cleared if the goods are received and the value matches the invoice and the value at creation of the item.
  3. 2-way matching (no goods receipt needed)
    For these items, the value of the invoice should match the value at creation (in full, or partially until the PO value is consumed), but no separate goods receipt message is required (indicated by both the GR-based IV flag and the Goods Receipt flag set to false).
  4. Consignment
    For these items, there are no invoices at PO level, as this is handled fully in a separate process. Here the Goods Receipt flag is set to true but the GR-based IV flag is set to false, and the item type (consignment) tells us that we do not expect an invoice against this item.
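The four flows above can be told apart from the two flags and the item type. The following is a minimal sketch in Python, not the company's own logic: the function name, the returned labels and the assumption that a consignment item has an item type spelled "Consignment" are illustrative and should be checked against the actual values in the log.

```python
def classify_item(gr_based_iv: bool, goods_receipt: bool, item_type: str) -> str:
    """Map the two flags and the item type to one of the four described flows.

    Assumption: the item type of a consignment item reads "Consignment"
    (case-insensitive). The returned labels are illustrative only.
    """
    # Consignment is checked first because it shares its flag combination
    # (Goods Receipt true, GR-based IV false) with flow 2.
    if item_type.strip().lower() == "consignment":
        return "Consignment"
    if gr_based_iv and goods_receipt:
        return "3-way matching, invoice after goods receipt"
    if not gr_based_iv and goods_receipt:
        return "3-way matching, invoice before goods receipt"
    if not gr_based_iv and not goods_receipt:
        return "2-way matching"
    # GR-based IV set without Goods Receipt is not one of the four flows.
    return "Unknown"
```

For example, classify_item(False, True, "Standard") falls into flow 2, while the same flags with item type "Consignment" fall into flow 4.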

Unfortunately, the complexity of the data goes further than just this division into four categories. For each purchase item, there can be many goods receipt messages and corresponding invoices which are subsequently paid. Consider for example the process of paying rent. There is a Purchase Document with one item for paying rent, but a total of 12 goods receipt messages with (cleared) invoices with a value equal to 1/12 of the total amount. For logistical services, there may even be hundreds of goods receipt messages for one line item.

Overall, for each line item, the amounts of the line item, the goods receipt messages (if applicable) and the invoices have to match for the process to be compliant.
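As a rough illustration of this rule, the sketch below compares the summed goods receipt and invoice values of a single line item with the item value. It is a simplification under stated assumptions: the function and parameter names are hypothetical, a small tolerance absorbs the rounding errors introduced by anonymization, and partial consumption, consignment items and the timing of events are ignored.

```python
import math

def amounts_match(item_value, gr_values, invoice_values,
                  needs_goods_receipt=True, rel_tol=1e-6, abs_tol=1e-9):
    """Return True if the summed values agree with the line item value.

    Hypothetical helper: the tolerances are arbitrary choices, not values
    prescribed by the process owner.
    """
    invoices_ok = math.isclose(sum(invoice_values), item_value,
                               rel_tol=rel_tol, abs_tol=abs_tol)
    if not needs_goods_receipt:
        # 2-way matching: only invoices are compared with the item value.
        return invoices_ok
    receipts_ok = math.isclose(sum(gr_values), item_value,
                               rel_tol=rel_tol, abs_tol=abs_tol)
    return invoices_ok and receipts_ok

# The rent example: one item, twelve goods receipts and twelve invoices,
# each worth 1/12 of the total (the amounts here are made up).
rent_ok = amounts_match(1200.0, [100.0] * 12, [100.0] * 12)
```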

The Data

We collected over 1.5 million events for purchase orders submitted in 2018. The data shows the purchase-to-pay process (without the approval workflow of the POs and the invoices). The data refers to many different categories of goods and services and includes many different types of vendors.

Of course, the log is anonymized, but some semantics are left in the data, for example:

  • The resources are split between fully anonymized batch users and normal users. The batch users are automated processes executed by different systems. The normal users refer to human actors in the process.
  • The values of each event are fully anonymized from the original data using a linear translation respecting 0, i.e. adding up multiple invoices for a single item should still lead to the original item value (although there may be small rounding errors for numerical reasons).
  • Company, vendor, system and document names and IDs are again fully anonymized in a consistent way throughout the log. The company has the anonymization key, so any result can be translated by them to business insights about real customers and real purchase documents.
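One way to read "a linear translation respecting 0" is a multiplication by a fixed factor, so that 0 maps to 0 and sums are preserved. The sketch below illustrates that reading only: the factor 0.37 and the amounts are made up, and the real factor is known only to the company.

```python
import math

SCALE = 0.37  # hypothetical factor, chosen only for illustration

def anonymize(value: float) -> float:
    """Scale a value by a fixed factor; 0 stays 0 and sums are preserved."""
    return SCALE * value

invoices = [100.0, 250.0, 650.0]   # made-up invoice values for one item
item_value = sum(invoices)          # made-up original item value

# Summing the anonymized invoices gives the anonymized item value,
# up to floating-point rounding.
sum_preserved = math.isclose(sum(anonymize(v) for v in invoices),
                             anonymize(item_value))
```

This is why value-based compliance checks can be carried out on the anonymized log, provided a small tolerance is used for comparisons.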

We purposely did not anonymize the (consecutive) item IDs, the descriptions of the document type, the item type or the various text fields detailing the type of spending. This allows the participants in the challenge to get the most out of the data and to understand the context of the purchase documents and purchase items.

The event log is fully IEEE-XES compliant and is structured as follows. The case ID is a combination of the purchase document and the purchase item. There is a total of 76,349 purchase documents containing in total 251,734 items, i.e. there are 251,734 cases. In these cases, there are 1,595,923 events relating to 42 activities performed by 627 users (607 human users and 20 batch users). Sometimes the user field is empty, or NONE, which indicates no user was recorded in the source system.

For each purchase item (or case) the following attributes are recorded:

  • concept:name: A combination of the anonymized purchase document id and the anonymized item id,
  • Purchasing Document: The anonymized purchasing document ID,
  • Item: The anonymized item ID,
  • Item Type: The type of the item,
  • GR-Based Inv. Verif.: Flag indicating if GR-based invoicing is required (see above),
  • Goods Receipt: Flag indicating if 3-way matching is required (see above),
  • Source: The anonymized source system of this item,
  • Doc. Category name: The name of the category of the purchasing document,
  • Company: The anonymized subsidiary of the company from where the purchase originated,
  • Spend classification text: A text explaining the class of purchase item,
  • Spend area text: A text explaining the area for the purchase item,
  • Sub spend area text: Another text explaining the area for the purchase item,
  • Vendor: The anonymized vendor to which the purchase document was sent,
  • Name: The anonymized name of the vendor,
  • Document Type: The document type,
  • Item Category: The category as explained above (3-way with GR-based invoicing, 3-way without, 2-way, consignment).

The data can be downloaded from the 4TU Center for Research Data (694 MB), or zipped (17 MB). For your convenience there is also a CSV version here: BPIChallenge2019CSV (38 MB). (The CSV file has Literal columns which contain free text. Sometimes they also contain the “,” symbol, which is used in the file to separate columns. To properly parse each line, you can use the following code in Java: String.split(",(?=([^\"]*\"[^\"]*\")*[^\"]*$)"). This regular expression splits each line on a “,” symbol, but ignores any commas between double quotes. After splitting, remove the surrounding double quotes from each item; note that String.trim() on its own only strips leading and trailing whitespace.)
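As an alternative to the regular expression, a standard CSV parser handles quoted commas and strips the surrounding quotes in one step. The sketch below uses Python's built-in csv module; the column names and the sample row are made up for illustration and do not reflect the actual header of the CSV file.

```python
import csv
import io

def parse_rows(text: str) -> list:
    """Parse CSV text into rows, treating commas inside double quotes as data."""
    return list(csv.reader(io.StringIO(text)))

# Made-up sample: the last field contains a comma inside double quotes.
sample = (
    "case,activity,spend text\n"
    'doc1_item1,some activity,"Packaging, labels"\n'
)
rows = parse_rows(sample)
header, first = rows[0], rows[1]
```

Here first has three fields, and the last one is the unquoted text Packaging, labels. For the real file, csv.reader can be applied to an opened file object in the same way.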

When using this data, please refer to it as:

van Dongen, B.F., Dataset BPI Challenge 2019. 4TU.Centre for Research Data. 
https://doi.org/10.4121/uuid:d06aff4b-79f0-45e6-8ec8-e19730c248f1

We have contacted several vendors to see if they are willing to provide pre-configured versions of their tools for the participants to use. As soon as we get confirmation, we will add their names and a description of how to get access to the data in their tools to the list below:

Available tools from vendors:

The Challenges

The company is interested in answers to three main questions:

  • Is there a collection of process models which together properly describe the process in this data? Based on the four categories above, at least 4 models are needed, but any collection of models that together explain the process well is appreciated. Preferably, the decision which model explains which purchase item best is based on properties of the item.
  • What is the throughput of the invoicing process, i.e. the time between goods receipt, invoice receipt and payment (clear invoice)? To answer this, a technique is sought to match these events within a line item, i.e. if there are multiple goods receipt messages and multiple invoices within a line item, how are they related and which belong together?
  • Finally, which Purchase Documents stand out from the log? Where are deviations from the processes discovered for the first question, and how severe are these deviations? Deviations may be with respect to the prescribed high-level process flow, but also with respect to the values of the invoices. Which customers produce a lot of rework as invoices are wrong, etc.?
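For the second question, a naive baseline is to pair goods receipts and invoices within a line item purely by their order in time. The sketch below does only that and is not a proposed answer to the challenge: the function name is hypothetical, values are ignored entirely, and any unmatched events at the end are dropped.

```python
from datetime import datetime, timedelta

def pair_in_order(goods_receipts, invoices):
    """Pair the i-th goods receipt with the i-th invoice, both sorted by time.

    Returns (goods_receipt_time, invoice_time, delay) tuples, where delay is
    a timedelta. The delay is negative when the invoice precedes the goods
    receipt, which can happen for items without GR-based invoicing.
    """
    grs = sorted(goods_receipts)
    invs = sorted(invoices)
    # zip stops at the shorter list, so surplus events remain unmatched.
    return [(g, i, i - g) for g, i in zip(grs, invs)]

# Made-up timestamps for a single line item.
receipts = [datetime(2018, 1, 10), datetime(2018, 2, 10)]
invoice_times = [datetime(2018, 2, 15), datetime(2018, 1, 12)]
pairs = pair_in_order(receipts, invoice_times)
```

A more careful technique would also use the event values, since several receipts may be covered by one invoice and vice versa.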

The Categories

Unlike last year, there are now only two categories, namely students and non-students. In previous years, the academic category was intended for academics to showcase their latest developments in process mining techniques. This role is taken over by the process discovery contest and the conformance checking contest.

The Student Category

This category targets Bachelor, Master and PhD students or student teams. In this category, the focus is on the originality of the results, the validity of the claims and the depth of the analysis of specific issues identified. We expect participants can focus on a specific aspect of interest and analyze this aspect in great detail. Here, one can choose for example to focus on specific models, such as control-flow models, social network models, performance models, predictive models, etc.

The Non-Student Category

This category targets academics and professionals to show their skills in analyzing business processes. The submitted reports are judged on their level of professionalism and originality of the results. The participants are expected to report on a broader range of aspects, where each aspect does not have to be developed in full detail. The report submitted in this category will be judged on its completeness of analysis and usefulness for the purpose of a real-life process mining setting.

The Prize

The winner in each category will be selected by a jury, and a representative of the winning report in each category will be invited to join the conference in Aachen and to receive a certificate during the conference dinner.

Submissions

Submissions should be made through EasyChair at https://easychair.org/conferences/?conf=icpm2019 where you indicate that your submission is a BPI Challenge submission. A submission should contain a PDF report of at most 25 pages, including figures, using the LNCS/LNBIP format (http://www.springer.com/computer/lncs?SGWID=0-164-6-791344-0) specified by Springer (available for both LaTeX and MS Word). Appendices may be included, but should only support the main text.

Questions about the challenge

Like before, participants can post questions about the data/process in the ProM forum. The company monitors the messages there and will try to respond as soon as possible.