The F-score used in the contest is computed over two values: the accuracy on the positive (fitting) traces and the accuracy on the negative (non-fitting) traces. As a result, if a model classifies all traces as negative, its positive accuracy is 0% and the F-score is therefore 0%, even though its negative accuracy is 100%.
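The behaviour described above can be illustrated with a minimal sketch. It assumes the F-score is the balanced harmonic mean (F1) of the two accuracies, which the text does not state explicitly. The function name `f_score` and the example values are hypothetical and not taken from the contest.

```python
def f_score(pos_acc: float, neg_acc: float) -> float:
    """Assumed definition: harmonic mean of positive and negative accuracy.

    Both arguments are fractions in [0, 1].
    """
    if pos_acc + neg_acc == 0:
        # Both accuracies are zero; define the score as 0 to avoid dividing by zero.
        return 0.0
    return 2 * pos_acc * neg_acc / (pos_acc + neg_acc)


# A model that classifies every trace as negative: no positive trace is
# accepted (accuracy 0), every negative trace is rejected (accuracy 1).
all_negative = f_score(pos_acc=0.0, neg_acc=1.0)

# Illustrative values only: a model that does well on both kinds of traces.
balanced = f_score(pos_acc=0.98, neg_acc=0.90)
```

Under this assumption, a single accuracy of zero forces the score to zero, so a model cannot score well by being accurate on only one kind of trace.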
Handling non-determinism in discovery algorithms
The non-determinism we encountered in some discovery algorithms is penalized by taking the minimum positive accuracy and the minimum negative accuracy over the runs (we run every discovery algorithm three times). For example, if 495, 490, and 500 of the 500 positive traces in the test log were classified as positive in the three runs, then the positive accuracy is 490/500 = 98%.
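The worst-case rule above can be sketched in a few lines. The helper name `worst_case_accuracy` is hypothetical. The numbers are the ones from the example in the text.

```python
def worst_case_accuracy(correct_per_run: list[int], total: int) -> float:
    """Return the lowest accuracy observed over the runs of one algorithm.

    correct_per_run: number of correctly classified traces in each run.
    total: number of traces of that kind (positive or negative) in the test log.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not correct_per_run:
        raise ValueError("at least one run is required")
    return min(correct_per_run) / total


# Three runs on a test log with 500 positive traces.
positive_accuracy = worst_case_accuracy([495, 490, 500], total=500)  # 490 / 500
```

The same helper would be applied separately to the negative traces, so each of the two accuracies reflects the worst of the three runs.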