Benchmark

This page provides background information and data for the benchmark presented in our paper "A Benchmark of Globally-Optimal Anonymization Methods for Biomedical Data", which appeared at the 27th IEEE International Symposium on Computer-Based Medical Systems (CBMS 2014).

Privacy criteria

All 11 reasonable combinations of the following privacy criteria are evaluated in our benchmark (a configuration sketch follows the list):

  1. 5-anonymity
  2. recursive-(4,3)-diversity
  3. 0.2-closeness (EMD with hierarchical distance)
  4. (0.05, 0.15)-presence
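For orientation, the following minimal sketch shows how these four privacy models map to ARX's programming interface. The sensitive attribute "disease", its hierarchy and the research subset are placeholders and not part of the benchmark configuration; method names follow the ARX 2.x API used by anonbench (newer ARX releases renamed addCriterion to addPrivacyModel).

```java
import org.deidentifier.arx.ARXConfiguration;
import org.deidentifier.arx.AttributeType.Hierarchy;
import org.deidentifier.arx.DataSubset;
import org.deidentifier.arx.criteria.DPresence;
import org.deidentifier.arx.criteria.HierarchicalDistanceTCloseness;
import org.deidentifier.arx.criteria.KAnonymity;
import org.deidentifier.arx.criteria.RecursiveCLDiversity;

public class CriteriaExample {

    // Builds a configuration combining all four models; the benchmark itself
    // evaluates 11 different combinations of them. "disease", the hierarchy
    // and the subset are illustrative placeholders.
    public static ARXConfiguration createConfig(Hierarchy diseaseHierarchy, DataSubset subset) {
        ARXConfiguration config = ARXConfiguration.create();
        config.addCriterion(new KAnonymity(5));                          // 5-anonymity
        config.addCriterion(new RecursiveCLDiversity("disease", 4d, 3)); // recursive-(4,3)-diversity
        config.addCriterion(new HierarchicalDistanceTCloseness(
                "disease", 0.2d, diseaseHierarchy));                     // 0.2-closeness (EMD, hierarchical distance)
        config.addCriterion(new DPresence(0.05d, 0.15d, subset));        // (0.05, 0.15)-presence
        return config;
    }
}
```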

Datasets

Below is a list of links for downloading ARX project-files for each dataset. Each project-file contains the following data:

  • Dataset
  • Generalization hierarchies
  • Research subsets
  • Specification of attribute types

An ARX project-file is a compressed archive that can be opened either with the ARX GUI or with common file managers. Within the archive, the data is available in CSV format and the configuration is available as a human-readable XML file.
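To illustrate how these pieces fit together programmatically, here is a minimal sketch using ARX's API. The attribute names and values are invented toy data and are unrelated to the benchmark datasets; the research subset used for (0.05, 0.15)-presence is omitted for brevity.

```java
import org.deidentifier.arx.AttributeType;
import org.deidentifier.arx.AttributeType.Hierarchy;
import org.deidentifier.arx.AttributeType.Hierarchy.DefaultHierarchy;
import org.deidentifier.arx.Data;
import org.deidentifier.arx.Data.DefaultData;

public class ProjectFileContents {

    // Builds a toy equivalent of what a project file contains: a dataset,
    // a generalization hierarchy and the attribute-type specification.
    public static Data createToyData() {
        DefaultData data = Data.create();
        data.add("age", "zipcode", "disease");   // header row
        data.add("34", "81667", "flu");
        data.add("45", "81675", "gastritis");

        // Generalization hierarchy for the zipcode attribute
        DefaultHierarchy zipcode = Hierarchy.create();
        zipcode.add("81667", "8166*", "816**", "81***", "8****", "*****");
        zipcode.add("81675", "8167*", "816**", "81***", "8****", "*****");

        // Attribute types, as specified in the XML configuration of a project file
        data.getDefinition().setAttributeType("zipcode", zipcode); // quasi-identifier with hierarchy
        data.getDefinition().setAttributeType("age",
                AttributeType.QUASI_IDENTIFYING_ATTRIBUTE);         // would also need a hierarchy before anonymizing
        data.getDefinition().setAttributeType("disease", AttributeType.SENSITIVE_ATTRIBUTE);
        return data;
    }
}
```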

As the licenses of the benchmarking datasets do not permit distribution, the downloads are password protected. Prior to downloading, please request the password.

Implementation

The implementation of our benchmark, including all algorithms, is available on GitHub. The benchmarking environment is based on ARX and SUBFRAME. Currently, the following globally-optimal anonymization algorithms are implemented (a sketch of the generalization lattice they all search follows the list):

  1. Depth-First-Search: the implementation can be found here.
  2. Breadth-First-Search: the implementation can be found here.
  3. Incognito: the implementation can be found here.
  4. Optimal Lattice Anonymization: implementation details are presented in this paper and the implementation can be found here.
  5. Flash: the implementation can be found here.
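All five algorithms search the lattice of generalization-level vectors spanned by the hierarchies of the quasi-identifiers; they differ in traversal order and pruning strategy. The following sketch is our illustration, not anonbench code: it enumerates such a lattice naively with a breadth-first traversal, with a placeholder predicate standing in for the check against the configured privacy criteria. The listed algorithms derive their performance from avoiding exactly this kind of exhaustive enumeration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.function.Predicate;

public class NaiveLatticeSearch {

    /** Enumerates the whole lattice and returns every node that satisfies the criteria. */
    public static Set<List<Integer>> search(int[] maxLevels, Predicate<int[]> isAnonymous) {
        Set<List<Integer>> visited = new HashSet<>();
        Set<List<Integer>> anonymous = new HashSet<>();
        Queue<int[]> queue = new ArrayDeque<>();
        queue.add(new int[maxLevels.length]); // bottom node: no generalization applied

        while (!queue.isEmpty()) {
            int[] node = queue.poll();
            List<Integer> key = toList(node);
            if (!visited.add(key)) continue;  // visit each node only once
            if (isAnonymous.test(node)) {     // one "check" in the benchmark's terminology
                anonymous.add(key);
            }
            // Direct successors: raise the generalization level of one attribute by one
            for (int i = 0; i < node.length; i++) {
                if (node[i] < maxLevels[i]) {
                    int[] successor = node.clone();
                    successor[i]++;
                    queue.add(successor);
                }
            }
        }
        return anonymous;
    }

    private static List<Integer> toList(int[] node) {
        List<Integer> list = new ArrayList<>(node.length);
        for (int level : node) list.add(level);
        return list;
    }
}
```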

Results

The following figures show key parameters averaged over either the datasets or the privacy criteria. The number of checks indicates an algorithm's pruning power, the number of roll-ups indicates how well an algorithm can exploit the roll-up optimization, and the execution times indicate an algorithm's overall performance within the ARX runtime environment.
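The values shown below are geometric means of each key parameter over the respective group (the five datasets or the eleven combinations of privacy criteria), i.e.

$$\operatorname{geomean}(x_1,\dots,x_n) = \Bigl(\prod_{i=1}^{n} x_i\Bigr)^{1/n}.$$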

On a desktop PC with a quad-core 3.1 GHz Intel Core i5 CPU, running a 64-bit Linux 3.0.14 kernel and a 64-bit Sun JVM (1.7.0_21), the following results are produced (java -Xmx4G -XX:+UseConcMarkSweepGC -jar anonbench-0.2.jar):

Geometric mean of key parameters over all five benchmark datasets:

[Figures: number of checks, number of roll-ups, execution times]

Geometric mean of key parameters over all eleven combinations of privacy criteria:

[Figures: number of checks, number of roll-ups, execution times]

Changelog

Since the publication of the paper, we have updated anonbench 0.1, which was based on ARX 2.0.0, to anonbench 0.2, which is based on ARX 2.3.0. Due to bug fixes and various performance-related changes in ARX, the benchmark results have changed slightly in the process. We note, however, that all conclusions drawn from our original results remain valid, and we strongly recommend using the latest version of anonbench.