Overview of ARX's perspectives
The ARX anonymization tool is divided into four perspectives, which model different aspects of the de-identification process. As shown in the screenshot below, these perspectives cover 1) configuring privacy models, utility measures and transformation methods, 2) exploring the solution space, 3) analyzing data utility and 4) analyzing privacy risks.
In the configuration perspective, input data can be loaded, transformation rules can be specified and all further parameters, such as privacy models and utility measures, can be selected and parameterized. If required, a risk analysis can be performed beforehand to inform this step. Next, the solution space is characterized by executing a de-identification algorithm. The result can be inspected in the exploration perspective. Here, it is possible to search for privacy-preserving data transformations that fulfill the user's requirements, i.e., transformations that result in output data suited for the intended usage scenario. To assess this suitability, the utility analysis perspective provides methods for comparing transformed data sets to the original input data set using descriptive statistics and machine learning. In the fourth perspective, risk analyses can be performed for the input data set as well as for transformed output data. Based on the results of these analyses, the suitability of a solution candidate may be confirmed, or the configuration of the de-identification process may be modified.
Configuring the de-identification process
This perspective supports three tasks. First, a data set can be imported into the tool and attribute metadata can be specified, including data types and attribute properties in terms of privacy risks. Second, generalization hierarchies for quasi-identifiers or sensitive attributes can be created semi-automatically with built-in wizards or imported into the tool from CSV files. Third, privacy models, the method for measuring data utility and further parameters that control the transformation process can be specified.
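To make the idea of a generalization hierarchy concrete, the following is a minimal Python sketch that builds one for an age attribute and serializes it as CSV. It is not ARX code: the function names, the interval widths and the semicolon delimiter are assumptions chosen for illustration. Each row holds an original value followed by increasingly general representations, ending in a fully suppressed value.

```python
import csv
import io


def age_hierarchy(ages, widths=(5, 10, 20)):
    """Return one row per distinct age: the value, one interval per width, then '*'."""
    rows = []
    for age in sorted(set(ages)):
        row = [str(age)]
        for width in widths:
            lower = (age // width) * width
            row.append(f"{lower}-{lower + width - 1}")
        row.append("*")  # top level: the value is fully suppressed
        rows.append(row)
    return rows


def to_csv(rows, delimiter=";"):
    """Serialize hierarchy rows as CSV text (the delimiter is an assumption)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


rows = age_hierarchy([34, 45, 66, 70])
print(to_csv(rows))
```

A file of this shape could then be checked against the import dialog of the tool; the exact layout ARX expects should be verified there rather than taken from this sketch.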
Exploration of the solution space
During the de-identification process, ARX constructs a search space consisting of a set of transformations that can be applied to the data set. This search space is then characterized based on the given privacy models and utility measure. This perspective allows users to browse the solution candidates identified by ARX and to select interesting transformations for further analysis.
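The notion of a search space of transformations can be illustrated with a small Python sketch. It assumes that a transformation is a tuple of generalization levels, one per quasi-identifier, and that the privacy model is k-anonymity; the toy records, the two generalization functions and all names are hypothetical. The sketch checks every transformation by brute force, which is only feasible for tiny examples and should not be read as the search strategy ARX uses.

```python
from collections import Counter
from itertools import product

# Hypothetical toy data: (age, zip code) pairs acting as quasi-identifiers.
RECORDS = [
    (34, "81667"), (36, "81675"), (45, "81925"),
    (47, "81931"), (62, "81925"), (68, "81931"),
]

# Highest generalization level per attribute: age 0..2, zip code 0..5.
MAX_LEVELS = (2, 5)


def generalize_age(age, level):
    """Level 0 keeps the value, level 1 maps to a decade, level 2 suppresses."""
    if level == 0:
        return str(age)
    if level == 1:
        lower = age // 10 * 10
        return f"{lower}-{lower + 9}"
    return "*"


def generalize_zip(zip_code, level):
    """Level n masks the last n digits of the zip code."""
    kept = len(zip_code) - level
    return zip_code[:kept] + "*" * level


def is_k_anonymous(levels, k):
    """Check that every equivalence class contains at least k records."""
    classes = Counter(
        (generalize_age(age, levels[0]), generalize_zip(zip_code, levels[1]))
        for age, zip_code in RECORDS
    )
    return min(classes.values()) >= k


def characterize(k):
    """Map every transformation in the search space to True or False."""
    space = product(*(range(top + 1) for top in MAX_LEVELS))
    return {levels: is_k_anonymous(levels, k) for levels in space}


result = characterize(k=2)
# Rank privacy-preserving candidates by total generalization as a crude utility proxy.
candidates = sorted((lv for lv, ok in result.items() if ok), key=sum)
print(len(result), "transformations,", len(candidates), "satisfy 2-anonymity")
print("least generalized candidate:", candidates[0])
```

Summing the levels is only a stand-in for a real utility measure; it is used here so that the candidates can be ordered without introducing further assumptions.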
Analysis of data utility
To assess the suitability of a specific transformation for a given usage scenario, this perspective supports comparing transformations of the input data set to the original data. To this end, it incorporates various graphical representations of results of univariate and bivariate statistics and supports cell-by-cell comparisons.
To evaluate the suitability of an output data set for machine learning purposes, the perspective also allows users to analyze the classification accuracy that can be achieved with a generic logistic regression method.
The perspective also implements a method for local recoding that can be used to further enhance data utility.
Analysis of risks
In this perspective, various metrics reflecting privacy risks are presented. These include re-identification risks for the prosecutor, journalist and marketer attacker models as well as estimates of population uniqueness, which can be calculated using different statistical models. The perspective also provides access to a method for detecting attributes that must be modified when de-identifying data in compliance with the Safe Harbor method of the US Health Insurance Portability and Accountability Act (HIPAA identifiers), and to a method for finding potential quasi-identifiers.
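As an illustration of what such a risk metric measures, the sketch below computes re-identification risks under the prosecutor model in Python. It assumes the commonly used definition in which the risk of a record is one divided by the size of its equivalence class, i.e. the number of records sharing the same quasi-identifier values. The records and function name are hypothetical, and the figures ARX reports may be derived differently.

```python
from collections import Counter
from fractions import Fraction


def prosecutor_risks(records):
    """Summarize per-record risks, where risk = 1 / size of the equivalence class."""
    classes = Counter(records)
    per_record = [Fraction(1, classes[record]) for record in records]
    unique = sum(1 for record in records if classes[record] == 1)
    return {
        "highest_risk": max(per_record),
        "average_risk": sum(per_record) / len(per_record),
        "fraction_unique": Fraction(unique, len(records)),
    }


# Hypothetical generalized quasi-identifier values (age range, masked zip code).
records = (
    [("30-39", "816**")] * 2
    + [("40-49", "819**")] * 3
    + [("60-69", "819**")] * 1
)
summary = prosecutor_risks(records)
for name, value in summary.items():
    print(f"{name}: {float(value):.3f}")
```

Exact fractions are used instead of floats so that the summary values can be compared without rounding concerns.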
ARX aims to provide a high degree of interoperability with other software systems. Generalization hierarchies and data sets can be imported from and exported to files containing character-separated values (CSV). Moreover, ARX is able to import data from further sources, including MS Excel spreadsheets and relational database systems, such as MS SQL, DB2, MySQL or PostgreSQL.
The data import wizard also supports the renaming, removing and reordering of columns. During data import, data types are automatically detected and data cleansing may be performed. This means that values that do not conform to the specified data type will be replaced with null values, which are handled correctly by all methods implemented in ARX.
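The cleansing behavior described above can be sketched in a few lines of Python. This is an illustration of the idea only, not the import logic of ARX: the type names, the date format and the use of `None` as the null value are assumptions.

```python
from datetime import datetime


def parse_date(text):
    """Parse an ISO-style date; the format is an assumption for this sketch."""
    return datetime.strptime(text, "%Y-%m-%d").date()


# Hypothetical mapping from declared data type to a parser.
PARSERS = {"integer": int, "decimal": float, "date": parse_date}


def cleanse(values, data_type):
    """Replace every value that does not conform to data_type with None."""
    parser = PARSERS[data_type]
    cleaned = []
    for value in values:
        try:
            cleaned.append(parser(value.strip()))
        except ValueError:
            cleaned.append(None)  # non-conforming value becomes a null value
    return cleaned


print(cleanse(["34", "n/a", " 45 ", ""], "integer"))
print(cleanse(["2020-01-31", "31.01.2020"], "date"))
```

Downstream code then has to treat `None` consistently, which corresponds to the statement that null values are handled by all methods in the tool.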
To analyze data provided by ARX further with other tools, all tables can be exported as CSV data via context menus.