Transformation models

Supported transformation methods

ARX supports a variety of common data transformation models, which can also be combined with each other.

Global and local transformation schemes

ARX can be configured to apply the same transformation scheme to all records in a dataset (global transformation) or to apply different transformation schemes to different subsets of the records (local transformation). The maximum number of transformations that may be used can be specified. The exact nature of the resulting data transformation scheme depends on additional parameters set by the user. For example, if value generalization hierarchies have been specified, global transformation results in full-domain generalization, where each value of an attribute's domain is transformed to the same generalization level. With local transformation, different generalization levels may be used for the same attribute value in different records. Analogously, if the specified transformation rules only suppress values, global transformation results in attribute suppression, while local transformation results in cell suppression.
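The difference between the two schemes can be illustrated with a minimal Python sketch. It does not use the ARX API; the `generalize` function, the three-level age hierarchy and the hand-picked per-record levels are assumptions made purely for illustration.

```python
# Hypothetical hierarchy for an "age" attribute (not taken from ARX):
# level 0 = original value, level 1 = decade, level 2 = suppressed.
def generalize(age, level):
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"

ages = [23, 23, 31, 64]

# Global transformation (full-domain generalization):
# every record is transformed to the same generalization level.
global_result = [generalize(a, 1) for a in ages]

# Local transformation: the level may differ per record, so the same
# input value (23) can end up at different levels. The levels are chosen
# by hand here; a real tool would derive them from the privacy model.
levels = [0, 1, 1, 2]
local_result = [generalize(a, lvl) for a, lvl in zip(ages, levels)]

print(global_result)
print(local_result)
```

In the local result, the two records with age 23 are released at different levels, which is exactly what full-domain generalization rules out.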

Value generalization

User-specified generalization hierarchies form the backbone of ARX's data transformation mechanism. Hierarchies can either be used to directly reduce the uniqueness of attribute values or to form clusters that will be transformed using further methods, such as microaggregation.
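As a rough illustration of how a hierarchy reduces the uniqueness of attribute values, the following sketch stores a hierarchy as a plain table that maps each original value to its representation on every level. The ZIP codes, the masking scheme and the `distinct_values` helper are hypothetical and are not part of ARX.

```python
# Hypothetical hierarchy for a ZIP code attribute: index 0 is the original
# value, each further index masks one more trailing digit.
hierarchy = {
    "81667": ["81667", "8166*", "816**", "*****"],
    "81675": ["81675", "8167*", "816**", "*****"],
    "81925": ["81925", "8192*", "819**", "*****"],
    "81931": ["81931", "8193*", "819**", "*****"],
}

def distinct_values(values, level):
    """Count how many distinct values remain at the given hierarchy level."""
    return len({hierarchy[v][level] for v in values})

zipcodes = ["81667", "81675", "81925", "81931"]
counts = [distinct_values(zipcodes, lvl) for lvl in range(4)]
print(counts)
```

The values sharing a generalized value on a given level also form the clusters that a further method, such as microaggregation, could operate on.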

Random sampling

ARX supports multiple methods for drawing a sample from the input dataset. This can be used to relate a dataset to an underlying population table or to reduce privacy risks. Random sampling is further used to introduce randomness into the differential privacy mechanism supported by ARX.
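One possible sampling method is sketched below: each record is kept independently with a fixed probability. This is only an assumption about what such a method could look like, not a description of ARX's implementation; the function name `bernoulli_sample` and the parameters `beta` and `seed` are hypothetical.

```python
import random

def bernoulli_sample(records, beta, seed=None):
    """Keep each record independently with probability beta (0 <= beta <= 1)."""
    rng = random.Random(seed)  # a seed makes the sample reproducible
    return [r for r in records if rng.random() < beta]

population = list(range(1000))
sample = bernoulli_sample(population, beta=0.1, seed=42)

# The sample size varies around beta * len(population).
print(len(sample))
```

Because every record is selected independently, the size of the sample is itself random, in contrast to drawing a fixed number of records.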

Record, attribute and cell suppression

As described previously, ARX also supports removing individual attributes, attribute values or complete records during the transformation process. This can be controlled by defining appropriate hierarchies (which is supported by specific wizards), by choosing between local and global transformation, and by specifying a limit on the number of records that may be removed.
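The interplay between record suppression and a suppression limit can be sketched as follows. The sketch assumes a simple rule that is not stated above: records whose combination of values occurs fewer than `k` times are suppressed, and the result is rejected if this would remove more than a given fraction of the records. The function name and both parameters are hypothetical.

```python
from collections import Counter

def suppress_small_classes(records, k, limit):
    """Suppress records whose value combination occurs fewer than k times.

    Returns None if more than `limit` (a fraction between 0 and 1)
    of all records would have to be suppressed.
    """
    sizes = Counter(records)
    outliers = sum(1 for r in records if sizes[r] < k)
    if outliers > limit * len(records):
        return None
    # A suppressed record keeps its position but loses all of its values.
    return [r if sizes[r] >= k else ("*",) * len(r) for r in records]

records = [
    ("20-29", "M"), ("20-29", "M"),
    ("30-39", "F"), ("30-39", "F"),
    ("60-69", "M"),
]
result = suppress_small_classes(records, k=2, limit=0.25)
print(result)
```

With a stricter limit, for example `limit=0.1`, the same input is rejected, which would force a stronger generalization instead.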

Microaggregation

Sets of numeric attribute values can be transformed into a common value by user-specified aggregation functions. Prior to aggregation, clustering can be performed based on value generalization hierarchies.
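A minimal sketch of this two-step process is shown below: values are first grouped by a cluster key, which here stands in for a generalized value from a hierarchy, and each value is then replaced by the aggregate of its cluster. The function name, the use of decades as cluster keys and the choice of the arithmetic mean as the default are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, median

def microaggregate(values, cluster_keys, aggregate=mean):
    """Replace each value with the aggregate of all values in its cluster."""
    clusters = defaultdict(list)
    for value, key in zip(values, cluster_keys):
        clusters[key].append(value)
    aggregated = {key: aggregate(vs) for key, vs in clusters.items()}
    return [aggregated[key] for key in cluster_keys]

ages = [23, 27, 31, 35, 64]
decades = [a // 10 for a in ages]  # stand-in for one hierarchy level

by_mean = microaggregate(ages, decades)
by_median = microaggregate(ages, decades, aggregate=median)
print(by_mean)
```

Passing the aggregation function as a parameter mirrors the idea that it is user-specified; any function that maps a list of numbers to a single number would work.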

Top- and bottom-coding

ARX's built-in wizards can be used to create hierarchies that truncate values falling outside a user-specified range.
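The effect of such a hierarchy can be sketched with a small function. The function name, the bounds and the labels used for truncated values are hypothetical and only illustrate the idea.

```python
def top_bottom_code(value, lower, upper):
    """Truncate values outside [lower, upper] to an open-ended label."""
    if value < lower:
        return f"<{lower}"   # bottom-coding
    if value > upper:
        return f">{upper}"   # top-coding
    return str(value)

ages = [5, 18, 42, 80, 97]
coded = [top_bottom_code(a, lower=18, upper=80) for a in ages]
print(coded)
```

Values inside the range are left unchanged, so only rare extreme values lose precision.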

Categorization

The wizards provided by the software can be used to create transformation rules that are represented as functions. These rules can be used to categorize continuous variables on the fly during anonymization.
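A transformation rule represented as a function, rather than as an explicit table of values, could look like the following sketch. The helper `make_categorizer`, the boundaries and the labels are assumptions for illustration and do not reflect how ARX stores such rules.

```python
from bisect import bisect_right

def make_categorizer(boundaries, labels):
    """Build a function that maps a continuous value to an interval label.

    `boundaries` must be sorted; interval i covers
    boundaries[i - 1] <= value < boundaries[i].
    """
    if len(labels) != len(boundaries) + 1:
        raise ValueError("need exactly one more label than boundaries")

    def categorize(value):
        return labels[bisect_right(boundaries, value)]

    return categorize

age_group = make_categorizer([18, 65], ["minor", "adult", "senior"])
groups = [age_group(a) for a in [17, 18, 64.5, 65]]
print(groups)
```

Because the rule is a function, it can be applied to any value as it is encountered, without enumerating the attribute's domain in advance.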