Class DataHandle

java.lang.Object
org.deidentifier.arx.DataHandle
Direct Known Subclasses:
DataHandleInput, DataHandleOutput, DataHandleSubset

public abstract class DataHandle extends Object
This class provides access to dictionary-encoded data. Furthermore, the data is linked to the associated input or output data. This means that, e.g., if the input data is sorted, the output data will be sorted accordingly. This ensures that original tuples and their generalized counterparts always have the same row index, which is important for many use cases, e.g., for graphical tools that allow users to compare the original dataset to generalized versions.
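The row-index guarantee described above can be illustrated with a minimal plain-Java sketch. It does not use the ARX API: the class LinkedSortSketch, the method sortLinked and the sample values are hypothetical, and the code only assumes that "linked" means the same row permutation is applied to the input and the output table.

```java
import java.util.Arrays;
import java.util.Comparator;

public class LinkedSortSketch {

    /** Sorts both tables by one input column, applying the same row permutation to each. */
    public static void sortLinked(String[][] input, String[][] output, int column, boolean ascending) {
        // Sort row indices instead of rows, so the permutation can be reused
        Integer[] order = new Integer[input.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Comparator<Integer> cmp = Comparator.comparing(i -> input[i][column]);
        if (!ascending) {
            cmp = cmp.reversed();
        }
        Arrays.sort(order, cmp);

        // Apply the permutation to both tables (shallow copies keep the old row order)
        String[][] oldInput = input.clone();
        String[][] oldOutput = output.clone();
        for (int i = 0; i < order.length; i++) {
            input[i] = oldInput[order[i]];
            output[i] = oldOutput[order[i]];
        }
    }

    public static void main(String[] args) {
        // Hypothetical original rows and their generalized counterparts
        String[][] input = { {"34", "81667"}, {"21", "81675"}, {"66", "81925"} };
        String[][] output = { {"[30, 40[", "816**"}, {"[20, 30[", "816**"}, {"[60, 70[", "819**"} };
        sortLinked(input, output, 0, true);
        for (int row = 0; row < input.length; row++) {
            System.out.println(Arrays.toString(input[row]) + " -> " + Arrays.toString(output[row]));
        }
    }
}
```

After the call, row i of the input still corresponds to row i of the output, which is the property a side-by-side comparison view relies on.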
  • Field Details

    • columnToDataType

      protected DataType<?>[] columnToDataType
      The data types.
    • definition

      protected DataDefinition definition
      The data definition.
    • headerMap

      protected com.carrotsearch.hppc.ObjectIntOpenHashMap<String> headerMap
      The header.
    • node

      protected ARXLattice.ARXNode node
      The node.
    • registry

      protected org.deidentifier.arx.DataRegistry registry
      The current registry.
    • subset

      protected DataHandle subset
      The current research subset.
  • Constructor Details

    • DataHandle

      public DataHandle()
  • Method Details

    • getAttributeName

      public abstract String getAttributeName(int col)
      Returns the name of the specified column.
      Parameters:
      col - The column index
      Returns:
      the attribute name
    • getColumnIndexOf

      public int getColumnIndexOf(String attribute)
      Returns the index of the given attribute, or -1 if it is not in the header.
      Parameters:
      attribute - the attribute
      Returns:
      the column index of the attribute, or -1 if it is not in the header
    • getDataType

      public DataType<?> getDataType(String attribute)
      Returns the data type of the given attribute.
      Parameters:
      attribute - the attribute
      Returns:
      the data type
    • getDate

      public Date getDate(int row, int col) throws ParseException
      Returns a date/time value from the specified cell.
      Parameters:
      row - The cell's row index
      col - The cell's column index
      Returns:
      the date
      Throws:
      ParseException - if the value cannot be parsed as a date
    • getDefinition

      public DataDefinition getDefinition()
      Returns the data definition.
      Returns:
      the definition
    • getDistinctValues

      public final String[] getDistinctValues(int column)
      Returns an array containing the distinct values in the given column.
      Parameters:
      column - The column to process
      Returns:
      the distinct values
    • getDouble

      public Double getDouble(int row, int col) throws ParseException
      Returns a double value from the specified cell.
      Parameters:
      row - The cell's row index
      col - The cell's column index
      Returns:
      the double
      Throws:
      ParseException - if the value cannot be parsed as a double
    • getFloat

      public Float getFloat(int row, int col) throws ParseException
      Returns a float value from the specified cell.
      Parameters:
      row - The cell's row index
      col - The cell's column index
      Returns:
      the float
      Throws:
      ParseException - if the value cannot be parsed as a float
    • getGeneralization

      public abstract int getGeneralization(String attribute)
      Returns the generalization level for the attribute.
      Parameters:
      attribute - the attribute
      Returns:
      the generalization level
    • getInt

      public Integer getInt(int row, int col) throws ParseException
      Returns an int value from the specified cell.
      Parameters:
      row - The cell's row index
      col - The cell's column index
      Returns:
      the int
      Throws:
      ParseException - if the value cannot be parsed as an int
    • getLong

      public Long getLong(int row, int col) throws ParseException
      Returns a long value from the specified cell.
      Parameters:
      row - The cell's row index
      col - The cell's column index
      Returns:
      the long
      Throws:
      ParseException - if the value cannot be parsed as a long
    • getMatchingDataTypes

      public List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column)
      Returns a mapping from data types to the relative number of values that conform to the respective type. This method uses the default locale. This method only returns types that match at least 80% of all values in the column.
      Parameters:
      column - the column
      Returns:
      the matching data types
    • getMatchingDataTypes

      public <U> List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Class<U> clazz)
      Returns a mapping from data types to the relative number of values that conform to the respective type for a given wrapped class. This method uses the default locale. This method only returns types that match at least 80% of all values in the column.
      Type Parameters:
      U - the generic type
      Parameters:
      column - the column
      clazz - The wrapped class
      Returns:
      the matching data types
    • getMatchingDataTypes

      public <U> List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Class<U> clazz, double threshold)
      Returns a mapping from data types to the relative number of values that conform to the respective type for a given wrapped class. This method uses the default locale.
      Type Parameters:
      U - the generic type
      Parameters:
      column - the column
      clazz - The wrapped class
      threshold - Minimal relative number of values that must match for a data type to be included in the results
      Returns:
      the matching data types
    • getMatchingDataTypes

      public <U> List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Class<U> clazz, Locale locale)
      Returns a mapping from data types to the relative number of values that conform to the respective type for a given wrapped class. This method only returns types that match at least 80% of all values in the column.
      Type Parameters:
      U - the generic type
      Parameters:
      column - the column
      clazz - The wrapped class
      locale - The locale to use
      Returns:
      the matching data types
    • getMatchingDataTypes

      public <U> List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Class<U> clazz, Locale locale, double threshold)
      Returns a mapping from data types to the relative number of values that conform to the respective type for a given wrapped class.
      Type Parameters:
      U - the generic type
      Parameters:
      column - the column
      clazz - The wrapped class
      locale - The locale to use
      threshold - Minimal relative number of values that must match for a data type to be included in the results
      Returns:
      the matching data types
    • getMatchingDataTypes

      public List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, double threshold)
      Returns a mapping from data types to the relative number of values that conform to the respective type. This method uses the default locale.
      Parameters:
      column - the column
      threshold - Minimal relative number of values that must match for a data type to be included in the results
      Returns:
      the matching data types
    • getMatchingDataTypes

      public List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Locale locale)
      Returns a mapping from data types to the relative number of values that conform to the respective type. This method only returns types that match at least 80% of all values in the column.
      Parameters:
      column - the column
      locale - The locale to use
      Returns:
      the matching data types
    • getMatchingDataTypes

      public List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Locale locale, double threshold)
      Returns a mapping from data types to the relative number of values that conform to the respective type.
      Parameters:
      column - the column
      locale - The locale to use
      threshold - Minimal relative number of values that must match for a data type to be included in the results
      Returns:
      the matching data types
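      The threshold semantics shared by the getMatchingDataTypes overloads can be sketched in plain Java. This is not the ARX implementation: the class MatchingTypesSketch, the three candidate types and their parsers are hypothetical stand-ins, and the sketch assumes that the relative number is computed over all values passed in.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class MatchingTypesSketch {

    static boolean isInteger(String value) {
        try {
            Long.parseLong(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    static boolean isDecimal(String value) {
        try {
            Double.parseDouble(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    /** Maps each candidate type name to the fraction of conforming values, if it reaches the threshold. */
    public static Map<String, Double> matchingTypes(List<String> values, double threshold) {
        // Hypothetical candidate types; a real implementation would also cover dates, locales, etc.
        Map<String, Predicate<String>> candidates = new LinkedHashMap<>();
        candidates.put("integer", MatchingTypesSketch::isInteger);
        candidates.put("decimal", MatchingTypesSketch::isDecimal);
        candidates.put("string", value -> true);

        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Predicate<String>> candidate : candidates.entrySet()) {
            long matches = values.stream().filter(candidate.getValue()).count();
            double fraction = values.isEmpty() ? 0d : (double) matches / values.size();
            if (fraction >= threshold) {
                result.put(candidate.getKey(), fraction);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("1", "2", "3", "4.5", "x");
        // 0.8 mirrors the default of 80% mentioned for the overloads without a threshold
        System.out.println(matchingTypes(column, 0.8));
    }
}
```

      Passing a lower threshold admits types that match fewer values, which is the only difference between the overloads with and without the threshold parameter in this sketch.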
    • getNonConformingValues

      public String[] getNonConformingValues(int column, DataType<?> type, int max)
      Returns a set of values that do not conform to the given data type.
      Parameters:
      column - The column to test
      type - The type to test
      max - The maximal number of values returned by this method
      Returns:
      the non-conforming values
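      As a rough plain-Java illustration of collecting non-conforming values up to a limit: the class NonConformingSketch and the use of a Predicate in place of a DataType are assumptions made for this sketch, not part of the ARX API.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;

public class NonConformingSketch {

    /** Returns at most max distinct values from the column that fail the conformance check. */
    public static String[] nonConforming(List<String> column, Predicate<String> conforms, int max) {
        Set<String> result = new LinkedHashSet<>();
        for (String value : column) {
            if (result.size() >= max) {
                break; // limit reached, stop scanning
            }
            if (!conforms.test(value)) {
                result.add(value); // the set drops duplicates
            }
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> column = Arrays.asList("1", "n/a", "2", "?", "n/a", "-");
        // Hypothetical check standing in for an integer data type
        String[] values = nonConforming(column, v -> v.matches("-?\\d+"), 2);
        System.out.println(Arrays.toString(values));
    }
}
```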
    • getNumColumns

      public abstract int getNumColumns()
      Returns the number of columns in the dataset.
      Returns:
      the number of columns
    • getNumConformingValues

      public int getNumConformingValues(int column, DataType<?> type)
      Returns the number of (distinct) values that conform to the given data type.
      Parameters:
      column - The column to test
      type - The type to test
      Returns:
      the number of conforming values
    • getNumRows

      public abstract int getNumRows()
      Returns the number of rows in the dataset.
      Returns:
      the number of rows
    • getRiskEstimator

      public RiskEstimateBuilder getRiskEstimator()
      Returns a risk estimator, using the US population if required.
      Returns:
      the risk estimator
    • getRiskEstimator

      public RiskEstimateBuilder getRiskEstimator(ARXPopulationModel model)
      Returns a risk estimator.
      Parameters:
      model - the population model
      Returns:
      the risk estimator
    • getRiskEstimator

      public RiskEstimateBuilder getRiskEstimator(ARXPopulationModel model, ARXSolverConfiguration config)
      Returns a risk estimator.
      Parameters:
      model - the population model
      config - the solver configuration
      Returns:
      the risk estimator
    • getRiskEstimator

      public RiskEstimateBuilder getRiskEstimator(ARXPopulationModel model, RiskModelHistogram classes)
      Returns a risk estimator for the given set of equivalence classes. Saves resources by reusing existing classes.
      Parameters:
      model - the population model
      classes - the equivalence classes
      Returns:
      the risk estimator
    • getRiskEstimator

      public RiskEstimateBuilder getRiskEstimator(ARXPopulationModel model, RiskModelHistogram classes, ARXSolverConfiguration config)
      Returns a risk estimator for the given set of equivalence classes. Saves resources by reusing existing classes.
      Parameters:
      model - the population model
      classes - the equivalence classes
      config - the solver configuration
      Returns:
      the risk estimator
    • getRiskEstimator

      public RiskEstimateBuilder getRiskEstimator(ARXPopulationModel model, Set<String> qis)
      Returns a risk estimator for the given set of quasi-identifiers.
      Parameters:
      model - the population model
      qis - the set of quasi-identifiers
      Returns:
      the risk estimator
    • getRiskEstimator

      public RiskEstimateBuilder getRiskEstimator(ARXPopulationModel model, Set<String> qis, ARXSolverConfiguration config)
      Returns a risk estimator for the given set of quasi-identifiers.
      Parameters:
      model - the population model
      qis - the set of quasi-identifiers
      config - the solver configuration
      Returns:
      the risk estimator
    • getStatistics

      public abstract StatisticsBuilder getStatistics()
      Returns an object providing access to basic descriptive statistics about the data represented by this handle.
      Returns:
      the statistics
    • getTransformation

      public ARXLattice.ARXNode getTransformation()
      Returns the transformation.
      Returns:
      the transformation
    • getValue

      public abstract String getValue(int row, int col)
      Returns the value in the specified cell.
      Parameters:
      row - The cell's row index
      col - The cell's column index
      Returns:
      the value
    • getView

      public DataHandle getView()
      Returns a new data handle that represents a context-specific view on the dataset.
      Returns:
      the view
    • isOptimized

      public boolean isOptimized()
      Determines whether this handle has been optimized with local recoding.
      Returns:
      true, if this handle has been optimized with local recoding
    • isOutlier

      public boolean isOutlier(int row)
      Determines whether a given row is an outlier in the currently associated data transformation.
      Parameters:
      row - the row
      Returns:
      true, if the row is an outlier
    • isReleased

      public boolean isReleased()
      Determines whether this handle is orphaned, i.e., should not be used anymore.
      Returns:
      true, if this handle has been released
    • isSuppressed

      public boolean isSuppressed(int row)
      Determines whether a given row is completely suppressed.
      Parameters:
      row - the row
      Returns:
      true, if the row is suppressed
    • iterator

      public abstract Iterator<String[]> iterator()
      Returns an iterator over the data.
      Returns:
      the iterator
    • release

      public void release()
      Releases this handle and all associated resources. If an input handle is released, all associated results are released as well.
    • render

      public ElementData render()
      Renders this object.
      Returns:
      the rendered element data
    • replace

      public boolean replace(int column, String original, String replacement)
      Replaces the original value with the replacement in the given column. Only supported by handles for input data.
      Parameters:
      column - the column
      original - the original value
      replacement - the replacement value
      Returns:
      Whether the original value was found
    • save

      public void save(File file) throws IOException
      Writes the data to a CSV file.
      Parameters:
      file - the file
      Throws:
      IOException - if an I/O error occurs
    • save

      public void save(File file, char separator) throws IOException
      Writes the data to a CSV file.
      Parameters:
      file - the file
      separator - The separator character to use
      Throws:
      IOException - if an I/O error occurs
    • save

      public void save(File file, CSVSyntax config) throws IOException
      Writes the data to a CSV file.
      Parameters:
      file - the file
      config - the CSV syntax configuration
      Throws:
      IOException - if an I/O error occurs
    • save

      public void save(OutputStream out) throws IOException
      Writes the data in CSV format to the given output stream.
      Parameters:
      out - the output stream
      Throws:
      IOException - if an I/O error occurs
    • save

      public void save(OutputStream out, char separator) throws IOException
      Writes the data in CSV format to the given output stream.
      Parameters:
      out - the output stream
      separator - The separator character to use
      Throws:
      IOException - if an I/O error occurs
    • save

      public void save(OutputStream out, CSVSyntax config) throws IOException
      Writes the data in CSV format to the given output stream.
      Parameters:
      out - the output stream
      config - the CSV syntax configuration
      Throws:
      IOException - if an I/O error occurs
    • save

      public void save(String path) throws IOException
      Writes the data to a CSV file.
      Parameters:
      path - the path
      Throws:
      IOException - if an I/O error occurs
    • save

      public void save(String path, char separator) throws IOException
      Writes the data to a CSV file.
      Parameters:
      path - the path
      separator - The separator character to use
      Throws:
      IOException - if an I/O error occurs
    • save

      public void save(String path, CSVSyntax config) throws IOException
      Writes the data to a CSV file.
      Parameters:
      path - the path
      config - the CSV syntax configuration
      Throws:
      IOException - if an I/O error occurs
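      What writing rows with a custom separator involves can be sketched with the standard library alone. The class CsvSaveSketch, the UTF-8 encoding, the newline terminator and the quoting rule (quote a field if it contains the separator, a quote or a line break) are assumptions for illustration; the syntax ARX actually applies is controlled by CSVSyntax and may differ.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;

public class CsvSaveSketch {

    /** Quotes a field if it contains the separator, a quote or a line break. */
    static String escape(String field, char separator) {
        boolean needsQuotes = field.indexOf(separator) >= 0 || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0 || field.indexOf('\r') >= 0;
        return needsQuotes ? '"' + field.replace("\"", "\"\"") + '"' : field;
    }

    /** Writes all rows to the stream, one line per row, fields joined by the separator. */
    public static void save(Iterator<String[]> rows, OutputStream out, char separator) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        while (rows.hasNext()) {
            String[] row = rows.next();
            for (int col = 0; col < row.length; col++) {
                if (col > 0) {
                    writer.write(separator);
                }
                writer.write(escape(row[col], separator));
            }
            writer.write('\n');
        }
        writer.flush(); // the caller remains responsible for closing the stream
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical rows, e.g. as they might be produced by an Iterator<String[]>
        Iterator<String[]> rows = Arrays.asList(
                new String[] {"age", "zip"},
                new String[] {"34", "81;667"}).iterator();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        save(rows, out, ';');
        System.out.print(new String(out.toByteArray(), StandardCharsets.UTF_8));
    }
}
```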
    • shuffledIterator

      public abstract Iterator<String[]> shuffledIterator()
      Returns an iterator over the data in a random order.
      Returns:
      the iterator
    • sort

      public void sort(boolean ascending, int... columns)
      Sorts the dataset according to the given columns. Will sort input and output analogously.
      Parameters:
      ascending - Sort ascending or descending
      columns - An integer array containing column indices
    • sort

      public void sort(int from, int to, boolean ascending, int... columns)
      Sorts the dataset according to the given columns and the given range. Will sort input and output analogously.
      Parameters:
      from - The lower bound
      to - The upper bound
      ascending - Sort ascending or descending
      columns - An integer array containing column indices
    • sort

      public void sort(cern.colt.Swapper swapper, boolean ascending, int... columns)
      Sorts the dataset according to the given columns. Will sort input and output analogously.
      Parameters:
      swapper - A swapper
      ascending - Sort ascending or descending
      columns - An integer array containing column indices
    • sort

      public void sort(cern.colt.Swapper swapper, int from, int to, boolean ascending, int... columns)
      Sorts the dataset according to the given columns and the given range. Will sort input and output analogously.
      Parameters:
      swapper - A swapper
      from - The lower bound
      to - The upper bound
      ascending - Sort ascending or descending
      columns - An integer array containing column indices
    • swap

      public void swap(int row1, int row2)
      Swaps the two given rows.
      Parameters:
      row1 - the first row
      row2 - the second row
    • checkColumn

      protected void checkColumn(int column1)
      Checks a column index.
      Parameters:
      column1 - the column index
    • checkColumns

      protected void checkColumns(int[] columns)
      Checks the column indexes.
      Parameters:
      columns - the column indexes
    • checkReleased

      protected void checkReleased()
      Checks whether a registry is referenced.
    • checkRow

      protected void checkRow(int row1, int length)
      Checks a row index.
      Parameters:
      row1 - the row index
      length - the length
    • doRelease

      protected abstract void doRelease()
      Releases all resources.
    • getBaseDataType

      protected DataType<?> getBaseDataType(String attribute)
      Returns the base data type without generalization.
      Parameters:
      attribute - the attribute
      Returns:
      the base data type
    • getColumnToDataType

      protected abstract DataType<?>[] getColumnToDataType()
      Generates an array of data types.
      Returns:
      the data type array
    • getConfiguration

      protected abstract ARXConfiguration getConfiguration()
      Returns the ARXConfiguration that is currently being used, or null if this is an input handle.
      Returns:
      the configuration, or null if this is an input handle
    • getDistinctValues

      protected abstract String[] getDistinctValues(int column, boolean ignoreSuppression, DataHandleInternal.InterruptHandler handler)
      Returns the distinct values.
      Parameters:
      column - the column
      ignoreSuppression -
      handler - the handler
      Returns:
      the distinct values
    • getRegistry

      protected org.deidentifier.arx.DataRegistry getRegistry()
      Returns the registry associated with this handle.
      Returns:
      the registry
    • getValueIdentifier

      protected abstract int getValueIdentifier(int column, String value)
      Returns the internal value identifier.
      Parameters:
      column - the column
      value - the value
      Returns:
      the internal value identifier
    • internalCompare

      protected int internalCompare(int row1, int row2, int[] columns, boolean ascending)
      Compares two rows. Returns a negative integer, zero, or a positive integer as the first row is less than, equal to, or greater than the second. The comparison uses the data types specified for the columns. If no data type is specified for a column, string comparison is used.
      Parameters:
      row1 - the first row
      row2 - the second row
      columns - the columns to compare
      ascending - whether to compare in ascending order
      Returns:
      the result of the comparison
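      The comparison contract can be sketched in plain Java as follows. The class RowCompareSketch, the map of per-column comparators standing in for data types, and the choice to negate the result for descending order are assumptions made for this illustration, not the ARX implementation.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;

public class RowCompareSketch {

    /**
     * Compares two rows column by column. Columns with a registered comparator use it,
     * all other columns fall back to plain string comparison.
     */
    public static int compare(String[][] data, int row1, int row2, int[] columns,
                              Map<Integer, Comparator<String>> typed, boolean ascending) {
        for (int column : columns) {
            Comparator<String> cmp = typed.getOrDefault(column, Comparator.naturalOrder());
            int result = Integer.signum(cmp.compare(data[row1][column], data[row2][column]));
            if (result != 0) {
                // The first differing column decides; descending order flips the sign
                return ascending ? result : -result;
            }
        }
        return 0; // equal on all given columns
    }

    public static void main(String[] args) {
        String[][] data = { {"9", "b"}, {"10", "a"} };
        Map<Integer, Comparator<String>> typed = new HashMap<>();
        // Hypothetical integer type for column 0
        typed.put(0, (a, b) -> Integer.compare(Integer.parseInt(a), Integer.parseInt(b)));

        System.out.println(compare(data, 0, 1, new int[] {0}, typed, true));
        System.out.println(compare(data, 0, 1, new int[] {0}, new HashMap<>(), true));
    }
}
```

      With the typed comparator, 9 sorts before 10. Without it, the string "9" sorts after "10", which is why specifying a data type matters for numeric columns.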
    • internalGetEncodedValue

      protected abstract int internalGetEncodedValue(int row, int col, boolean ignoreSuppression)
      Internal representation of get encoded value. Returns -1 for suppressed values.
      Parameters:
      row - the row
      col - the column
      Returns:
      the value
    • internalGetValue

      protected abstract String internalGetValue(int row, int col, boolean ignoreSuppression)
      Internal representation of get value.
      Parameters:
      row - the row
      col - the column
      Returns:
      the string
    • internalIsOutlier

      protected abstract boolean internalIsOutlier(int row, int[] columns)
      Returns whether this is an outlier regarding the given columns. If no columns have been specified, true will be returned.
      Parameters:
      row - the row
      columns - the columns
      Returns:
      true, if the row is an outlier regarding the given columns
    • internalReplace

      protected abstract boolean internalReplace(int column, String original, String replacement)
      Internal replacement method.
      Parameters:
      column - the column
      original - the original value
      replacement - the replacement value
      Returns:
      true, if successful
    • isAnonymous

      protected boolean isAnonymous()
      Returns whether the data represented by this handle is anonymous.
      Returns:
      true, if the data is anonymous
    • setHeader

      protected void setHeader(String[] header)
      Sets the current header.
      Parameters:
      header - the header
    • setRegistry

      protected void setRegistry(org.deidentifier.arx.DataRegistry registry)
      Updates the registry.
      Parameters:
      registry - the new registry
    • setView

      protected void setView(DataHandle handle)
      Sets the subset.
      Parameters:
      handle - the new view