Class DataHandle

java.lang.Object
org.deidentifier.arx.DataHandle
Direct Known Subclasses:
DataHandleInput, DataHandleOutput, DataHandleSubset

public abstract class DataHandle extends Object
This class provides access to dictionary-encoded data. Furthermore, the data is linked to the associated input or output data. This means that, e.g., if the input data is sorted, the output data will be sorted accordingly. This ensures that original tuples and their generalized counterparts always have the same row index, which is important for many use cases, e.g., for graphical tools that allow comparing the original dataset to generalized versions.
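Dictionary encoding can be illustrated with a small standalone sketch. The class DictionaryEncodedColumn and its methods are hypothetical and are not ARX's internal data structures; the sketch only assumes that each distinct string in a column is mapped to an integer code and that cells store codes instead of strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a dictionary-encoded column (not ARX's internal class). */
final class DictionaryEncodedColumn {
    private final Map<String, Integer> codeOf = new HashMap<>(); // value -> code
    private final List<String> dictionary = new ArrayList<>();   // code -> value
    private final List<Integer> cells = new ArrayList<>();       // one code per row

    /** Appends a cell; a value seen for the first time receives the next free code. */
    void add(String value) {
        int code = codeOf.computeIfAbsent(value, v -> {
            dictionary.add(v);
            return dictionary.size() - 1;
        });
        cells.add(code);
    }

    /** Returns the integer code stored for the given row. */
    int code(int row) {
        return cells.get(row);
    }

    /** Decodes the given row back to its string value. */
    String value(int row) {
        return dictionary.get(cells.get(row));
    }

    /** The dictionary lists every distinct value exactly once. */
    String[] distinctValues() {
        return dictionary.toArray(new String[0]);
    }
}
```

Because each distinct value is stored only once, an operation such as listing the distinct values of a column can read the dictionary instead of scanning all rows.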
  • Constructor Details

    • DataHandle

      public DataHandle()
  • Method Details

    • getAttributeName

      public abstract String getAttributeName(int col)
      Returns the name of the specified column.
      Parameters:
      col - The column index
      Returns:
      the attribute name
    • getColumnIndexOf

      public int getColumnIndexOf(String attribute)
      Returns the index of the given attribute, or -1 if it is not in the header.
      Parameters:
      attribute - the attribute
      Returns:
      the column index of the attribute, or -1 if it is not in the header
    • getDataType

      public DataType<?> getDataType(String attribute)
      Returns the data type of the given attribute.
      Parameters:
      attribute - the attribute
      Returns:
      the data type
    • getDate

      public Date getDate(int row, int col) throws ParseException
      Returns a date/time value from the specified cell.
      Parameters:
      row - The cell's row index
      col - The cell's column index
      Returns:
      the date
      Throws:
      ParseException - if the cell value cannot be parsed as a date
    • getDefinition

      public DataDefinition getDefinition()
      Returns the data definition.
      Returns:
      the definition
    • getDistinctValues

      public final String[] getDistinctValues(int column)
      Returns an array containing the distinct values in the given column.
      Parameters:
      column - The column to process
      Returns:
      the distinct values
    • getDouble

      public Double getDouble(int row, int col) throws ParseException
      Returns a double value from the specified cell.
      Parameters:
      row - The cell's row index
      col - The cell's column index
      Returns:
      the double value
      Throws:
      ParseException - if the cell value cannot be parsed as a double
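The typed getters either return a parsed value or throw a ParseException. The following standalone sketch illustrates that contract, assuming a locale-aware java.text.NumberFormat as the parser; ARX itself parses according to the column's data type, which this sketch does not reproduce, and the class name NumericCells is hypothetical.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.text.ParsePosition;
import java.util.Locale;

/** Hypothetical helper illustrating a getDouble-style contract. */
final class NumericCells {
    /**
     * Parses the whole cell as a number in the given locale.
     * Throws ParseException if the cell is not a number or has trailing characters.
     */
    static Double parseDouble(String cell, Locale locale) throws ParseException {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        ParsePosition position = new ParsePosition(0);
        Number number = format.parse(cell, position);
        // NumberFormat stops at the first invalid character, so reject partial matches
        if (number == null || position.getIndex() != cell.length()) {
            int errorAt = position.getErrorIndex() >= 0 ? position.getErrorIndex() : position.getIndex();
            throw new ParseException("Not a number: " + cell, errorAt);
        }
        return number.doubleValue();
    }
}
```

getFloat, getInt, and getLong can be pictured the same way with a different target type.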
    • getFloat

      public Float getFloat(int row, int col) throws ParseException
      Returns a float value from the specified cell.
      Parameters:
      row - The cell's row index
      col - The cell's column index
      Returns:
      the float value
      Throws:
      ParseException - if the cell value cannot be parsed as a float
    • getGeneralization

      public abstract int getGeneralization(String attribute)
      Returns the generalization level for the attribute.
      Parameters:
      attribute - the attribute
      Returns:
      the generalization level
    • getInt

      public Integer getInt(int row, int col) throws ParseException
      Returns an int value from the specified cell.
      Parameters:
      row - The cell's row index
      col - The cell's column index
      Returns:
      the int value
      Throws:
      ParseException - if the cell value cannot be parsed as an int
    • getLong

      public Long getLong(int row, int col) throws ParseException
      Returns a long value from the specified cell.
      Parameters:
      row - The cell's row index
      col - The cell's column index
      Returns:
      the long value
      Throws:
      ParseException - if the cell value cannot be parsed as a long
    • getMatchingDataTypes

      public List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column)
      Returns a mapping from data types to the relative number of values that conform to the corresponding type. This method uses the default locale. It only returns types that match at least 80% of all values in the column.
      Parameters:
      column - the column
      Returns:
      the matching data types
    • getMatchingDataTypes

      public <U> List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Class<U> clazz)
      Returns a mapping from data types to the relative number of values that conform to the corresponding type for a given wrapped class. This method uses the default locale. It only returns types that match at least 80% of all values in the column.
      Type Parameters:
      U - the generic type
      Parameters:
      column - the column
      clazz - The wrapped class
      Returns:
      the matching data types
    • getMatchingDataTypes

      public <U> List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Class<U> clazz, double threshold)
      Returns a mapping from data types to the relative number of values that conform to the corresponding type for a given wrapped class. This method uses the default locale.
      Type Parameters:
      U - the generic type
      Parameters:
      column - the column
      clazz - The wrapped class
      threshold - The minimum relative number of values that must match for a data type to be included in the results
      Returns:
      the matching data types
    • getMatchingDataTypes

      public <U> List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Class<U> clazz, Locale locale)
      Returns a mapping from data types to the relative number of values that conform to the corresponding type for a given wrapped class. It only returns types that match at least 80% of all values in the column.
      Type Parameters:
      U - the generic type
      Parameters:
      column - the column
      clazz - The wrapped class
      locale - The locale to use
      Returns:
      the matching data types
    • getMatchingDataTypes

      public <U> List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Class<U> clazz, Locale locale, double threshold)
      Returns a mapping from data types to the relative number of values that conform to the corresponding type for a given wrapped class.
      Type Parameters:
      U - the generic type
      Parameters:
      column - the column
      clazz - The wrapped class
      locale - The locale to use
      threshold - The minimum relative number of values that must match for a data type to be included in the results
      Returns:
      the matching data types
    • getMatchingDataTypes

      public List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, double threshold)
      Returns a mapping from data types to the relative number of values that conform to the corresponding type. This method uses the default locale.
      Parameters:
      column - the column
      threshold - The minimum relative number of values that must match for a data type to be included in the results
      Returns:
      the matching data types
    • getMatchingDataTypes

      public List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Locale locale)
      Returns a mapping from data types to the relative number of values that conform to the corresponding type. It only returns types that match at least 80% of all values in the column.
      Parameters:
      column - the column
      locale - The locale to use
      Returns:
      the matching data types
    • getMatchingDataTypes

      public List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Locale locale, double threshold)
      Returns a mapping from data types to the relative number of values that conform to the corresponding type.
      Parameters:
      column - the column
      locale - The locale to use
      threshold - The minimum relative number of values that must match for a data type to be included in the results
      Returns:
      the matching data types
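The getMatchingDataTypes overloads share one idea: for each candidate type, compute the fraction of values that conform, and keep only types at or above a threshold (0.8 unless a threshold is passed). The standalone sketch below illustrates that idea. The candidate types are hypothetical predicates keyed by name, the result uses Map.Entry instead of org.apache.commons.math3.util.Pair so the example has no dependencies, and both counting over all values and ordering by descending fraction are assumptions of the sketch rather than documented ARX behavior.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch of threshold-based data type detection. */
final class TypeMatcher {
    /**
     * Returns (type name, fraction of conforming values) pairs for every candidate
     * whose fraction is at least the threshold, sorted by descending fraction.
     */
    static List<Map.Entry<String, Double>> matchingTypes(
            List<String> values, Map<String, Predicate<String>> candidates, double threshold) {
        List<Map.Entry<String, Double>> result = new ArrayList<>();
        if (values.isEmpty()) {
            return result; // no values, so no fraction can be computed
        }
        for (Map.Entry<String, Predicate<String>> candidate : candidates.entrySet()) {
            long matches = values.stream().filter(candidate.getValue()).count();
            double fraction = (double) matches / values.size();
            if (fraction >= threshold) {
                result.add(new AbstractMap.SimpleImmutableEntry<String, Double>(candidate.getKey(), fraction));
            }
        }
        // List.sort is stable, so ties keep the candidates' iteration order
        result.sort(Map.Entry.<String, Double>comparingByValue().reversed());
        return result;
    }
}
```

With a column of "1", "2", "3", "4", "x", an integer predicate matches 80% of the values, so it passes the default threshold of 0.8 but not a threshold of 0.9.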
    • getNonConformingValues

      public String[] getNonConformingValues(int column, DataType<?> type, int max)
      Returns an array of values that do not conform to the given data type.
      Parameters:
      column - The column to test
      type - The type to test
      max - The maximum number of values returned by this method
      Returns:
      the non-conforming values
    • getNumColumns

      public abstract int getNumColumns()
      Returns the number of columns in the dataset.
      Returns:
      the number of columns
    • getNumConformingValues

      public int getNumConformingValues(int column, DataType<?> type)
      Returns the number of (distinct) values that conform to the given data type.
      Parameters:
      column - The column to test
      type - The type to test
      Returns:
      the number of conforming values
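getNonConformingValues and getNumConformingValues can be pictured as two passes over the distinct values of a column. In the following hypothetical sketch, the class Conformance and the use of a Predicate as the "data type" are assumptions; counting distinct values follows the "(distinct)" note above, while deduplicating the non-conforming values is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch: conformance checks over the distinct values of a column. */
final class Conformance {
    /** Collects at most max distinct values that fail the type check, in first-seen order. */
    static String[] nonConformingValues(List<String> column, Predicate<String> type, int max) {
        List<String> result = new ArrayList<>();
        for (String value : new LinkedHashSet<>(column)) {
            if (result.size() >= max) {
                break;
            }
            if (!type.test(value)) {
                result.add(value);
            }
        }
        return result.toArray(new String[0]);
    }

    /** Counts the distinct values that pass the type check. */
    static int numConformingValues(List<String> column, Predicate<String> type) {
        int count = 0;
        for (String value : new LinkedHashSet<>(column)) {
            if (type.test(value)) {
                count++;
            }
        }
        return count;
    }
}
```

The max parameter keeps the result small for columns with many invalid entries, which is useful when only a few examples are needed to report a problem.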
    • getNumRows

      public abstract int getNumRows()
      Returns the number of rows in the dataset.
      Returns:
      the number of rows
    • getRiskEstimator

      public RiskEstimateBuilder getRiskEstimator()
      Returns a risk estimator, using the US population if required.
      Returns:
      the risk estimate builder
    • getRiskEstimator

      public RiskEstimateBuilder getRiskEstimator(ARXPopulationModel model)
      Returns a risk estimator.
      Parameters:
      model - the population model
      Returns:
      the risk estimate builder
    • getRiskEstimator

      public RiskEstimateBuilder getRiskEstimator(ARXPopulationModel model, ARXSolverConfiguration config)
      Returns a risk estimator.
      Parameters:
      model - the population model
      config - the solver configuration
      Returns:
      the risk estimate builder
    • getRiskEstimator

      public RiskEstimateBuilder getRiskEstimator(ARXPopulationModel model, RiskModelHistogram classes)
      Returns a risk estimator for the given set of equivalence classes. Saves resources by reusing existing classes.
      Parameters:
      model - the population model
      classes - the equivalence classes
      Returns:
      the risk estimate builder
    • getRiskEstimator

      public RiskEstimateBuilder getRiskEstimator(ARXPopulationModel model, RiskModelHistogram classes, ARXSolverConfiguration config)
      Returns a risk estimator for the given set of equivalence classes. Saves resources by reusing existing classes.
      Parameters:
      model - the population model
      classes - the equivalence classes
      config - the solver configuration
      Returns:
      the risk estimate builder
    • getRiskEstimator

      public RiskEstimateBuilder getRiskEstimator(ARXPopulationModel model, Set<String> qis)
      Returns a risk estimator for the given set of quasi-identifiers.
      Parameters:
      model - the population model
      qis - the set of quasi-identifiers
      Returns:
      the risk estimate builder
    • getRiskEstimator

      public RiskEstimateBuilder getRiskEstimator(ARXPopulationModel model, Set<String> qis, ARXSolverConfiguration config)
      Returns a risk estimator for the given set of quasi-identifiers.
      Parameters:
      model - the population model
      qis - the set of quasi-identifiers
      config - the solver configuration
      Returns:
      the risk estimate builder
    • getStatistics

      public abstract StatisticsBuilder getStatistics()
      Returns an object providing access to basic descriptive statistics about the data represented by this handle.
      Returns:
      the statistics
    • getTransformation

      public ARXLattice.ARXNode getTransformation()
      Returns the transformation.
      Returns:
      the transformation
    • getValue

      public abstract String getValue(int row, int col)
      Returns the value in the specified cell.
      Parameters:
      row - The cell's row index
      col - The cell's column index
      Returns:
      the value
    • getView

      public DataHandle getView()
      Returns a new data handle that represents a context-specific view on the dataset.
      Returns:
      the view
    • isOptimized

      public boolean isOptimized()
      Returns whether this handle has been optimized with local recoding.
      Returns:
      true if this handle has been optimized with local recoding
    • isOutlier

      public boolean isOutlier(int row)
      Determines whether a given row is an outlier in the currently associated data transformation.
      Parameters:
      row - the row index
      Returns:
      true if the row is an outlier
    • isReleased

      public boolean isReleased()
      Determines whether this handle is orphaned, i.e., should no longer be used.
      Returns:
      true if this handle has been released
    • isSuppressed

      public boolean isSuppressed(int row)
      Determines whether a given row is completely suppressed.
      Parameters:
      row - the row index
      Returns:
      true if the row is suppressed
    • iterator

      public abstract Iterator<String[]> iterator()
      Returns an iterator over the data.
      Returns:
      the iterator
    • release

      public void release()
      Releases this handle and all associated resources. If an input handle is released, all associated results are released as well.
    • render

      public ElementData render()
      Renders this object.
      Returns:
      the rendered element data
    • replace

      public boolean replace(int column, String original, String replacement)
      Replaces the original value with the replacement in the given column. Only supported by handles for input data.
      Parameters:
      column - the column
      original - the value to be replaced
      replacement - the new value
      Returns:
      whether the original value was found
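Because the data is dictionary encoded, a replacement can in principle be performed by rewriting a single dictionary entry, after which every cell holding that code reads the new value. The sketch below illustrates this idea only; ReplaceableColumn is hypothetical and is not how ARX implements replace. It does not merge entries when the replacement already occurs in the column, so the dictionary can then hold the same string under two codes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: value replacement on a dictionary-encoded column. */
final class ReplaceableColumn {
    private final List<String> dictionary = new ArrayList<>(); // code -> value
    private final int[] cells;                                 // one code per row

    ReplaceableColumn(String... values) {
        Map<String, Integer> codeOf = new HashMap<>();
        cells = new int[values.length];
        for (int i = 0; i < values.length; i++) {
            cells[i] = codeOf.computeIfAbsent(values[i], v -> {
                dictionary.add(v);
                return dictionary.size() - 1;
            });
        }
    }

    /** Decodes the given row. */
    String get(int row) {
        return dictionary.get(cells[row]);
    }

    /** Rewrites one dictionary entry; returns whether the original value was found. */
    boolean replace(String original, String replacement) {
        int code = dictionary.indexOf(original);
        if (code < 0) {
            return false;
        }
        dictionary.set(code, replacement); // all rows with this code now read the replacement
        return true;
    }
}
```

The cost of the replacement depends on the dictionary size rather than the number of rows, since no cell is touched.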
    • save

      public void save(File file) throws IOException
      Writes the data to a CSV file.
      Parameters:
      file - the file
      Throws:
      IOException - Signals that an I/O exception has occurred.
    • save

      public void save(File file, char separator) throws IOException
      Writes the data to a CSV file.
      Parameters:
      file - A file
      separator - The utilized separator character
      Throws:
      IOException - Signals that an I/O exception has occurred.
    • save

      public void save(File file, CSVSyntax config) throws IOException
      Writes the data to a CSV file.
      Parameters:
      file - the file
      config - the CSV syntax configuration
      Throws:
      IOException - Signals that an I/O exception has occurred.
    • save

      public void save(OutputStream out) throws IOException
      Writes the data in CSV format to the given output stream.
      Parameters:
      out - the output stream
      Throws:
      IOException - Signals that an I/O exception has occurred.
    • save

      public void save(OutputStream out, char separator) throws IOException
      Writes the data in CSV format to the given output stream.
      Parameters:
      out - Output stream
      separator - The utilized separator character
      Throws:
      IOException - Signals that an I/O exception has occurred.
    • save

      public void save(OutputStream out, CSVSyntax config) throws IOException
      Writes the data in CSV format to the given output stream.
      Parameters:
      out - the output stream
      config - the CSV syntax configuration
      Throws:
      IOException - Signals that an I/O exception has occurred.
    • save

      public void save(String path) throws IOException
      Writes the data to a CSV file.
      Parameters:
      path - the path
      Throws:
      IOException - Signals that an I/O exception has occurred.
    • save

      public void save(String path, char separator) throws IOException
      Writes the data to a CSV file.
      Parameters:
      path - A path
      separator - The utilized separator character
      Throws:
      IOException - Signals that an I/O exception has occurred.
    • save

      public void save(String path, CSVSyntax config) throws IOException
      Writes the data to a CSV file.
      Parameters:
      path - the path
      config - the CSV syntax configuration
      Throws:
      IOException - Signals that an I/O exception has occurred.
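All save overloads write CSV; they differ in the target (File, OutputStream, or path) and in how the syntax is configured (defaults, a separator character, or a CSVSyntax object). The hypothetical sketch below shows the separator-based case against a java.io.Writer, quoting fields that contain the separator, a double quote, or a line break. These quoting rules follow common RFC 4180 practice and are an assumption here; ARX's own CSV writer and the options of CSVSyntax may behave differently.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.Iterator;

/** Hypothetical sketch of CSV output with a configurable separator. */
final class CsvSketch {
    /** Writes each row as one line, separating fields with the given character. */
    static void write(Iterator<String[]> rows, Writer out, char separator) throws IOException {
        while (rows.hasNext()) {
            String[] row = rows.next();
            for (int i = 0; i < row.length; i++) {
                if (i > 0) {
                    out.write(separator);
                }
                out.write(quote(row[i], separator));
            }
            out.write('\n');
        }
        out.flush();
    }

    /** Wraps a field in double quotes, doubling inner quotes, if it needs escaping. */
    private static String quote(String field, char separator) {
        boolean needsQuotes = field.indexOf(separator) >= 0
                || field.indexOf('"') >= 0
                || field.indexOf('\n') >= 0
                || field.indexOf('\r') >= 0;
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }
}
```

The sketch accepts an Iterator<String[]>, the same element type that iterator() returns, so any row source of that shape can be written. To target a file, the Writer could be obtained from java.nio.file.Files.newBufferedWriter.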
    • shuffledIterator

      public abstract Iterator<String[]> shuffledIterator()
      Returns an iterator over the data in a random order.
      Returns:
      the iterator
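A random-order iterator can be built by shuffling row indices rather than the rows themselves, which leaves the underlying data (and its alignment with linked data) untouched. In the sketch below, the class ShuffledRows and the explicit Random parameter are hypothetical; shuffledIterator() itself takes no arguments, and how ARX chooses its randomness is not specified here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

/** Hypothetical sketch: iterate over rows in random order without reordering them. */
final class ShuffledRows {
    static Iterator<String[]> shuffledIterator(String[][] rows, Random random) {
        // Shuffle a list of row indices (Collections.shuffle uses a Fisher-Yates pass)
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) {
            order.add(i);
        }
        Collections.shuffle(order, random);
        Iterator<Integer> indices = order.iterator();
        return new Iterator<String[]>() {
            @Override
            public boolean hasNext() {
                return indices.hasNext();
            }

            @Override
            public String[] next() {
                return rows[indices.next()]; // throws NoSuchElementException when exhausted
            }
        };
    }
}
```

Passing a seeded Random makes the order reproducible, which can help when testing code that consumes the iterator.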
    • sort

      public void sort(boolean ascending, int... columns)
      Sorts the dataset according to the given columns. Input and output data are sorted in the same way, so that their rows stay aligned.
      Parameters:
      ascending - Sort in ascending (true) or descending (false) order
      columns - An integer array containing column indices
    • sort

      public void sort(int from, int to, boolean ascending, int... columns)
      Sorts the dataset according to the given columns and the given range. Input and output data are sorted in the same way, so that their rows stay aligned.
      Parameters:
      from - The lower bound
      to - The upper bound
      ascending - Sort in ascending (true) or descending (false) order
      columns - An integer array containing column indices
    • sort

      public void sort(cern.colt.Swapper swapper, boolean ascending, int... columns)
      Sorts the dataset according to the given columns. Input and output data are sorted in the same way, so that their rows stay aligned.
      Parameters:
      swapper - A swapper
      ascending - Sort in ascending (true) or descending (false) order
      columns - An integer array containing column indices
    • sort

      public void sort(cern.colt.Swapper swapper, int from, int to, boolean ascending, int... columns)
      Sorts the dataset according to the given columns and the given range. Input and output data are sorted in the same way, so that their rows stay aligned.
      Parameters:
      swapper - A swapper
      from - The lower bound
      to - The upper bound
      ascending - Sort in ascending (true) or descending (false) order
      columns - An integer array containing column indices
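Keeping input and output aligned while sorting comes down to applying every row swap to both datasets, which is the kind of operation a cern.colt.Swapper encapsulates. The standalone sketch below illustrates this with a plain insertion sort and a hypothetical LinkedSort class instead of Colt. It compares cells lexicographically as strings, which is a simplification; a real implementation would more likely compare values according to each column's data type.

```java
/** Hypothetical sketch: sorting two row-aligned tables with a shared swap operation. */
final class LinkedSort {
    /** Exchanges rows i and j in both tables so their row indices stay aligned. */
    static void swap(String[][] input, String[][] output, int i, int j) {
        String[] tmp = input[i];
        input[i] = input[j];
        input[j] = tmp;
        tmp = output[i];
        output[i] = output[j];
        output[j] = tmp;
    }

    /** Insertion sort on one column of the input; every swap is mirrored in the output. */
    static void sort(String[][] input, String[][] output, int column, boolean ascending) {
        for (int i = 1; i < input.length; i++) {
            for (int j = i; j > 0; j--) {
                int cmp = input[j - 1][column].compareTo(input[j][column]);
                boolean inOrder = ascending ? cmp <= 0 : cmp >= 0;
                if (inOrder) {
                    break; // row j is in place relative to its predecessor
                }
                swap(input, output, j - 1, j);
            }
        }
    }
}
```

Because the sort never reads the output table, the output's rows simply follow the permutation chosen for the input, so original and generalized tuples keep sharing a row index.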
    • swap

      public void swap(int row1, int row2)
      Swaps the two given rows.
      Parameters:
      row1 - the index of the first row
      row2 - the index of the second row