Package org.deidentifier.arx
Class DataHandle
java.lang.Object
org.deidentifier.arx.DataHandle
- Direct Known Subclasses:
DataHandleInput, DataHandleOutput, DataHandleSubset
This class provides access to dictionary-encoded data. The data is also
linked to the associated input or output data: if, for example, the input
data is sorted, the output data is sorted accordingly. This ensures that
original tuples and their generalized counterparts always share the same
row index, which matters for many use cases, such as graphical tools that
let users compare the original dataset with generalized versions.
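The row alignment described above can be pictured with a small stand-alone sketch. It does not use the ARX API: `LinkedSortSketch` and `sortLinked` are hypothetical names, and two parallel arrays stand in for an input column and its generalized output column. A single permutation is computed from the input and applied to both arrays, so row i of the output still belongs to row i of the input after sorting.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in, not part of ARX: two parallel arrays play the role
// of an input column and its generalized output column.
class LinkedSortSketch {

    /** Sorts both arrays in place by the values of input, using one shared permutation. */
    static void sortLinked(String[] input, String[] output, boolean ascending) {
        Integer[] order = new Integer[input.length];
        for (int i = 0; i < order.length; i++) order[i] = i;

        // Order the row indices by the input values (plain string comparison).
        Comparator<Integer> byInput = Comparator.comparing(i -> input[i]);
        Arrays.sort(order, ascending ? byInput : byInput.reversed());

        // Apply the same permutation to both arrays.
        String[] in = input.clone();
        String[] out = output.clone();
        for (int i = 0; i < order.length; i++) {
            input[i] = in[order[i]];
            output[i] = out[order[i]];
        }
    }

    public static void main(String[] args) {
        String[] input = {"34", "21", "45"};
        String[] output = {"30-39", "20-29", "40-49"};
        sortLinked(input, output, true);
        System.out.println(Arrays.toString(input) + " " + Arrays.toString(output));
    }
}
```

Sorting indices rather than the values themselves is what keeps the two datasets aligned; any number of linked arrays could be permuted the same way.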
-
Constructor Summary
Constructors
-
Method Summary
Modifier and Type | Method | Description
abstract String | getAttributeName(int col) | Returns the name of the specified column.
int | getColumnIndexOf(String attribute) | Returns the index of the given attribute, or -1 if it is not in the header.
DataType<?> | getDataType(String attribute) | Returns the data type of the given attribute.
 | getDate(int row, int col) | Returns a date/time value from the specified cell.
 | getDefinition() | Returns the data definition.
final String[] | getDistinctValues(int column) | Returns an array containing the distinct values in the given column.
 | getDouble(int row, int col) | Returns a double value from the specified cell.
 | getFloat(int row, int col) | Returns a float value from the specified cell.
abstract int | getGeneralization(String attribute) | Returns the generalization level for the attribute.
 | getInt(int row, int col) | Returns an int value from the specified cell.
 | getLong(int row, int col) | Returns a long value from the specified cell.
List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> | getMatchingDataTypes(int column) | Returns a mapping from data types to the relative number of values that conform to the respective type.
List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> | getMatchingDataTypes(int column, double threshold) | Returns a mapping from data types to the relative number of values that conform to the respective type.
<U> List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> | getMatchingDataTypes(int column, Class<U> clazz) | Returns a mapping from data types to the relative number of values that conform to the respective type for a given wrapped class.
<U> List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> | getMatchingDataTypes(int column, Class<U> clazz, double threshold) | Returns a mapping from data types to the relative number of values that conform to the respective type for a given wrapped class.
<U> List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> | getMatchingDataTypes(int column, Class<U> clazz, Locale locale) | Returns a mapping from data types to the relative number of values that conform to the respective type for a given wrapped class.
<U> List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> | getMatchingDataTypes(int column, Class<U> clazz, Locale locale, double threshold) | Returns a mapping from data types to the relative number of values that conform to the respective type for a given wrapped class.
List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> | getMatchingDataTypes(int column, Locale locale) | Returns a mapping from data types to the relative number of values that conform to the respective type. This method only returns types that match at least 80% of all values in the column.
List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> | getMatchingDataTypes(int column, Locale locale, double threshold) | Returns a mapping from data types to the relative number of values that conform to the respective type.
String[] | getNonConformingValues(int column, DataType<?> type, int max) | Returns a set of values that do not conform to the given data type.
abstract int | getNumColumns() | Returns the number of columns in the dataset.
int | getNumConformingValues(int column, DataType<?> type) | Returns the number of (distinct) values that conform to the given data type.
abstract int | getNumRows() | Returns the number of rows in the dataset.
RiskEstimateBuilder | getRiskEstimator() | Returns a risk estimator, using the US population if required.
RiskEstimateBuilder | getRiskEstimator(ARXPopulationModel model) | Returns a risk estimator.
RiskEstimateBuilder | getRiskEstimator(ARXPopulationModel model, Set<String> qis) | Returns a risk estimator for the given set of quasi-identifiers.
RiskEstimateBuilder | getRiskEstimator(ARXPopulationModel model, Set<String> qis, ARXSolverConfiguration config) | Returns a risk estimator for the given set of quasi-identifiers.
RiskEstimateBuilder | getRiskEstimator(ARXPopulationModel model, ARXSolverConfiguration config) | Returns a risk estimator.
RiskEstimateBuilder | getRiskEstimator(ARXPopulationModel model, RiskModelHistogram classes) | Returns a risk estimator for the given set of equivalence classes.
RiskEstimateBuilder | getRiskEstimator(ARXPopulationModel model, RiskModelHistogram classes, ARXSolverConfiguration config) | Returns a risk estimator for the given set of equivalence classes.
abstract StatisticsBuilder | getStatistics() | Returns an object providing access to basic descriptive statistics about the data represented by this handle.
 | getTransformation() | Returns the transformation.
abstract String | getValue(int row, int col) | Returns the value in the specified cell.
 | getView() | Returns a new data handle that represents a context-specific view on the dataset.
boolean | isOptimized() | Returns whether this handle has been optimized with local recoding.
boolean | isOutlier(int row) | Determines whether a given row is an outlier in the currently associated data transformation.
boolean | isReleased() | Determines whether this handle is orphaned, i.e., should no longer be used.
boolean | isSuppressed(int row) | Determines whether a given row is completely suppressed.
 | iterator() | Returns an iterator over the data.
void | release() | Releases this handle and all associated resources.
 | render() | Renders this object.
boolean | replace(int column, String original, String replacement) | Replaces the original value with the replacement in the given column.
void | save(File file) | Writes the data to a CSV file.
void | save(File file, char separator) | Writes the data to a CSV file.
void | save(File file, CSVSyntax config) | Writes the data to a CSV file.
void | save(OutputStream out) | Writes the data to a CSV file.
void | save(OutputStream out, char separator) | Writes the data to a CSV file.
void | save(OutputStream out, CSVSyntax config) | Writes the data to a CSV file.
void | save(String path) | Writes the data to a CSV file.
void | save(String path, char separator) | Writes the data to a CSV file.
void | save(String path, CSVSyntax config) | Writes the data to a CSV file.
 | shuffledIterator() | Returns an iterator over the data in a random order.
void | sort(boolean ascending, int... columns) | Sorts the dataset according to the given columns.
void | sort(int from, int to, boolean ascending, int... columns) | Sorts the dataset according to the given columns and the given range.
void | sort(cern.colt.Swapper swapper, boolean ascending, int... columns) | Sorts the dataset according to the given columns.
void | sort(cern.colt.Swapper swapper, int from, int to, boolean ascending, int... columns) | Sorts the dataset according to the given columns and the given range.
void | swap(int row1, int row2) | Swaps the two rows.
-
Constructor Details
-
DataHandle
public DataHandle()
-
-
Method Details
-
getAttributeName
Returns the name of the specified column.
- Parameters:
col - The column index
- Returns:
- the attribute name
-
getColumnIndexOf
Returns the index of the given attribute, or -1 if it is not in the header.
- Parameters:
attribute - the attribute
- Returns:
- the column index of the attribute
-
getDataType
Returns the data type of the given attribute.
- Parameters:
attribute - the attribute
- Returns:
- the data type
-
getDate
Returns a date/time value from the specified cell.
- Parameters:
row - The cell's row index
col - The cell's column index
- Returns:
- the date
- Throws:
ParseException - if the value cannot be parsed
-
getDefinition
Returns the data definition.
- Returns:
- the definition
-
getDistinctValues
Returns an array containing the distinct values in the given column.
- Parameters:
column - The column to process
- Returns:
- the distinct values
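As a rough illustration of how dictionary encoding relates to distinct values, the following sketch encodes one column. It is a generic example and makes no claim about ARX's internal data layout; `DictionarySketch` and its members are hypothetical names. Each distinct string is stored once, cells hold integer codes, and the distinct values of the column are simply the dictionary entries.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of dictionary encoding for a single column;
// ARX's internal representation may differ.
class DictionarySketch {
    final List<String> dictionary = new ArrayList<>(); // code -> value
    final int[] codes;                                  // one code per row

    DictionarySketch(String[] column) {
        Map<String, Integer> index = new LinkedHashMap<>(); // value -> code
        codes = new int[column.length];
        for (int row = 0; row < column.length; row++) {
            Integer code = index.get(column[row]);
            if (code == null) {
                // First occurrence: assign the next free code.
                code = dictionary.size();
                index.put(column[row], code);
                dictionary.add(column[row]);
            }
            codes[row] = code;
        }
    }

    /** Decodes the cell in the given row. */
    String getValue(int row) {
        return dictionary.get(codes[row]);
    }

    /** The distinct values are exactly the dictionary entries, in order of first appearance. */
    String[] getDistinctValues() {
        return dictionary.toArray(new String[0]);
    }
}
```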
-
getDouble
Returns a double value from the specified cell.
- Parameters:
row - The cell's row index
col - The cell's column index
- Returns:
- the double
- Throws:
ParseException - if the value cannot be parsed
-
getFloat
Returns a float value from the specified cell.
- Parameters:
row - The cell's row index
col - The cell's column index
- Returns:
- the float
- Throws:
ParseException - if the value cannot be parsed
-
getGeneralization
Returns the generalization level for the attribute.
- Parameters:
attribute - the attribute
- Returns:
- the generalization level
-
getInt
Returns an int value from the specified cell.
- Parameters:
row - The cell's row index
col - The cell's column index
- Returns:
- the int
- Throws:
ParseException - if the value cannot be parsed
-
getLong
Returns a long value from the specified cell.
- Parameters:
row - The cell's row index
col - The cell's column index
- Returns:
- the long
- Throws:
ParseException - if the value cannot be parsed
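The typed getters above report unparseable cells through a ParseException. The sketch below shows one way such a conversion can be written with the standard library; it is an assumption for illustration, not the parsing logic ARX uses, and `TypedCellSketch.parseDouble` is a hypothetical helper. It uses a ParsePosition because `NumberFormat.parse(String)` alone accepts a numeric prefix such as "12" in "12abc".

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper, not part of ARX: converts a string cell to a double.
class TypedCellSketch {

    /** Parses a cell as a double and rejects input that is only partially numeric. */
    static double parseDouble(String cell, Locale locale) throws ParseException {
        NumberFormat format = NumberFormat.getNumberInstance(locale);
        ParsePosition pos = new ParsePosition(0);
        Number number = format.parse(cell, pos);
        // parse() returns null on failure and may stop before the end of the input.
        if (number == null || pos.getIndex() != cell.length()) {
            throw new ParseException("Not a number: " + cell, pos.getIndex());
        }
        return number.doubleValue();
    }
}
```

Callers then handle the checked exception explicitly, for example by treating the cell as a non-conforming value.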
-
getMatchingDataTypes
public List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column)
Returns a mapping from data types to the relative number of values that conform to the respective type. This method uses the default locale. This method only returns types that match at least 80% of all values in the column.
- Parameters:
column - the column
- Returns:
- the matching data types
-
getMatchingDataTypes
public <U> List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Class<U> clazz)
Returns a mapping from data types to the relative number of values that conform to the respective type for a given wrapped class. This method uses the default locale. This method only returns types that match at least 80% of all values in the column.
- Type Parameters:
U - the generic type
- Parameters:
column - the column
clazz - The wrapped class
- Returns:
- the matching data types
-
getMatchingDataTypes
public <U> List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Class<U> clazz, double threshold)
Returns a mapping from data types to the relative number of values that conform to the respective type for a given wrapped class. This method uses the default locale.
- Type Parameters:
U - the generic type
- Parameters:
column - the column
clazz - The wrapped class
threshold - Minimum fraction of values that must match for a data type to be included in the results
- Returns:
- the matching data types
-
getMatchingDataTypes
public <U> List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Class<U> clazz, Locale locale)
Returns a mapping from data types to the relative number of values that conform to the respective type for a given wrapped class. This method only returns types that match at least 80% of all values in the column.
- Type Parameters:
U - the generic type
- Parameters:
column - the column
clazz - The wrapped class
locale - The locale to use
- Returns:
- the matching data types
-
getMatchingDataTypes
public <U> List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Class<U> clazz, Locale locale, double threshold)
Returns a mapping from data types to the relative number of values that conform to the respective type for a given wrapped class.
- Type Parameters:
U - the generic type
- Parameters:
column - the column
clazz - The wrapped class
locale - The locale to use
threshold - Minimum fraction of values that must match for a data type to be included in the results
- Returns:
- the matching data types
-
getMatchingDataTypes
public List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, double threshold)
Returns a mapping from data types to the relative number of values that conform to the respective type. This method uses the default locale.
- Parameters:
column - the column
threshold - Minimum fraction of values that must match for a data type to be included in the results
- Returns:
- the matching data types
-
getMatchingDataTypes
public List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Locale locale)
Returns a mapping from data types to the relative number of values that conform to the respective type. This method only returns types that match at least 80% of all values in the column.
- Parameters:
column - the column
locale - The locale to use
- Returns:
- the matching data types
-
getMatchingDataTypes
public List<org.apache.commons.math3.util.Pair<DataType<?>,Double>> getMatchingDataTypes(int column, Locale locale, double threshold)
Returns a mapping from data types to the relative number of values that conform to the respective type.
- Parameters:
column - the column
locale - The locale to use
threshold - Minimum fraction of values that must match for a data type to be included in the results
- Returns:
- the matching data types
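The threshold semantics of the getMatchingDataTypes overloads can be sketched as follows. This is a simplified stand-in, not the ARX implementation: `MatchingTypesSketch`, its three candidate types, and the use of type names as map keys are assumptions made for illustration. For each candidate type it computes the fraction of values that conform and keeps the type only if that fraction is at least the threshold (0.8 corresponds to the documented 80% default).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical stand-in for type detection; candidate types are identified by name.
class MatchingTypesSketch {

    static boolean isInteger(String v) {
        try { Long.parseLong(v); return true; } catch (NumberFormatException e) { return false; }
    }

    static boolean isDecimal(String v) {
        try { Double.parseDouble(v); return true; } catch (NumberFormatException e) { return false; }
    }

    /** Returns type name -> fraction of conforming values, keeping only fractions >= threshold. */
    static Map<String, Double> matchingTypes(String[] column, double threshold) {
        Map<String, Predicate<String>> candidates = new LinkedHashMap<>();
        candidates.put("integer", MatchingTypesSketch::isInteger);
        candidates.put("decimal", MatchingTypesSketch::isDecimal);
        candidates.put("string", v -> true);

        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Predicate<String>> candidate : candidates.entrySet()) {
            long matches = 0;
            for (String v : column) {
                if (candidate.getValue().test(v)) matches++;
            }
            double fraction = column.length == 0 ? 0d : (double) matches / column.length;
            if (fraction >= threshold) result.put(candidate.getKey(), fraction);
        }
        return result;
    }
}
```

For the column {"1", "2", "3", "1.5", "x"} and a threshold of 0.8, three of five values are integers, so "integer" is dropped, while "decimal" (four of five) and "string" (all values) are kept.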
-
getNonConformingValues
Returns a set of values that do not conform to the given data type.
- Parameters:
column - The column to test
type - The type to test
max - The maximum number of values returned by this method
- Returns:
- the non-conforming values
-
getNumColumns
public abstract int getNumColumns()
Returns the number of columns in the dataset.
- Returns:
- the number of columns
-
getNumConformingValues
Returns the number of (distinct) values that conform to the given data type.
- Parameters:
column - The column to test
type - The type to test
- Returns:
- the number of conforming values
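The two conformance checks, getNonConformingValues and getNumConformingValues, can be pictured with plain collections. In this sketch `ConformanceSketch` is a hypothetical class and a Predicate stands in for the DataType check; collecting distinct values only in the first method is an assumption based on the word "set" in its description.

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in: a Predicate<String> plays the role of a data type check.
class ConformanceSketch {

    /** Returns at most max distinct values that fail the check, in order of appearance. */
    static String[] nonConforming(String[] column, Predicate<String> conforms, int max) {
        Set<String> result = new LinkedHashSet<>();
        for (String v : column) {
            if (result.size() >= max) break; // stop early once the cap is reached
            if (!conforms.test(v)) result.add(v);
        }
        return result.toArray(new String[0]);
    }

    /** Counts the distinct values that pass the check. */
    static int numConforming(String[] column, Predicate<String> conforms) {
        Set<String> seen = new LinkedHashSet<>();
        for (String v : column) {
            if (conforms.test(v)) seen.add(v);
        }
        return seen.size();
    }
}
```

The cap is useful for user interfaces that only need a handful of offending values to show why a column does not match a type.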
-
getNumRows
public abstract int getNumRows()
Returns the number of rows in the dataset.
- Returns:
- the number of rows
-
getRiskEstimator
Returns a risk estimator, using the US population if required.
- Returns:
- a risk estimator
-
getRiskEstimator
Returns a risk estimator.
- Parameters:
model - the population model
- Returns:
- a risk estimator
-
getRiskEstimator
public RiskEstimateBuilder getRiskEstimator(ARXPopulationModel model, ARXSolverConfiguration config)
Returns a risk estimator.
- Parameters:
model - the population model
config - the solver configuration
- Returns:
- a risk estimator
-
getRiskEstimator
Returns a risk estimator for the given set of equivalence classes. Saves resources by reusing existing classes.
- Parameters:
model - the population model
classes - the equivalence classes
- Returns:
- a risk estimator
-
getRiskEstimator
public RiskEstimateBuilder getRiskEstimator(ARXPopulationModel model, RiskModelHistogram classes, ARXSolverConfiguration config)
Returns a risk estimator for the given set of equivalence classes. Saves resources by reusing existing classes.
- Parameters:
model - the population model
classes - the equivalence classes
config - the solver configuration
- Returns:
- a risk estimator
-
getRiskEstimator
Returns a risk estimator for the given set of quasi-identifiers.
- Parameters:
model - the population model
qis - the quasi-identifiers
- Returns:
- a risk estimator
-
getRiskEstimator
public RiskEstimateBuilder getRiskEstimator(ARXPopulationModel model, Set<String> qis, ARXSolverConfiguration config)
Returns a risk estimator for the given set of quasi-identifiers.
- Parameters:
model - the population model
qis - the quasi-identifiers
config - the solver configuration
- Returns:
- a risk estimator
-
getStatistics
Returns an object providing access to basic descriptive statistics about the data represented by this handle.
- Returns:
- the statistics
-
getTransformation
Returns the transformation.
- Returns:
- the transformation
-
getValue
Returns the value in the specified cell.
- Parameters:
row - The cell's row index
col - The cell's column index
- Returns:
- the value
-
getView
Returns a new data handle that represents a context-specific view on the dataset.
- Returns:
- the view
-
isOptimized
public boolean isOptimized()
Returns whether this handle has been optimized with local recoding.
- Returns:
- true, if this handle has been optimized with local recoding
-
isOutlier
public boolean isOutlier(int row)
Determines whether a given row is an outlier in the currently associated data transformation.
- Parameters:
row - the row
- Returns:
- true, if the row is an outlier
-
isReleased
public boolean isReleased()
Determines whether this handle is orphaned, i.e., should no longer be used.
- Returns:
- true, if this handle has been released
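The relationship between release() and isReleased() can be illustrated with a minimal life-cycle sketch. Everything here is hypothetical (`ReleasableHandleSketch`, `createResult`, and the choice of IllegalStateException); the documentation only states that a released handle should no longer be used and that releasing an input handle also releases its results. How ARX actually reacts to calls on a released handle is not specified on this page.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical life-cycle model, not the ARX implementation.
class ReleasableHandleSketch {
    private final List<ReleasableHandleSketch> results = new ArrayList<>();
    private boolean released = false;

    /** Creates a result handle that is tied to this (input) handle. */
    ReleasableHandleSketch createResult() {
        checkReleased();
        ReleasableHandleSketch result = new ReleasableHandleSketch();
        results.add(result);
        return result;
    }

    /** Releases this handle and, transitively, all results derived from it. */
    void release() {
        released = true;
        for (ReleasableHandleSketch result : results) {
            result.release();
        }
        results.clear();
    }

    boolean isReleased() {
        return released;
    }

    /** Example accessor; in this sketch, use after release is rejected (an assumption). */
    int getNumRows() {
        checkReleased();
        return 0;
    }

    private void checkReleased() {
        if (released) throw new IllegalStateException("This handle has been released");
    }
}
```

Code that keeps handles around for a long time can call isReleased() before each access instead of relying on an exception.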
-
isSuppressed
public boolean isSuppressed(int row)
Determines whether a given row is completely suppressed.
- Parameters:
row - the row
- Returns:
- true, if the row is suppressed
-
iterator
Returns an iterator over the data.
- Returns:
- the iterator
-
release
public void release()
Releases this handle and all associated resources. If an input handle is released, all associated results are released as well.
-
render
Renders this object.
- Returns:
-
replace
Replaces the original value with the replacement in the given column. Only supported by handles for input data.
- Parameters:
column - the column
original - the original value
replacement - the replacement value
- Returns:
- Whether the original value was found
-
save
Writes the data to a CSV file.
- Parameters:
file - the file
- Throws:
IOException - Signals that an I/O exception has occurred.
-
save
Writes the data to a CSV file.
- Parameters:
file - A file
separator - The separator character to use
- Throws:
IOException - Signals that an I/O exception has occurred.
-
save
Writes the data to a CSV file.
- Parameters:
file - the file
config - the CSV syntax configuration
- Throws:
IOException - Signals that an I/O exception has occurred.
-
save
Writes the data to a CSV file.
- Parameters:
out - the output stream
- Throws:
IOException - Signals that an I/O exception has occurred.
-
save
Writes the data to a CSV file.
- Parameters:
out - Output stream
separator - The separator character to use
- Throws:
IOException - Signals that an I/O exception has occurred.
-
save
Writes the data to a CSV file.
- Parameters:
out - the output stream
config - the CSV syntax configuration
- Throws:
IOException - Signals that an I/O exception has occurred.
-
save
Writes the data to a CSV file.
- Parameters:
path - the path
- Throws:
IOException - Signals that an I/O exception has occurred.
-
save
Writes the data to a CSV file.
- Parameters:
path - A path
separator - The separator character to use
- Throws:
IOException - Signals that an I/O exception has occurred.
-
save
Writes the data to a CSV file.
- Parameters:
path - the path
config - the CSV syntax configuration
- Throws:
IOException - Signals that an I/O exception has occurred.
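To make the effect of the separator parameter concrete, the sketch below writes rows to an OutputStream with a configurable separator. It is a deliberately minimal stand-in and not how ARX writes CSV: `CsvSaveSketch` is a hypothetical class, UTF-8 and "\n" line endings are assumptions, and no quoting or escaping is performed, which a real CSV writer needs for values that contain the separator.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

// Hypothetical minimal CSV writer; values must not contain the separator or line breaks.
class CsvSaveSketch {

    /** Writes all rows to the stream, joining the cells of each row with the separator. */
    static void save(Iterator<String[]> rows, OutputStream out, char separator) throws IOException {
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        while (rows.hasNext()) {
            writer.write(String.join(String.valueOf(separator), rows.next()));
            writer.write('\n');
        }
        // The caller owns the stream, so it is flushed here but not closed.
        writer.flush();
    }
}
```

Passing ';' instead of ',' is a common choice for locales in which the comma serves as the decimal separator.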
-
shuffledIterator
Returns an iterator over the data in a random order.
- Returns:
- the iterator
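One common way to iterate in random order without touching the underlying data is to shuffle a list of row indices. The sketch below takes that approach; it is an assumption about the general technique, not a description of ARX internals, and `ShuffledIteratorSketch` is a hypothetical class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in: rows are held in a plain two-dimensional array.
class ShuffledIteratorSketch {

    /** Returns an iterator that visits every row exactly once, in random order. */
    static Iterator<String[]> shuffledIterator(String[][] rows, Random random) {
        // Shuffle the indices only; the rows themselves keep their positions.
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < rows.length; i++) order.add(i);
        Collections.shuffle(order, random);

        Iterator<Integer> indices = order.iterator();
        return new Iterator<String[]>() {
            @Override public boolean hasNext() { return indices.hasNext(); }
            @Override public String[] next() { return rows[indices.next()]; }
        };
    }
}
```

Because only indices are permuted, row indices of the original dataset remain valid while the iterator is in use.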
-
sort
public void sort(boolean ascending, int... columns)
Sorts the dataset according to the given columns. Will sort input and output analogously.
- Parameters:
ascending - Sort ascending or descending
columns - An integer array containing column indices
-
sort
public void sort(int from, int to, boolean ascending, int... columns)
Sorts the dataset according to the given columns and the given range. Will sort input and output analogously.
- Parameters:
from - The lower bound
to - The upper bound
ascending - Sort ascending or descending
columns - An integer array containing column indices
-
sort
public void sort(cern.colt.Swapper swapper, boolean ascending, int... columns)
Sorts the dataset according to the given columns. Will sort input and output analogously.
- Parameters:
swapper - A swapper
ascending - Sort ascending or descending
columns - An integer array containing column indices
-
sort
public void sort(cern.colt.Swapper swapper, int from, int to, boolean ascending, int... columns)
Sorts the dataset according to the given columns and the given range. Will sort input and output analogously.
- Parameters:
swapper - A swapper
from - The lower bound
to - The upper bound
ascending - Sort ascending or descending
columns - An integer array containing column indices
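Sorting a range of rows by several columns can be sketched with the standard library as shown below. The sketch rests on assumptions that this page does not confirm: the upper bound is treated as exclusive, values are compared as plain strings rather than by data type, and `RangeSortSketch` is a hypothetical class that operates on an array instead of a data handle.

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical stand-in: rows are held in a plain two-dimensional array.
class RangeSortSketch {

    /** Sorts rows[from, to) by the given columns; the upper bound is exclusive here (an assumption). */
    static void sort(String[][] rows, int from, int to, boolean ascending, int... columns) {
        // Compare column by column; later columns only break ties in earlier ones.
        Comparator<String[]> comparator = (a, b) -> {
            for (int col : columns) {
                int cmp = a[col].compareTo(b[col]);
                if (cmp != 0) return cmp;
            }
            return 0;
        };
        Arrays.sort(rows, from, to, ascending ? comparator : comparator.reversed());
    }
}
```

Rows outside the range keep their positions, which allows a single block of rows to be reordered without disturbing the rest of the dataset.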
-
swap
public void swap(int row1, int row2)
Swaps the two rows.
- Parameters:
row1 - the first row
row2 - the second row
-