Class StatisticsBuilder

java.lang.Object
org.deidentifier.arx.aggregates.StatisticsBuilder

public class StatisticsBuilder extends Object
A class offering basic descriptive statistics about data handles.
  • Constructor Details

    • StatisticsBuilder

      public StatisticsBuilder(DataHandleInternal handle)
      Creates a new instance.
      Parameters:
      handle - The data handle to compute statistics for
  • Method Details

    • getClassificationPerformance

      public StatisticsClassification getClassificationPerformance(String clazz, ARXClassificationConfiguration<?> config) throws ParseException
      Creates a new set of statistics for the given classification task.
      Parameters:
      clazz - The class attribute
      config - The configuration
      Throws:
      ParseException
    • getClassificationPerformance

      public StatisticsClassification getClassificationPerformance(String[] features, String clazz, ARXClassificationConfiguration<?> config) throws ParseException
      Creates a new set of statistics for the given classification task.
      Parameters:
      features - The feature attributes
      clazz - The class attribute
      config - The configuration
      Throws:
      ParseException
    • getClassificationPerformance

      public StatisticsClassification getClassificationPerformance(String[] features, String clazz, ARXClassificationConfiguration<?> config, ARXFeatureScaling scaling) throws ParseException
      Creates a new set of statistics for the given classification task.
      Parameters:
      features - The feature attributes
      clazz - The class attribute
      config - The configuration
      scaling - The feature scaling
      Throws:
      ParseException
    • getContingencyTable

      public StatisticsContingencyTable getContingencyTable(int column1, boolean orderFromDefinition1, int column2, boolean orderFromDefinition2)
      Returns a contingency table for the given columns.
      Parameters:
      column1 - The first column
      orderFromDefinition1 - Whether the order of string data items in the first column should be derived from the hierarchy provided in the data definition (if any)
      column2 - The second column
      orderFromDefinition2 - Whether the order of string data items in the second column should be derived from the hierarchy provided in the data definition (if any)
      Returns:
      The contingency table
    • getContingencyTable

      public StatisticsContingencyTable getContingencyTable(int column1, int column2)
      Returns a contingency table for the given columns. The order of string data items is derived from the hierarchies provided in the data definition (if any).
      Parameters:
      column1 - The first column
      column2 - The second column
      Returns:
      The contingency table
    • getContingencyTable

      public StatisticsContingencyTable getContingencyTable(int column1, int size1, boolean orderFromDefinition1, int column2, int size2, boolean orderFromDefinition2)
      Returns a contingency table for the given columns.
      Parameters:
      column1 - The first column
      size1 - The maximal size in this dimension
      orderFromDefinition1 - Whether the order of string data items in the first column should be derived from the hierarchy provided in the data definition (if any)
      column2 - The second column
      size2 - The maximal size in this dimension
      orderFromDefinition2 - Whether the order of string data items in the second column should be derived from the hierarchy provided in the data definition (if any)
      Returns:
      The contingency table
    • getContingencyTable

      public StatisticsContingencyTable getContingencyTable(int column1, int size1, int column2, int size2)
      Returns a contingency table for the given columns. The order of string data items is derived from the hierarchies provided in the data definition (if any).
      Parameters:
      column1 - The first column
      size1 - The maximal size in this dimension
      column2 - The second column
      size2 - The maximal size in this dimension
      Returns:
      The contingency table
    • getContingencyTable

      public StatisticsContingencyTable getContingencyTable(int column1, int size1, String[][] hierarchy1, int column2, int size2, String[][] hierarchy2)
      Returns a contingency table for the given columns. The order of string data items is derived from the provided hierarchies.
      Parameters:
      column1 - The first column
      size1 - The maximal size in this dimension
      hierarchy1 - The hierarchy for the first column, may be null
      column2 - The second column
      size2 - The maximal size in this dimension
      hierarchy2 - The hierarchy for the second column, may be null
      Returns:
      The contingency table
    • getContingencyTable

      public StatisticsContingencyTable getContingencyTable(int column1, String[][] hierarchy1, int column2, String[][] hierarchy2)
      Returns a contingency table for the given columns. The order of string data items is derived from the provided hierarchies.
      Parameters:
      column1 - The first column
      hierarchy1 - The hierarchy for the first column, may be null
      column2 - The second column
      hierarchy2 - The hierarchy for the second column, may be null
      Returns:
      The contingency table
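To make the idea behind the getContingencyTable overloads concrete, the following is a minimal, library-independent sketch of a contingency table: it counts how often each pair of values occurs together in two columns. It does not call ARX and is not ARX's implementation; the class name ContingencySketch, the method contingency, and the sample data are hypothetical and exist only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; not part of ARX.
class ContingencySketch {

    /**
     * Counts how often each (value1, value2) pair occurs in two parallel
     * columns. Keys keep the order in which values are first seen.
     */
    public static Map<String, Map<String, Integer>> contingency(String[] column1, String[] column2) {
        if (column1.length != column2.length) {
            throw new IllegalArgumentException("Columns must have the same number of rows");
        }
        Map<String, Map<String, Integer>> table = new LinkedHashMap<>();
        for (int row = 0; row < column1.length; row++) {
            // One inner map per value of the first column, one counter per value of the second
            table.computeIfAbsent(column1[row], k -> new LinkedHashMap<>())
                 .merge(column2[row], 1, Integer::sum);
        }
        return table;
    }

    public static void main(String[] args) {
        String[] gender = {"male", "female", "male", "female", "male"};
        String[] zip    = {"81667", "81667", "81675", "81667", "81667"};
        // prints {male={81667=2, 81675=1}, female={81667=2}}
        System.out.println(contingency(gender, zip));
    }
}
```

The sketch reports absolute counts in first-seen order. The documented methods additionally let the caller cap each dimension (size1, size2) and control the ordering of values through hierarchies, which the sketch leaves out.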
    • getDistinctValues

      public String[] getDistinctValues(int column)
      Returns the set of distinct data items in the given column.
      Parameters:
      column - The column
      Returns:
      The distinct values
    • getDistinctValuesOrdered

      public String[] getDistinctValuesOrdered(int column)
      Returns an ordered list of the distinct data items in the given column. The order of string data items is derived from the hierarchy provided in the data definition (if any).
      Parameters:
      column - The column
      Returns:
      The distinct values, in order
    • getDistinctValuesOrdered

      public String[] getDistinctValuesOrdered(int column, boolean orderFromDefinition)
      Returns an ordered list of the distinct data items in the given column.
      Parameters:
      column - The column
      orderFromDefinition - Whether the order of string data items should be derived from the hierarchy provided in the data definition (if any)
      Returns:
      The distinct values, in order
    • getDistinctValuesOrdered

      public String[] getDistinctValuesOrdered(int column, String[][] hierarchy)
      Returns an ordered list of the distinct data items in the given column. The order of string data items is derived from the provided hierarchy.
      Parameters:
      column - The column
      hierarchy - The hierarchy, may be null
      Returns:
      The distinct values, in order
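The hierarchy-based ordering described for getDistinctValuesOrdered can be pictured with the sketch below. It is not ARX code. It assumes that a hierarchy is a String[][] whose rows start with an original value (level 0), and that the order of the rows defines the order of the values. The fallback to natural string order when the hierarchy is null or a value is missing from it is also an assumption. The class name OrderedDistinctSketch and the method distinctOrdered are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical illustration class; not part of ARX.
class OrderedDistinctSketch {

    /**
     * Returns the distinct values of a column. If a hierarchy is given, values
     * are ordered by the row in which they appear in its first column
     * (assumed convention); otherwise they are sorted as plain strings.
     */
    public static String[] distinctOrdered(String[] column, String[][] hierarchy) {
        List<String> result = new ArrayList<>(new LinkedHashSet<>(List.of(column)));
        if (hierarchy == null) {
            Collections.sort(result); // assumed fallback: natural string order
        } else {
            // Rank of each level-0 value = index of its first row in the hierarchy
            final Map<String, Integer> rank = new HashMap<>();
            for (int i = 0; i < hierarchy.length; i++) {
                rank.putIfAbsent(hierarchy[i][0], i);
            }
            result.sort((a, b) -> {
                int ra = rank.getOrDefault(a, Integer.MAX_VALUE);
                int rb = rank.getOrDefault(b, Integer.MAX_VALUE);
                // Values not covered by the hierarchy go last, sorted as strings
                return ra != rb ? Integer.compare(ra, rb) : a.compareTo(b);
            });
        }
        return result.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] column = {"high", "low", "mid", "low"};
        String[][] hierarchy = {{"low", "*"}, {"mid", "*"}, {"high", "*"}};
        // prints low, mid, high
        System.out.println(String.join(", ", distinctOrdered(column, hierarchy)));
    }
}
```

Without the hierarchy, the same column would come back as high, low, mid, which is why an explicit order matters for ordinal string attributes.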
    • getEquivalenceClassStatistics

      public StatisticsEquivalenceClasses getEquivalenceClassStatistics()
      Returns statistics about the equivalence classes.
      Returns:
      The equivalence class statistics
    • getFrequencyDistribution

      public StatisticsFrequencyDistribution getFrequencyDistribution(int column)
      Returns a frequency distribution for the values in the given column. The order of string data items is derived from the hierarchy provided in the data definition (if any).
      Parameters:
      column - The column
      Returns:
      The frequency distribution
    • getFrequencyDistribution

      public StatisticsFrequencyDistribution getFrequencyDistribution(int column, boolean orderFromDefinition)
      Returns a frequency distribution for the values in the given column.
      Parameters:
      column - The column
      orderFromDefinition - Whether the order of string data items should be derived from the hierarchy provided in the data definition (if any)
      Returns:
      The frequency distribution
    • getFrequencyDistribution

      public StatisticsFrequencyDistribution getFrequencyDistribution(int column, String[][] hierarchy)
      Returns a frequency distribution for the values in the given column. The order of string data items is derived from the provided hierarchy.
      Parameters:
      column - The column
      hierarchy - The hierarchy, may be null
      Returns:
      The frequency distribution
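As a rough picture of what a frequency distribution over a column contains, the sketch below maps each distinct value to its relative frequency. It is independent of ARX and makes no claim about how StatisticsFrequencyDistribution stores its result. The class name FrequencySketch and the method relativeFrequencies are hypothetical, and values are ordered as plain strings rather than by a hierarchy.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of ARX.
class FrequencySketch {

    /**
     * Maps each distinct value of the column to its share of all rows.
     * Values are ordered as plain strings (no hierarchy involved).
     */
    public static Map<String, Double> relativeFrequencies(String[] column) {
        // First pass: absolute counts, sorted by value
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : column) {
            counts.merge(value, 1, Integer::sum);
        }
        // Second pass: divide each count by the number of rows
        Map<String, Double> frequencies = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            frequencies.put(entry.getKey(), entry.getValue() / (double) column.length);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        // prints {a=0.75, b=0.25}
        System.out.println(relativeFrequencies(new String[] {"a", "b", "a", "a"}));
    }
}
```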
    • getInterruptibleInstance

      public StatisticsBuilderInterruptible getInterruptibleInstance()
      Returns an interruptible instance of this object.
      Returns:
      The interruptible instance
    • getQualityStatistics

      public StatisticsQuality getQualityStatistics()
      Returns data quality according to various models.
      Returns:
      The quality statistics
    • getQualityStatistics

      public StatisticsQuality getQualityStatistics(DataHandle output)
      Returns data quality according to various models. This variant of the method supports arbitrary user-defined outputs.
      Parameters:
      output - The user-defined output data handle
      Returns:
      The quality statistics
    • getQualityStatistics

      public StatisticsQuality getQualityStatistics(DataHandle output, Set<String> qis)
      Returns data quality according to various models. This variant of the method supports arbitrary user-defined outputs.
      Parameters:
      output - The user-defined output data handle
      qis - The quasi-identifying attributes to consider
      Returns:
      The quality statistics
    • getQualityStatistics

      public StatisticsQuality getQualityStatistics(Set<String> qis)
      Returns data quality according to various models.
      Parameters:
      qis - The quasi-identifying attributes to consider
      Returns:
      The quality statistics
    • getSummaryStatistics

      public <T> Map<String,StatisticsSummary<?>> getSummaryStatistics(boolean listwiseDeletion)
      Returns summary statistics for all attributes.
      Parameters:
      listwiseDeletion - A flag enabling list-wise deletion
      Returns:
      A map from attribute name to the summary statistics of that attribute
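List-wise deletion, as the term is generally used in statistics, means that a record with a missing value in any attribute is excluded from the statistics of all attributes. The sketch below shows only this filtering step. It is not ARX code: the class name ListwiseDeletionSketch and the method listwiseDelete are hypothetical, and representing a missing value as a Java null is an assumption made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of ARX.
class ListwiseDeletionSketch {

    /**
     * Keeps only the rows in which every cell is present.
     * A missing value is modelled as null (assumption for this sketch).
     */
    public static List<String[]> listwiseDelete(List<String[]> rows) {
        List<String[]> complete = new ArrayList<>();
        for (String[] row : rows) {
            boolean hasMissing = false;
            for (String cell : row) {
                if (cell == null) {
                    hasMissing = true;
                    break;
                }
            }
            if (!hasMissing) {
                complete.add(row);
            }
        }
        return complete;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"34", "male", "81667"},
                new String[] {null, "female", "81675"},   // dropped: age is missing
                new String[] {"45", "female", "81667"});
        // prints 2
        System.out.println(listwiseDelete(rows).size());
    }
}
```

Summary statistics computed on the filtered rows use the same set of records for every attribute, at the cost of discarding the non-missing cells of incomplete records.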