cocopp.pproc.DataSetList

class documentation

class DataSetList(list):

Constructor: DataSetList(args, check_data_type)

List of instances of DataSet.

This class implements some useful slicing functions.

It also merges the data of DataSet instances that are identical (according to DataSet.__eq__).

Method	`__init__`	Instantiate self from a list of folder- or filenames or `DataSet` instances.
Method	`append`	Redefines the append method to check for uniqueness.
Method	`by`	Returns a dictionary of `DataSetList` instances by `attr_name`.
Method	`det_best_data`	return a list of the `number` smallest evaluations over all data sets in `self` for each `target in target_values`.
Method	`det_best_data_lines`	return a list of the respective best data lines over all data sets in `self` for each `target in target_values` and an array of the computed scores (ERT `if scoring_function == 'ERT'`).
Method	`dictByAlg`	Returns a dictionary of instances of this class by algorithm.
Method	`dictByAlgName`	Returns a dictionary of instances of this class by algorithm.
Method	`dictByDim`	Returns a dictionary of instances of this class by dimensions.
Method	`dictByDimFunc`	Returns a dictionary of instances of this class by dimensions and for each dimension by function.
Method	`dictByFunc`	Returns a dictionary of instances of this class by functions.
Method	`dictByFuncCons`	Returns a dictionary of instances of this class by objective functions (grouping over constraints).
Method	`dictByFuncGroup`	Returns a dictionary of instances of this class by function groups.
Method	`dictByFuncGroupBiobjective`	Returns a dictionary of instances of this class by function groups for the bi-objective case.
Method	`dictByFuncGroupSingleObjective`	Returns a dictionary of instances of this class by function groups for the single-objective case.
Method	`dictByNoise`	Returns a dictionary splitting noisy and non-noisy entries.
Method	`dictByParam`	Returns a dictionary of DataSetList by parameter values.
Method	`extend`	Extend with elements.
Method	`filter`	discard DataSets for which `condition(dataset)` is not true.
Method	`filtered`	return a `list` of `DataSet` for which `condition(dataset)` holds.
Method	`get_all_data_lines`	return a list of all data lines in `self` for each algorithm and a list of the respective computed ERTs.
Method	`get_reference_values_hash`	Undocumented
Method	`get_sorted_algorithms`	return list of the algorithms from `self`, sorted by minimum loss factor in the ECDF.
Method	`getFuncGroups`	Returns a dictionary of function groups.
Method	`info`	Display some information on screen.
Method	`isBiobjective`	Undocumented
Method	`pickle`	Loop over self to pickle each element; no longer in use.
Method	`processIndexFile`	Reads information on the different runs from an index (.info?) file.
Method	`remove_if`	discard DataSets for which `condition(self[i])` is true.
Method	`run_length_distributions`	return a dictionary with an entry for each algorithm (or, if `flatten_output_dict is True` and there is only one algorithm, just that algorithm's entry), and the left-envelope rld-array.
Method	`sort`	Undocumented
Instance Variable	`current_testbed`	Undocumented

def __init__(self, args=[], check_data_type=True):

Instantiate self from a list of folder- or filenames or DataSet instances.

Exceptions: Warning (unexpected user input), pickle.UnpicklingError

Parameters
args	list of strings, each being an info file name, a folder containing info files, or a pickled data file; or a list of DataSets.
check_data_type	Undocumented

def append(self, o, check_data_type='warn'):

Redefines the append method to check for uniqueness, so that identical DataSet instances are merged instead of duplicated.

def by(self, attr_name):

Returns a dictionary of DataSetList instances by attr_name.

attr_name values are the dictionary keys and the corresponding slices (partial lists) are the values.

This method may in the future replace some of the specific methods; for example, dsl.dictByDim() == dsl.by('dim').
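The grouping that by describes can be sketched in plain Python as follows. group_by and ToyDataSet are hypothetical stand-ins, not cocopp code; the real method returns DataSetList slices rather than plain lists.

```python
from collections import namedtuple

# Hypothetical stand-in for cocopp's DataSet, with only three attributes.
ToyDataSet = namedtuple("ToyDataSet", ["algId", "dim", "funcId"])


def group_by(datasets, attr_name):
    """Group datasets by the value of the attribute attr_name.

    Keys are the attribute values; each value is the list of matching
    elements in their original order (a "slice" of the input).
    """
    groups = {}
    for ds in datasets:
        groups.setdefault(getattr(ds, attr_name), []).append(ds)
    return groups


datasets = [
    ToyDataSet("algA", 2, 1),
    ToyDataSet("algA", 5, 1),
    ToyDataSet("algB", 2, 3),
]
by_dim = group_by(datasets, "dim")
print(sorted(by_dim))  # [2, 5]
```

Because the attribute name is a parameter, one generic function covers what dictByDim, dictByFunc and similar methods do for a fixed attribute.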

def det_best_data(self, target_values, fct, dim, number=15):

return a list of the `number` smallest evaluations over all data sets in self for each target in target_values.

Details: currently, the minimal observed evaluation is computed per instance, and the minima of the `number` "easiest" instances are returned. That is, if `number` equals the number of instances, the best evaluation of each instance is returned. The `number` smallest evaluations regardless of instance are also computed, but not returned.
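The instance-wise selection can be sketched with NumPy as below, for a single target. best_evals_instancewise is a hypothetical helper, not cocopp's implementation; it assumes one evaluation count per trial, with NaN marking an unsuccessful trial.

```python
import numpy as np


def best_evals_instancewise(evals, instances, number):
    """Return the `number` smallest per-instance minima (hypothetical helper).

    evals     -- evaluations to reach one target, one entry per trial,
                 NaN for an unsuccessful trial
    instances -- instance id of each trial, aligned with evals
    """
    evals = np.asarray(evals, dtype=float)
    instances = np.asarray(instances)
    minima = []
    for inst in np.unique(instances):
        vals = evals[instances == inst]
        solved = vals[~np.isnan(vals)]
        # minimal observed evaluation of this instance, NaN if never solved
        minima.append(solved.min() if solved.size else np.nan)
    # np.sort puts NaN last, so unsolved instances count as the hardest
    return np.sort(np.array(minima))[:number]


evals = [120, 80, np.nan, 300, 50, 75]
instances = [1, 1, 2, 2, 3, 3]
best = best_evals_instancewise(evals, instances, number=2)
print(best)  # [50. 80.]
```

Here instances 3 and 1 are the two "easiest" (minima 50 and 80); instance 2 (minimum 300) is dropped.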

def det_best_data_lines(self, target_values, fct, dim, scoring_function=None):

return a list of the respective best data lines over all data sets in self for each target in target_values and an array of the computed scores (ERT if scoring_function == 'ERT').

A data line is the set of evaluations from all (usually 15) runs for a given target value. The score determines which data line is "best".

If scoring_function is None, the best is determined with method detERT. Using scoring_function=lambda x: toolsstats.prctile(x, [5], ignore_nan=False) is another useful alternative.
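A minimal sketch of scoring data lines and picking the best one. It assumes the usual ERT definition (successful trials count their evaluations to reach the target, unsuccessful ones their full budget, divided by the number of successes) and that each trial's total budget is known. ert and best_data_line are hypothetical helpers; np.nanpercentile, which drops NaN entries, stands in for toolsstats.prctile, so its NaN handling differs from ignore_nan=False.

```python
import numpy as np


def ert(evals, maxevals):
    """Expected running time of one data line (sketch).

    evals    -- evaluations to reach the target per trial, NaN if not reached
    maxevals -- total evaluations spent by each trial
    """
    evals = np.asarray(evals, dtype=float)
    maxevals = np.asarray(maxevals, dtype=float)
    success = ~np.isnan(evals)
    if not success.any():
        return np.inf  # target never reached
    # successful trials count their hitting time, unsuccessful ones their budget
    total = evals[success].sum() + maxevals[~success].sum()
    return total / success.sum()


def best_data_line(lines, budgets, scoring_function=None):
    """Return (index of the best line, array of scores); lower is better."""
    if scoring_function is None:
        scores = np.array([ert(line, budget) for line, budget in zip(lines, budgets)])
    else:
        scores = np.array([scoring_function(line) for line in lines])
    return int(np.argmin(scores)), scores


lines = [[100, 200, np.nan], [150, 150, 150]]  # one data line per algorithm
budgets = [[1000, 1000, 1000], [1000, 1000, 1000]]
idx, scores = best_data_line(lines, budgets)
print(idx, scores)  # 1 [650. 150.]
# a 5th-percentile score instead of ERT
idx5, scores5 = best_data_line(lines, budgets, lambda x: np.nanpercentile(x, 5))
```

The two scores can disagree: the first line has the better 5th percentile (105 versus 150) but the worse ERT, because its unsuccessful trial adds a full budget.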

TODO: do we want to append equal-instance lines for detEvals?

def dictByAlg(self):

Returns a dictionary of instances of this class by algorithm.

The resulting dict uses algId and comment as keys and the corresponding slices as values.

def dictByAlgName(self):

Returns a dictionary of instances of this class by algorithm.

Compared to dictByAlg, this method uses only the data folder as key and the corresponding slices as values.

def dictByDim(self):

Returns a dictionary of instances of this class by dimensions.

Returns a dictionary with dimension as keys and the corresponding slices as values.

def dictByDimFunc(self):

Returns a dictionary of instances of this class by dimensions and for each dimension by function.

Returns a nested dictionary with dimensions as keys and, as values, dictionaries with function ids as keys and the corresponding slices as values.

dsl_40_f2 = dsl.dictByDimFunc()[40][2]  # DataSetList for dimension 40 on f2
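The two-level grouping can be sketched in plain Python as below. dict_by_dim_func and ToyDataSet are hypothetical stand-ins that return plain lists instead of DataSetList slices.

```python
from collections import namedtuple

# Hypothetical stand-in for DataSet with only the two attributes used here.
ToyDataSet = namedtuple("ToyDataSet", ["dim", "funcId"])


def dict_by_dim_func(datasets):
    """Group first by dimension, then by function id: {dim: {funcId: [...]}}."""
    nested = {}
    for ds in datasets:
        nested.setdefault(ds.dim, {}).setdefault(ds.funcId, []).append(ds)
    return nested


datasets = [ToyDataSet(40, 2), ToyDataSet(40, 3), ToyDataSet(20, 2)]
nested = dict_by_dim_func(datasets)
selection = nested[40][2]  # all data sets of dimension 40 on f2
print(selection)  # [ToyDataSet(dim=40, funcId=2)]
```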

def dictByFunc(self):

Returns a dictionary of instances of this class by functions.

Returns a dictionary with the function id as keys and the corresponding slices as values.

def dictByFuncCons(self):

Returns a dictionary of instances of this class by objective functions (grouping over constraints).

Use only with the constrained testbed.

Returns a dictionary with the function string identifiers as keys and the corresponding slices as values.

def dictByFuncGroup(self):

Returns a dictionary of instances of this class by function groups.

The output dictionary has function group names as keys and the corresponding slices as values.

def dictByFuncGroupBiobjective(self):

Returns a dictionary of instances of this class by function groups for the bi-objective case.

The output dictionary has function group names as keys and the corresponding slices as values.

def dictByFuncGroupSingleObjective(self):

Returns a dictionary of instances of this class by function groups for the single-objective case.

The output dictionary has function group names as keys and the corresponding slices as values. Current groups are based on the GECCO-BBOB 2009-2013 function testbeds.

def dictByNoise(self):

Returns a dictionary splitting noisy and non-noisy entries.

def dictByParam(self, param):

Returns a dictionary of DataSetList by parameter values.

Returns
a dictionary with values of parameter param as keys and the corresponding slices of DataSetList as values.

def extend(self, o):

Extend with elements.

This method is implemented to prevent problems, since append is overridden. It could be a source of efficiency issues.
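Why extend needs its own implementation when append is overridden can be shown with a toy list subclass. UniqueList is hypothetical: it simply skips duplicates, whereas DataSetList merges identical DataSets. The per-element membership test also illustrates where an efficiency cost can come from.

```python
class UniqueList(list):
    """Toy list whose append skips duplicates (hypothetical stand-in)."""

    def append(self, o):
        # add o only if no equal element is present yet (linear scan)
        if o not in self:
            list.append(self, o)

    def extend(self, o):
        # list.extend would bypass the overridden append, so route each
        # element through append; this makes extend quadratic overall
        for item in o:
            self.append(item)


ul = UniqueList()
ul.extend([1, 2, 2, 3])
print(ul)  # [1, 2, 3]

raw = UniqueList()
list.extend(raw, [1, 1])  # the base-class extend ignores the override
print(raw)  # [1, 1]
```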

def filter(self, condition, verbose=0):

discard DataSets for which condition(dataset) is not true.

The filter method makes changes in place and returns None unless verbose >= 2. It does nothing if condition is falsy (for example None) or if condition is True.

Example: .filter(lambda ds: ds.funcId in [1, 2] + list(range(5, 15))) keeps only the "effectively unimodal" functions of the 'bbob' suite.

Details: if verbose in [2, 3], return a list of the removed data sets.
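An in-place filter with these semantics can be sketched with slice assignment on a list subclass. FilterableList is hypothetical and holds plain function ids instead of DataSets; it follows the "verbose >= 2" rule stated above.

```python
class FilterableList(list):
    """Toy list with an in-place filter (hypothetical stand-in)."""

    def filter(self, condition, verbose=0):
        if not condition or condition is True:
            return None  # nothing to filter
        removed = [x for x in self if not condition(x)]
        self[:] = [x for x in self if condition(x)]  # slice assignment: in place
        return removed if verbose >= 2 else None


func_ids = FilterableList(range(1, 25))  # function ids 1..24
removed = func_ids.filter(lambda f: f in [1, 2] + list(range(5, 15)), verbose=2)
print(func_ids)  # [1, 2, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
```

Slice assignment keeps the same list object, so every other reference to it sees the filtered content.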

def filtered(self, condition, type_='DataSetList'):

return a list of DataSet for which condition(dataset) holds.

For example, condition=lambda ds: ds.funcId in [1, 2] + list(range(5, 15)) returns, for the 'bbob' suite, the data sets on the "effectively unimodal" functions.

The filtered method returns an empty list if condition is falsy, and a list with all elements if condition is True (or condition=lambda ds: True). If type_ != 'DataSetList' (for example, type_ is list), a plain list is returned; this may change in the future.
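The non-mutating counterpart, including its two edge cases, can be sketched as a plain function. filtered here is a hypothetical helper over function ids; it always returns a plain list and leaves its input unchanged.

```python
def filtered(items, condition):
    """Return a new list of the items for which condition(item) holds (sketch).

    A falsy condition (e.g. None) gives an empty list; condition=True keeps
    every item. The input is left unchanged.
    """
    if not condition:
        return []
    if condition is True:
        return list(items)
    return [x for x in items if condition(x)]


func_ids = list(range(1, 25))
unimodal = filtered(func_ids, lambda f: f in [1, 2] + list(range(5, 15)))
print(len(unimodal), len(func_ids))  # 12 24
```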

Example

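Get all run lengths of all trials on f1 in 20-D to reach target 1e-7, using get_all_data_lines:

import numpy as np
import matplotlib.pyplot as plt
data = dsl.get_all_data_lines(1e-7, 1, 20)[0]
flat_data = np.hstack(data)
# np.sort places NaNs last, whereas the builtin sorted does not order them reliably
plt.plot(np.arange(1, 1+len(flat_data)) / len(flat_data),
         np.sort(flat_data))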

def get_reference_values_hash(self):

Undocumented

def get_sorted_algorithms(self, dimension, target_values, fun_list=None, reference_dataset_list=None, smallest_evaluation_to_use=3):

return list of the algorithms from self, sorted by minimum loss factor in the ECDF.

"Best" means being within the smallest loss of the best algorithm at one or more points of the ECDF over the functions in fun_list, that is, having the minimal distance to the left envelope in the semilogx plot.

target_values gives for each function-dimension pair a list of target values.
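The sorting criterion can be sketched with NumPy, assuming each algorithm's rld-array is already sorted and all arrays have the same length. sort_by_min_loss is hypothetical; it ignores smallest_evaluation_to_use, NaN entries and any tie-breaking cocopp may apply, so the actual method may differ in these details.

```python
import numpy as np


def sort_by_min_loss(rlds):
    """Sort algorithm names by their minimal loss factor (sketch).

    rlds -- dict mapping an algorithm name to its sorted rld-array
            (evaluations); all arrays must have the same length.
    """
    names = list(rlds)
    data = np.array([np.asarray(rlds[name], dtype=float) for name in names])
    envelope = data.min(axis=0)  # left envelope: best value at each ECDF point
    loss = data / envelope  # loss factor of each algorithm at each point
    min_loss = loss.min(axis=1)  # closest approach to the envelope
    order = np.argsort(min_loss, kind="stable")
    return [names[i] for i in order], min_loss[order]


rlds = {"C": [30, 40, 80], "A": [10, 20, 40], "B": [15, 18, 60]}
names, losses = sort_by_min_loss(rlds)
print(names)  # ['A', 'B', 'C']
```

A and B each define the envelope at some ECDF point (minimal loss 1), while C never comes closer than a factor of 2.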

TODO: data generation via run_length_distributions and sorting should probably be separated.

def getFuncGroups(self):

Returns a dictionary of function groups.

The output dictionary has function group names as keys and function group descriptions as values.

def info(self, opt=None):

Display some information on screen.

Parameters
opt	string that changes the size of the output; can be 'all' (default) or 'short'.

def isBiobjective(self):

Undocumented

def pickle(self, *args, **kwargs):

Loop over self to pickle each element; no longer in use.

def processIndexFile(self, indexFile, alg_name=None):

Reads information on the different runs from an index (.info?) file.

def remove_if(self, condition):

discard DataSets for which condition(self[i]) is true.

To remove functions f1 and f5, use condition=lambda ds: ds.funcId in [1, 5], which is equivalent to calling .filter(lambda ds: ds.funcId not in [1, 5]).
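The stated equivalence between remove_if and filter with the negated condition can be checked on plain function ids. remove_if and keep_if below are hypothetical helpers that return new lists, whereas the DataSetList methods modify the list in place.

```python
def remove_if(items, condition):
    """Return the items for which condition(item) is false (sketch)."""
    return [x for x in items if not condition(x)]


def keep_if(items, condition):
    """Return the items for which condition(item) is true (sketch)."""
    return [x for x in items if condition(x)]


func_ids = list(range(1, 25))  # function ids of the 24 'bbob' functions
a = remove_if(func_ids, lambda f: f in [1, 5])
b = keep_if(func_ids, lambda f: f not in [1, 5])
print(a == b, len(a))  # True 22
```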

def run_length_distributions(self, dimension, target_values, fun_list=None, reference_data_set_list=None, reference_scoring_function=(lambda x: toolsstats.prctile(x, [5])[0]), data_per_target=15, flatten_output_dict=True, simulated_restarts=False, bootstrap=False):

return a dictionary with an entry for each algorithm (or, if flatten_output_dict is True and there is only one algorithm, just that algorithm's entry), and the left-envelope rld-array.

For each algorithm, the entry contains a sorted rld-array of evaluations to reach the targets on all functions in fun_list (or on all functions in self), the list of solved functions, and the list of processed functions. If the sorted rld-array is normalized by the reference score (after sorting), the last entry is the original rld.

Example:

import numpy as np
import matplotlib.pyplot as plt
import cocopp
dsl = cocopp.load(...)  # a single algorithm
rld = dsl.run_length_distributions(10, [1e-1, 1e-3, 1e-5])
plt.step(rld[0][0], np.linspace(0, 1, len(rld[0][0]),
                                endpoint=True))

TODO: change the interface to always return rld_original and, optionally, the scores to compare with, such that we need to compute rld[0][0] / rld[0][-1] to get the current output?

If reference_data_set_list is not None, evaluations are normalized by the reference data; however, the data are sorted by their unnormalized values.
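"Sort first, normalize afterwards" can be sketched with NumPy as below. sorted_then_normalized is a hypothetical helper; it assumes each evaluation comes with the reference score of its own function and target, aligned by position.

```python
import numpy as np


def sorted_then_normalized(evals, reference_scores):
    """Sort raw evaluations, then divide each by its own reference score.

    evals            -- evaluations of one algorithm (one per function/target/trial)
    reference_scores -- reference score of the same function/target, aligned with evals
    Returns (normalized, original), both in the order of the sorted raw evals.
    """
    evals = np.asarray(evals, dtype=float)
    reference_scores = np.asarray(reference_scores, dtype=float)
    order = np.argsort(evals, kind="stable")  # NaN (target not reached) sorts last
    original = evals[order]
    normalized = original / reference_scores[order]  # not necessarily sorted
    return normalized, original


normalized, original = sorted_then_normalized([400, 100, 250], [200, 50, 500])
print(original)  # [100. 250. 400.]
# normalized follows the raw order: 2.0, 0.5, 2.0
```

The normalized array is generally not monotone, which is why the original rld is kept as the last entry.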

Parameters
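dimension	dimension of the data to use.
target_values	Undocumented
fun_list	functions to use; if None, all functions in self are used.
reference_data_set_list	if not None, evaluations are normalized by the reference data.
reference_scoring_function	Undocumented
data_per_target	Undocumented
flatten_output_dict	if True and there is only one algorithm, return only its dictionary value instead of the dictionary.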
simulated_restarts	use simulated trials instead of "raw" evaluations from calling `DataSet.detEvals`. `simulated_restarts` may be a `bool`, a kwargs `dict` passed like `**simulated_restarts` to the method `DataSet.evals_with_simulated_restarts`, or the number of simulated trials. By default, the first trial is chosen without replacement. That means, if the number of simulated trials equals `nbRuns()`, the result is the same as from `DataSet.detEvals`, bar the ordering of the data. If `bootstrap` is active, the number is set to `nbRuns()` and the first trial is chosen *with* replacement.
bootstrap	`if bootstrap`, the number of evaluations is bootstrapped within the instances/trials or via simulated restarts.
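The simulated-restart idea behind these two parameters can be sketched generically. simulated_restart_evals is hypothetical and is not DataSet.evals_with_simulated_restarts; it assumes that restarts after an unsuccessful trial are drawn with replacement from all recorded trials and that each unsuccessful trial costs its full budget. The cocopp method may differ, for example in how restarts are drawn.

```python
import numpy as np


def simulated_restart_evals(evals, maxevals, n_trials, bootstrap=False, seed=None):
    """Evaluations of simulated restarted trials (generic sketch).

    evals    -- per recorded trial, evaluations to reach the target or NaN
    maxevals -- per recorded trial, total evaluations spent
    """
    rng = np.random.default_rng(seed)
    evals = np.asarray(evals, dtype=float)
    maxevals = np.asarray(maxevals, dtype=float)
    success = ~np.isnan(evals)
    n = len(evals)
    if bootstrap:
        n_trials = n
        firsts = rng.integers(0, n, size=n)  # first trials with replacement
    else:
        if n_trials > n:
            raise ValueError("n_trials exceeds the number of recorded trials")
        firsts = rng.permutation(n)[:n_trials]  # first trials without replacement
    if not success.any():
        return np.full(n_trials, np.nan)  # no trial ever reached the target
    out = np.empty(n_trials)
    for k, i in enumerate(firsts):
        total = 0.0
        while not success[i]:
            total += maxevals[i]  # an unsuccessful trial costs its full budget
            i = rng.integers(0, n)  # restart: draw any recorded trial
        out[k] = total + evals[i]
    return out


print(np.sort(simulated_restart_evals([10, 20, 30], [100, 100, 100], 3, seed=1)))
# [10. 20. 30.]
```

When all recorded trials are successful and n_trials equals their number, the result is a permutation of the raw evaluations, matching the "bar the ordering" remark above.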

def sort(self, key1='dim', key2='funcId'):

Undocumented

current_testbed

Undocumented