List of instances of :py:class:`DataSet`.

This class implements some useful slicing functions.

It also merges the data of DataSet instances that are identical
(according to DataSet.__eq__).
Method __init__: Instantiate self from a list of folder or file names or DataSet instances.
Method processIndexFile: Reads the information on the different runs from an index (.info) file.
Method append: Redefines the append method to check for uniqueness.
Method extend: Extend with elements.
Method pickle: Loop over self to pickle each element.
Method dictByAlg: Returns a dictionary of instances of this class by algorithm.
Method dictByAlgName: Returns a dictionary of instances of this class by algorithm, keyed by data folder name.
Method dictByDim: Returns a dictionary of instances of this class by dimension.
Method dictByFunc: Returns a dictionary of instances of this class by function.
Method dictByDimFunc: Returns a dictionary of instances of this class by dimension and, within each dimension, by function.
Method dictByNoise: Returns a dictionary splitting noisy and non-noisy entries.
Method isBiobjective: Undocumented
Method dictByFuncGroupBiobjective: Returns a dictionary of instances of this class by function group for the bi-objective case.
Method dictByFuncGroupSingleObjective: Returns a dictionary of instances of this class by function group for the single-objective case.
Method dictByFuncGroup: Returns a dictionary of instances of this class by function group.
Method getFuncGroups: Returns a dictionary of function groups.
Method dictByParam: Returns a dictionary of DataSetList by parameter values.
Method info: Display some information on screen.
Method sort: Undocumented
Method run_length_distributions: Returns a dictionary of run length distributions per algorithm and the left envelope rld-array.
Method get_all_data_lines: Returns a list of all data lines in self for each algorithm and a list of the respective computed aRTs.
Method det_best_data: Returns a list of the number smallest evaluations over all data sets in self for each target in target_values.
Method det_best_data_lines: Returns the best data lines over all data sets in self for each target in target_values and an array of the computed scores (ERT if scoring_function == 'ERT').
Method get_sorted_algorithms: Returns a list of the algorithms from self, sorted by minimum loss factor in the ECDF.
Method get_reference_values_hash: Undocumented
def __init__(self, args=[], check_data_type=True):

Instantiate self from a list of folder or file names or DataSet instances.

Exceptions:
    Warning -- unexpected user input
    pickle.UnpicklingError

Parameters:
    args (list): strings being either info file names, folders containing
        info files or pickled data files, or a list of DataSet instances.
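
A minimal usage sketch; the folder path below is hypothetical, and cocopp.load accepts the same kinds of arguments:

import cocopp
dsl = cocopp.load('path/to/exdata-folder')  # hypothetical folder of info files
# or instantiate directly from existing DataSet instances:
dsl2 = cocopp.pproc.DataSetList(list(dsl))
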
def processIndexFile(self, indexFile):
Reads the information on the different runs from an index (.info) file.
def append(self, o, check_data_type=False):
Redefines the append method to check for uniqueness.
def extend(self, o):

Extend with elements.

This method is implemented to prevent inconsistencies, since append was redefined. It could be a source of efficiency issues.

def pickle(self, *args, **kwargs):
Loop over self to pickle each element.
def dictByAlg(self):

Returns a dictionary of instances of this class by algorithm.

The resulting dict uses algId and comment as keys and the corresponding slices as values.
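
For illustration, a minimal sketch of iterating over the slices (dsl is assumed to be a DataSetList as above):

for key, sub_dsl in dsl.dictByAlg().items():
    # key identifies the algorithm (algId and comment),
    # sub_dsl is the corresponding DataSetList slice
    print(key, len(sub_dsl))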

def dictByAlgName(self):

Returns a dictionary of instances of this class by algorithm.

Compared to dictByAlg, this method uses only the data folder as key and the corresponding slices as values.

def dictByDim(self):

Returns a dictionary of instances of this class by dimensions.

Returns a dictionary with dimension as keys and the corresponding slices as values.
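
For example (dimension 20 is only illustrative; the available keys depend on the loaded data):

dsl_20d = dsl.dictByDim()[20]  # all data sets recorded in dimension 20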

def dictByFunc(self):

Returns a dictionary of instances of this class by functions.

Returns a dictionary with the function id as keys and the corresponding slices as values.

def dictByDimFunc(self):

Returns a dictionary of instances of this class by dimensions and for each dimension by function.

Returns a dictionary with dimensions as keys; each value is itself a dictionary with function ids as keys and the corresponding slices as values.

ds = dsl.dictByDimFunc()[40][2]  # data in dimension 40 on F2
def dictByNoise(self):
Returns a dictionary splitting noisy and non-noisy entries.
def isBiobjective(self):
Undocumented
def dictByFuncGroupBiobjective(self):

Returns a dictionary of instances of this class by function groups for the bi-objective case.

The output dictionary has function group names as keys and the corresponding slices as values.

def dictByFuncGroupSingleObjective(self):

Returns a dictionary of instances of this class by function groups for the single-objective case.

The output dictionary has function group names as keys and the corresponding slices as values. Current groups are based on the GECCO-BBOB 2009-2013 function testbeds.

def dictByFuncGroup(self):

Returns a dictionary of instances of this class by function groups.

The output dictionary has function group names as keys and the corresponding slices as values.

def getFuncGroups(self):

Returns a dictionary of function groups.

The output dictionary has function group names as keys and function group descriptions as values.
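
A minimal sketch combining getFuncGroups with dictByFuncGroup (the group names depend on the testbed):

groups = dsl.getFuncGroups()      # {group name: group description}
for name, sub_dsl in dsl.dictByFuncGroup().items():
    print(name, groups.get(name, ''), len(sub_dsl))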

def dictByParam(self, param):
Returns a dictionary of DataSetList by parameter values.
Returns:
    a dictionary with the values of parameter param as keys and the
    corresponding slices of DataSetList as values.
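
For illustration, assuming param names a DataSet attribute such as funcId:

for value, sub_dsl in dsl.dictByParam('funcId').items():
    print(value, len(sub_dsl))  # one DataSetList slice per funcId value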
def info(self, opt=None):
Display some information onscreen.
Parameters:
    opt (string): changes the size of the output, can be 'all' (default) or 'short'
def sort(self, key1='dim', key2='funcId'):
Undocumented
def run_length_distributions(self, dimension, target_values, fun_list=None, reference_data_set_list=None, reference_scoring_function=lambda x: toolsstats.prctile(x, [5])[0], data_per_target=15, flatten_output_dict=True, simulated_restarts=False, bootstrap=False):

Return a dictionary with an entry for each algorithm (or, if flatten_output_dict is True and there is only one algorithm, just that entry's value), together with the left envelope rld-array.

For each algorithm, the entry contains a sorted rld-array of evaluations to reach the targets on all functions in fun_list, or on all functions in self; the list of solved functions; and the list of processed functions. If the sorted rld-array is normalized by the reference score (after sorting), the last entry is the original rld.

Example:

%pylab
dsl = cocopp.load(...)  # a single algorithm
rld = dsl.run_length_distributions(10, [1e-1, 1e-3, 1e-5])
step(rld[0][0], np.linspace(0, 1, len(rld[0][0]),
                            endpoint=True))

TODO: change the interface to always return rld_original and, optionally, the scores to compare with, such that we need to compute rld[0][0] / rld[0][-1] to get the current output?

If reference_data_set_list is not None, evaluations are normalized by the reference data; the data are, however, sorted before normalization.

Parameters:
    simulated_restarts: use simulated trials instead of "raw" evaluations
        from calling DataSet.detEvals. simulated_restarts may be a bool, a
        kwargs dict passed like **simulated_restarts to the method
        DataSet.evals_with_simulated_restarts, or the number of simulated
        trials. By default, the first trial is chosen without replacement.
        That means, if the number of simulated trials equals nbRuns(), the
        result is the same as from DataSet.detEvals, apart from the
        ordering of the data. If bootstrap is active, the number is set to
        nbRuns() and the first trial is chosen with replacement.
    bootstrap: if true, the number of evaluations is bootstrapped within
        the instances/trials or via simulated restarts.
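
A hedged sketch of these documented options (the argument values are illustrative only):

rld = dsl.run_length_distributions(
    10, [1e-1, 1e-3, 1e-5],
    simulated_restarts=True,  # or a number of trials, or a kwargs dict
    bootstrap=True)           # resample within the instances/trials
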
def get_all_data_lines(self, target_value, fct, dim):

Return a list of all data lines in self for each algorithm and a list of the respective computed aRTs.

Example

Get all run lengths of all trials on f1 in 20-D to reach target 1e-7:

data = dsl.get_all_data_lines(1e-7, 1, 20)[0]
flat_data = np.hstack(data)
plot(np.arange(1, 1+len(flat_data)) / len(flat_data),
     sort(flat_data))  # sorted fails on nan
def det_best_data(self, target_values, fct, dim, number=15):

Return a list of the number smallest evaluations over all data sets in self for each target in target_values.

Detail: currently, the minimal observed evaluation is computed instance-wise, and the evaluations of the number "easiest" instances are returned. That is, if number equals the number of instances, the best evaluation for each instance is returned. The smallest number evaluations regardless of instance are also computed, but not returned.
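
For example (function 1 in dimension 10 is illustrative):

best_evals = dsl.det_best_data([1e-1, 1e-3, 1e-5], 1, 10, number=15)
# best_evals[i] holds (up to) 15 smallest evaluation counts for target_values[i]
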

def det_best_data_lines(self, target_values, fct, dim, scoring_function=None):

Return a list of the respective best data lines over all data sets in self for each target in target_values, and an array of the computed scores (ERT if scoring_function == 'ERT').

A data line is the set of evaluations from all (usually 15) runs for a given target value. The score determines which data line is "best".

If scoring_function is None, the best is determined with method detERT. Using scoring_function=lambda x: toolsstats.prctile(x, [5], ignore_nan=False) is another useful alternative.
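
A sketch of the alternative scoring, assuming toolsstats is importable from cocopp and the two documented return values unpack as a pair:

from cocopp import toolsstats
lines, scores = dsl.det_best_data_lines(
    [1e-1, 1e-3], 1, 10,
    scoring_function=lambda x: toolsstats.prctile(x, [5], ignore_nan=False))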

def get_sorted_algorithms(self, dimension, target_values, fun_list=None, reference_dataset_list=None, smallest_evaluation_to_use=3):

Return a list of the algorithms from self, sorted by minimum loss factor in the ECDF.

Best means being within the loss factor of the best algorithm at one or more points of the ECDF over the functions in fun_list, i.e., having minimal distance to the left envelope in the semilogx plot.

target_values gives for each function-dimension pair a list of target values.

TODO: data generation via run_length_distributions and sorting should probably be separated.
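
A hedged call sketch (target_values must provide targets per function-dimension pair, as stated above; the values below are illustrative):

sorted_algos = dsl.get_sorted_algorithms(
    10,                   # dimension
    target_values,        # targets per function-dimension pair
    fun_list=[1, 2, 3])   # restrict the ECDF to these functions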

def get_reference_values_hash(self):
Undocumented