class documentation

Unit element for the COCO post-processing.

An instance of this class is created from one unit element of experimental data. One unit element would correspond to data for a given algorithm (a given algId and a comment line) and a given function and dimension (funcId and dim).

Class attributes:

  • funcId -- function Id (integer)
  • dim -- dimension (integer)
  • indexFiles -- associated index files (list of strings)
  • dataFiles -- associated data files (list of strings)
  • comment -- comment for the setting (string)
  • targetFuncValue -- final target function value (float), might be missing
  • precision -- final ftarget - fopt (float), data with
    target[idat] < precision are optional and not relevant.
  • algId -- algorithm name (string)
  • evals -- data aligned by function values (2D array, list of data rows [f_val, eval_run1, eval_run2,...]); caveat: in a portfolio, data rows can have different lengths
  • funvals -- data aligned by function evaluations (2D array)
  • maxevals -- maximum number of function evaluations (array)
  • maxfgevals -- maximum (i.e. last) weighted sum of evaluations+constraints_evals per instance (array)
  • finalfunvals -- final function values (array)
  • readmaxevals -- maximum number of function evaluations read
    from index file (array)
  • readfinalFminusFtarget -- final function values - ftarget read
    from index file (array)
  • pickleFile -- associated pickle file name (string)
  • target -- == evals[:, 0], target function values attained (array)
  • suite_name -- name of the test suite like "bbob" or "bbob-biobj"
  • ert -- ert for reaching the target values in target (array)
  • instancenumbers -- list of numbers corresponding to the instances of
    the test function considered (list of int)
  • isFinalized -- list of bool indicating whether each run was properly finalized

evals and funvals are arrays of data collected from N data sets.

Both have the same format: zero-th column is the value on which the data of a row is aligned, the N subsequent columns are either the numbers of function evaluations for evals or function values for funvals.
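
For illustration, here is a minimal numpy sketch of this layout; all names and values below are made up and not taken from any COCO data:

    import numpy as np

    # toy layout (made-up values): column 0 is the alignment target f-value,
    # columns 1..N hold, per trial, the evaluation count at which that target
    # was first reached (np.nan if it never was)
    evals = np.array([[1e+1,  120.,   95.],
                      [1e-1,  480.,  510.],
                      [1e-8, 5700., np.nan]])
    target = evals[:, 0]      # corresponds to the target attribute
    per_trial = evals[:, 1:]  # one column per trial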

A short example:

>>> from __future__ import print_function
>>> import sys
>>> import os
>>> import urllib
>>> import tarfile
>>> import cocopp
>>> cocopp.genericsettings.verbose = False # ensure to make doctests work
>>> def setup(infoFile):
...     if not os.path.exists(infoFile):
...         filename = cocopp.archives.bbob.get_one('2009/BIPOP-CMA-ES_hansen')
...         tarfile.open(filename).extractall(cocopp.archives.bbob.local_data_path)
>>> infoFile = os.path.join(cocopp.archives.bbob.local_data_path, 'BIPOP-CMA-ES', 'bbobexp_f2.info')
>>> print('get'); setup(infoFile) # doctest:+ELLIPSIS
get...
>>> dslist = cocopp.load(infoFile)
  Data consistent according to consistency_check() in pproc.DataSet
>>> print(dslist)  # doctest:+ELLIPSIS
[DataSet(BIPOP-CMA-ES on f2 2-D), ..., DataSet(BIPOP-CMA-ES on f2 40-D)]
>>> type(dslist)
<class 'cocopp.pproc.DataSetList'>
>>> len(dslist)
6
>>> ds = dslist[3]  # a single data set of type DataSet
>>> ds
DataSet(BIPOP-CMA-ES on f2 10-D)
>>> for d in dir(ds): print(d)  # doctest:+ELLIPSIS
_DataSet__parseHeader
...
algId
algs
bootstrap_sample_size
budget_effective_estimates
comment
...
dim
ert
evals
evals_appended
evals_are_appended
evals_with_simulated_restarts
finalfunvals
funcId
funvals
...
info
info_str
instance_index_lists
instance_multipliers
instancenumbers
isBiobjective
isFinalized
mMaxEvals
max_eval
maxevals
maxfgevals
median_evals
nbRuns
nbRuns_raw
number_of_constraints
pickle
plot
plot_funvals
precision
readfinalFminusFtarget
readmaxevals
reference_values
splitByTrials
success_ratio
successes_by_instance
suite_name
target
trial_count_by_instance
>>> all(ds.evals[:, 0] == ds.target)  # first column of ds.evals is the "target" f-value
True
>>> # investigate rows 0, 10, 20, ... and result columns 0, 5, 6; column 0 is ftarget
>>> ev = ds.evals[0::10, (0,5,6)]  # doctest:+ELLIPSIS
>>> assert 3.98107170e+07 <= ev[0][0] <= 3.98107171e+07
>>> assert ev[0][1] == 1
>>> assert ev[0][2] == 1
>>> assert 6.07000000e+03 <= ev[-1][-1] <= 6.07000001e+03
>>> # show last row, same columns
>>> ev = ds.evals[-1,(0,5,6)]  # doctest:+ELLIPSIS
>>> assert ev[0] == 1e-8
>>> assert 5.67600000e+03 <= ev[1] <= 5.67600001e+03
>>> ds.info()  # prints similar data more nicely formatted
Algorithm: BIPOP-CMA-ES
Function ID: 2
Dimension DIM = 10
Number of trials: 15
Final target Df: 1e-08
min / max number of evals per trial: 5676 / 6346
   evals/DIM:  best     15%     50%     85%     max |  ERT/DIM  nsucc
  ---Df---|-----------------------------------------|----------------
  1.0e+03 |     102     126     170     205     235 |    164.2  15
  1.0e+01 |     278     306     364     457     480 |    374.5  15
  1.0e-01 |     402     445     497     522     536 |    490.8  15
  1.0e-03 |     480     516     529     554     567 |    532.8  15
  1.0e-05 |     513     546     563     584     593 |    562.5  15
  1.0e-08 |     568     594     611     628     635 |    609.6  15
>>> import numpy as np
>>> idx = list(range(0, 50, 10)) + [-1]
>>> # get ERT (expected running time) for some targets
>>> t = np.array([idx, ds.target[idx], ds.ert[idx]]).T  # doctest:+ELLIPSIS
>>> assert t[0][0] == 0
>>> assert t[0][2] == 1
>>> assert t[-1][-2] == 1e-8
>>> assert 6.09626666e+03 <= t[-1][-1] <= 6.09626667e+03

Note that the load of a data set depends on the set of instances specified in testbedsettings' TestBed class (or its children); None means all instances are read in:

>>> import sys
>>> import os
>>> import urllib
>>> import tarfile
>>> import cocopp
>>> cocopp.genericsettings.verbose = False # ensure to make doctests work
>>> infoFile = os.path.join(cocopp.archives.bbob.local_data_path, 'BIPOP-CMA-ES', 'bbobexp_f2.info')
>>> if not os.path.exists(infoFile):
...     filename = cocopp.archives.bbob.get_one('bbob/2009/BIPOP-CMA-ES_hansen')
...     tarfile.open(filename).extractall(cocopp.archives.bbob.local_data_path)
>>> dslist = cocopp.load(infoFile)
  Data consistent according to consistency_check() in pproc.DataSet
>>> dslist[2].instancenumbers
[1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
>>> dslist[2].evals[-1]  # doctest:+ELLIPSIS
array([...
>>> assert (dslist[2].evals[-1])[0] == 1.0e-8
>>> assert 2.01200000e+03 <= (dslist[2].evals[-1])[-1] <= 2.01200001e+03
>>> # because testbedsettings.GECCOBBOBTestbed.settings['instancesOfInterest'] was None
>>> cocopp.testbedsettings.GECCOBBOBTestbed.settings['instancesOfInterest'] = [1, 3]
>>> cocopp.config.config('bbob') # make sure that settings are used
>>> dslist2 = cocopp.load(infoFile)
  Data consistent according to consistency_check() in pproc.DataSet
>>> dslist2[2].instancenumbers
[1, 1, 1, 3, 3, 3]
>>> dslist2[2].evals[-1]  # doctest:+ELLIPSIS
array([...
>>> assert (dslist2[2].evals[-1])[0] == 1.0e-8
>>> assert 2.20700000e+03 <= (dslist2[2].evals[-1])[-1] <= 2.20700001e+03
>>> # set things back to cause no troubles elsewhere:
>>> cocopp.testbedsettings.GECCOBBOBTestbed.settings['instancesOfInterest'] = None
>>> cocopp.config.config('bbob') # make sure that settings are used
Method __eq__ Compare indexEntry instances.
Method __init__ Instantiate a DataSet.
Method __ne__ Undocumented
Method __repr__ Undocumented
Method bootstrap_sample_size return the smallest size not smaller than sample_size such that size % self.nbRuns() == 0
Method computeERTfromEvals Sets the attributes ert and target from the attribute evals.
Method consistency_check checks consistency of data set according to - number of instances - instances used
Method createDictInstance Returns a dictionary of the instances.
Method createDictInstanceCount Returns a dictionary of the instances and their count.
Method detAverageEvals Determine the average number of f-evals for each target in targets list.
Method detERT Determine the expected running time (ERT) to reach target values. The value is numpy.inf, if the target was never reached.
Method detEvals return len(targets) data rows self.evals[i, 1:].
Method detEvals_by_instance return result of detEvals for each instance individually
Method detSuccesses return the number of successful runs for each target.
Method detSuccessRates return a np.array with the success rate for each target in targets, easiest target first.
Method evals_with_simulated_restarts Return a len(targets) list of samplesize "simulated runtimes"
Method generateRLData Determine the running lengths for reaching the target values.
Method get_data_format Undocumented
Method info print text info to stdout
Method info_str return print info as string
Method instance_index_lists return OrderedDict of index lists for each instance.
Method isBiobjective Undocumented
Method median_evals return median for each row in self.evals, unsuccessful runs count.
Method mMaxEvals Returns the maximum number of function evaluations over all runs (trials), obsolete and replaced by attribute max_eval
Method nbRuns Returns the number of runs depending on genericsettings.balance_instances.
Method pickle Save this instance to a pickle file.
Method plot plot all data from evals attribute and the median.
Method plot_funvals plot data of funvals attribute, versatile
Method splitByTrials Splits the post-processed data arrays by trials.
Method successes_by_instance return OrderedDict with number of successes for each instance
Instance Variable algs Undocumented
Instance Variable comment Undocumented
Instance Variable dataFiles Undocumented
Instance Variable funvals Undocumented
Instance Variable indexFiles Undocumented
Instance Variable instancenumbers Undocumented
Instance Variable isFinalized Undocumented
Instance Variable pickleFile Undocumented
Instance Variable readfinalFminusFtarget Undocumented
Instance Variable readmaxevals maxevals as read from the info files
Instance Variable reference_values Undocumented
Instance Variable success_ratio Undocumented
Property budget_effective_estimates return OrderedDict of sum(maxevals) / max(1, #successes)
Property ert expected runtimes for the targets in target.
Property evals evals contains the central data, number of evaluations.
Property evals_appended Is this abandoned?
Property evals_are_appended return True if self.evals_appended consists of appended trials (same instances are appended)
Property instance_multipliers number of repetitions per instance to balance a skewed instance distribution.
Property instancenumbers_balanced return instancenumbers extended with balancing_instancenumbers
Property max_eval maximum number of function evaluations over all runs (trials),
Property maxevals maxevals per instance data, i.e. the columns of evals[:, 1:].
Property maxfgevals maximum of the weighted f+g sum per instance.
Property nbRuns_raw Undocumented
Property number_of_constraints number of constraints of the function/problem the DataSet is based upon.
Property suite_name Returns a string, with the name of the DataSet's underlying test suite.
Property target target values (np.array) corresponding to ert (which all have finite values)
Property trial_count_by_instance return Counter dict with number of trials (actually) done for each instance
Static Method _largest_finite_index return i such that isfinite(ar[i]) and not isfinite(ar[i+1]),
Method __parseHeader Extract data from a header line in an index entry.
Method _argsort return index array for a sorted order of trials.
Method _balanced_evals_row append evaluations to evals_row to achieve a balanced instance distribution.
Method _complement_data insert a line for each target value, never used (detEvals(targets) does the job on the fly)
Method _cut_data attributes target, evals, and ert are truncated to target values not much smaller than defined in attribute precision (typically 1e-8). Attribute maxevals is recomputed for columns that reach the final target precision...
Method _data_differ return a list of targets for which ds differs from self
Method _detEvals2 Determine the number of evaluations to reach target values.
Method _detMaxEvals computes for each data column of _evals the (maximal) evaluation until final_target was reached, or self.maxevals otherwise.
Method _evals_appended_compute create evals-array with appended instances.
Method _evals_with_simulated_restarts return simulated runtimes for each 1D-array in evals_list.
Method _number_of_better_runs return the number of self.evals(target) that are smaller
Method _old_plot plot data from evals attribute.
Method _update_evals_balanced update attribute _evals_balanced if necessary.
Method _WIP_number_of_better_runs return the number of self.evals([target]) that are better
Class Variable _attributes Undocumented
Instance Variable _ert Undocumented
Instance Variable _ert_nb_of_data Undocumented
Instance Variable _evals _evals are the central data and later accessed via the evals property. Each line _evals[i] has a (target) function value in _evals[i][0] and the function evaluation for which this target was reached the first time in trials 1,...
Instance Variable _evals_appended Undocumented
Instance Variable _evals_balanced Undocumented
Instance Variable _evals_balanced_raw_data_columns Undocumented
Instance Variable _extra_attr Undocumented
Instance Variable _lasttdatfilelines Undocumented
Instance Variable _maxevals Undocumented
Instance Variable _maxevals_appended Undocumented
Instance Variable _target Undocumented
Property _budget_estimates return OrderedDict of sum(maxevals) for each (raw data) instance.
Property _instance_repetitions return the number of runs that repeated a previous instance.
Property _need_balancing return True if gs.balance_instances and self.instance_multipliers are > 1
def __eq__(self, other):

Compare indexEntry instances.

def __init__(self, header, comment, data, indexfile):

Instantiate a DataSet.

The first three input arguments correspond to three consecutive lines of an index file (.info extension).

Parameters
    header (string): header information of the experiment
    comment (string): more information on the experiment
    data (string): information on the runs of the experiment
    indexfile (string): name of the file from which the information comes
def __ne__(self, other):

Undocumented

def __repr__(self):

Undocumented

def bootstrap_sample_size(self, sample_size=genericsettings.simulated_runlength_bootstrap_sample_size):

return the smallest size not smaller than sample_size such that size % self.nbRuns() == 0
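
A minimal standalone sketch of this size computation (hypothetical helper, not the class method itself):

    import math

    def bootstrap_sample_size_sketch(sample_size, nb_runs):
        # smallest multiple of nb_runs that is not smaller than sample_size
        return nb_runs * math.ceil(sample_size / nb_runs)

    # bootstrap_sample_size_sketch(100, 15) -> 105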

def computeERTfromEvals(self):

Sets the attributes ert and target from the attribute evals.
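
For one row of evals (one target), the ERT computation can be sketched as follows; this is a simplified standalone sketch with hypothetical names that ignores instance balancing:

    import numpy as np

    def ert_from_row_sketch(evals_row, maxevals):
        # evals_row: evaluations to reach one target, one entry per trial,
        # nan if missed; unsuccessful trials contribute their maxevals
        success = np.isfinite(evals_row)
        if not success.any():
            return np.inf
        total = evals_row[success].sum() + maxevals[~success].sum()
        return total / success.sum()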

def consistency_check(self):

checks consistency of data set according to

  • number of instances
  • instances used

def createDictInstance(self):

Returns a dictionary of the instances.

The key is the instance Id, the value is a list of index.
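
An equivalent standalone construction from a list of instance numbers might look like this (hypothetical helper, for illustration only):

    from collections import defaultdict

    def dict_instance_sketch(instancenumbers):
        d = defaultdict(list)
        for idx, inst in enumerate(instancenumbers):
            d[inst].append(idx)   # instance id -> list of trial indices
        return dict(d)

    # dict_instance_sketch([1, 1, 2, 5]) -> {1: [0, 1], 2: [2], 5: [3]}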

def createDictInstanceCount(self):

Returns a dictionary of the instances and their count.

The keys are instance id and the values are the number of repetitions of such instance.

def detAverageEvals(self, targets):

Determine the average number of f-evals for each target in targets list.

The average is weighted correcting for imbalanced trial instances.

If a target is not reached within trial itrial, self.maxevals[itrial] contributes to the average.

This equals sum(evals(target)) / nbruns. If ERT is finite, it also equals ERT * psucc == (sum(evals) / ntrials / psucc) * psucc, where ERT, psucc, and evals are a function of target.

Details: this should be the same as the precomputed ert property.
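
Ignoring the balancing weights, the stated identity can be checked with a small made-up example:

    import numpy as np

    evals_row = np.array([100., 200., np.nan])  # evals for one target over 3 trials
    maxevals = np.array([250., 250., 250.])
    success = np.isfinite(evals_row)
    total = evals_row[success].sum() + maxevals[~success].sum()  # 550
    average = total / len(evals_row)   # = 550 / 3
    ert = total / success.sum()        # = 275
    psucc = success.mean()             # = 2 / 3
    assert np.isclose(average, ert * psucc)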

def detERT(self, targets):

Determine the expected running time (ERT) to reach target values. The value is numpy.inf, if the target was never reached.

Details: uses attribute self.ert.

Parameters
    targets (list): target function values of interest
Returns
    list of expected running times (# f-evals) for the respective targets
def detEvals(self, targets, copy=True, bootstrap=False, append_instances=False):

return len(targets) data rows self.evals[i, 1:].

If bootstrap, the "data rows" are len(self.evals[i, 1:]) values drawn with replacement from self.evals[i, 1:]. This may be useful to estimate variances (at some point).

Each returned row is taken at the closest recorded target that is not larger than the requested one, i.e. such that self.evals[i, 0] <= target and self.evals[i - 1, 0] > target; in the "limit" cases either the first data line or a row np.array(self.nbRuns() * [np.nan]) is returned.

By default, a copy of the data is returned; this might change in the future.
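
The row-selection rule can be sketched as follows; a simplified standalone sketch with hypothetical names, ignoring copy, bootstrap, and append_instances:

    import numpy as np

    def det_evals_sketch(evals, target, nb_runs):
        # rows of evals are sorted by decreasing target value in column 0
        for row in evals:
            if row[0] <= target:   # closest recorded target not larger than target
                return row[1:].copy()
        return np.array(nb_runs * [np.nan])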

def detEvals_by_instance(self, targets, raw_values=True, **kwargs):

return result of detEvals for each instance individually

in an OrderedDict whose keys are all elements of instancenumbers.

raw_values=True means no instance balancing/repetitions.

See detEvals for further keyword arguments.

def detSuccesses(self, targets, raw_values=False):

return the number of successful runs for each target.

Unless bool(raw_values) is True, the number of runs per instance is expanded to the least common multiple of the instance repetition counts if genericsettings.balance_instances; hence the success events are not necessarily independent in this case.

Details: if raw_values is an int, only the first raw_values columns of the data set are used. If raw_values is True, all data without any balancing repetitions are used.

See also detSuccessRates.

def detSuccessRates(self, targets):

return a np.array with the success rate for each target in targets, easiest target first.

If genericsettings.balance_instances, the rate is weighted such that each instance has the same weight independently of how often it was repeated.
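
Without the balancing weights, the rate is simply the fraction of trials that reached each target; a minimal sketch with hypothetical names:

    import numpy as np

    def success_rates_sketch(evals_rows, nb_runs):
        # one rate per target row; finite entries mark successful trials
        return np.array([np.isfinite(row).sum() / nb_runs for row in evals_rows])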

def evals_with_simulated_restarts(self, targets, samplesize=None, randintfirst=toolsstats.randint_derandomized, randintrest=np.random.randint, bootstrap=False, instance=None):

Return a len(targets) list of samplesize "simulated runtimes"

with an interface similar to detEvals.

samplesize is by default the smallest multiple of nbRuns() that is not smaller than 15.

bootstrap is passed to detEvals such that the simulated runs use a bootstrapped subset. This will increase the variance from repeated evals_with_simulated_restarts calls. This may become useful to measure dispersion of runtime distributions.

instance, when given, uses only the data from this instance. The default samplesize may not be appropriate in this case.

np.sort(np.concatenate(return_value)) provides the combined sorted ECDF data over all targets which may be plotted with pyplot.step (missing the last step).

Unsuccessful data are represented as np.nan.

Simulated restarts are used for unsuccessful runs. The usage of detEvals or evals_with_simulated_restarts should be largely interchangeable, while the latter has a "success" rate of either 0 or 1.

Details:

  • For targets where all runs were successful, samplesize=nbRuns() is sufficient (and preferable) if randint is derandomized.
  • A single successful running length is computed by adding uniformly randomly chosen running lengths until the first time a successful one is chosen. In case of no successful run the result is None.
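
The per-run procedure from the last bullet can be sketched as follows (simplified standalone sketch with hypothetical names; the class method additionally derandomizes the first draw and supports bootstrapping):

    import numpy as np

    def simulated_runtime_sketch(evals_row, maxevals, rng=np.random):
        # draw trials uniformly at random until a successful one is hit; every
        # unsuccessful draw adds its maxevals, the final successful draw its evals
        if not np.isfinite(evals_row).any():
            return None   # no successful run available
        total = 0
        while True:
            j = rng.randint(len(evals_row))
            if np.isfinite(evals_row[j]):
                return total + evals_row[j]
            total += maxevals[j]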

TODO: if samplesize >> nbRuns and nsuccesses is large, the data representation becomes somewhat inefficient.

TODO: it may be useful to make the samplesize dependent on the number of successes and supply the multipliers max(samplesizes) / samplesizes.

def generateRLData(self, targets):

Determine the running lengths for reaching the target values.

Parameters
    targets (list): target function values of interest
Returns
    dict of arrays, one array per target. Each array is copied from the evals attribute: its first element is a target function value smaller than or equal to the considered element of targets, and the consecutive elements are the corresponding numbers of function evaluations.
def get_data_format(self):

Undocumented

def info(self, targets=None):

print text info to stdout

def info_str(self, targets=None):

return print info as string

def instance_index_lists(self, raw_values=True):

return OrderedDict of index lists for each instance.

raw_values means no instance balancing, otherwise the indices refer to instancenumbers_balanced whose first indices are the same as in instancenumbers.

The index starts with 0 conforming with instancenumbers, maxevals, detEvals and others. However in the evals array, column 0 contains f-values and the instance indices start with 1.

def isBiobjective(self):

Undocumented

def median_evals(self, target_values=None, append_instances=True):

return median for each row in self.evals, unsuccessful runs count.

If target_values is not None compute the median evaluations to reach the given target values.

Return np.nan if the median run was unsuccessful.

If append_instances and self.evals_are_appended, trials with the same instance number are appended as if the algorithm was restarted. self.evals_are_appended is True if the resulting number of (unique) instances is at least genericsettings.appended_evals_minimal_trials and if testbedsettings.current_testbed.instances_are_uniform.

Details: copies the evals attribute and sets nan to inf in order to get the median with nan values in the sorting.
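
The nan handling from the Details can be sketched for a single row as follows (hypothetical helper):

    import numpy as np

    def median_evals_sketch(evals_row):
        # unsuccessful trials (nan) are treated as infinitely expensive so that
        # they still take part in the sorting; nan is returned if the median
        # run was unsuccessful
        vals = np.where(np.isfinite(evals_row), evals_row, np.inf)
        med = np.median(vals)
        return med if np.isfinite(med) else np.nan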

def mMaxEvals(self):

Returns the maximum number of function evaluations over all runs (trials), obsolete and replaced by attribute max_eval

def nbRuns(self):

Returns the number of runs depending on genericsettings.balance_instances.

def pickle(self, outputdir=None, gzipped=True):

Save this instance to a pickle file.

Saves this instance to a (by default gzipped) pickle file. If not specified by argument outputdir, the location of the pickle is given by the location of the first index file associated to this instance.

This method will overwrite existing files.

def plot(self, plot_function=plt.semilogy, smallest_target=8e-09, median_formats=(('linestyle', '--')), color_map=None, plot_formats=(), **kwargs):

plot all data from evals attribute and the median.

Plotted are Delta f-value vs evaluations. The sort for the color heatmap is based on the final performance.

color_map is a list or generator with self.nbRuns() colors and used as iter(color_map). The maps can be generated with the matplotlib.colors.LinearSegmentedColormap attributes of module matplotlib.cm. Default is brg between 0 and 0.5, like plt.cm.brg(np.linspace(0, 0.5, self.nbRuns())).

**kwargs is updated with plot_formats and passed to plot_function (for convenience).

def plot_funvals(self, **kwargs):

plot data of funvals attribute, versatile

TODO: seems outdated on 19/8/2016 and 05/2019 (would fail as it was using "isfinite" instead of "np.isfinite" and is not called from anywhere)

def splitByTrials(self, whichdata=None):

Splits the post-processed data arrays by trials.

Parameters
    whichdata (string): either 'evals' or 'funvals', determines the output
Returns
    dictionaries of arrays, keyed by instance id, each value being a smaller post-processed data array corresponding to that instance id. If whichdata is 'evals', the arrays contain function evaluations (1st column holds the alignment targets); if whichdata is 'funvals', they contain function values (1st column holds the alignment budgets). Otherwise a tuple of these two outputs is returned, in this order.
def successes_by_instance(self, target=None, raw_values=True):

return OrderedDict with number of successes for each instance

algs =

Undocumented

dataFiles: list =

Undocumented

funvals =

Undocumented

indexFiles =

Undocumented

instancenumbers: list =

Undocumented

isFinalized: list =

Undocumented

pickleFile =

Undocumented

readfinalFminusFtarget: list =

Undocumented

readmaxevals: list =

maxevals as read from the info files

reference_values =

Undocumented

success_ratio =

Undocumented

@property
budget_effective_estimates =

return OrderedDict of sum(maxevals) / max(1, #successes)

for each instance. This is similar to the budget of the within-trial restarted algorithm and also equals the within-instance ERT for the most difficult target self.precision when #successes > 0.
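
A rough standalone sketch of this estimate, assuming per-instance maxevals lists and success counts are given (hypothetical helper and names):

    from collections import OrderedDict

    def budget_effective_sketch(maxevals_by_instance, successes_by_instance):
        # sum of maxevals divided by the number of successes, at least 1
        return OrderedDict(
            (inst, sum(m) / max(1, successes_by_instance.get(inst, 0)))
            for inst, m in maxevals_by_instance.items())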

@property
ert =

expected runtimes for the targets in target.

"Expected runtime" here means the average number of function evaluations to reach or surpass the given target for the first time.

Details: The values are (pre-)computed using computeERTfromEvals. Depending on genericsettings.balance_instances, the average is weighted to make up for unbalanced problem instance occurrences.

@property
evals =

evals contains the central data, number of evaluations.

evals is a 2D numpy.array or a list of 1D numpy.array s. Each row i, evals[i], provides a (target) function value in evals[i][0] and the function evaluations at which this target was reached for the first time in trial j=1,2,... in evals[i][j]. The corresponding maximum number of evaluations for trial j can be accessed via attribute maxevals[j-1]. A practical (and numerically efficient) assignment is current_evals = evals[i][1:], which makes current_evals and maxevals structurally identical (same indexing).

Details: portfolio datasets can have rows with different lengths. Otherwise, the number of columns in evals depends on genericsettings.balance_instances. The instance numbers on which the first len(instancenumbers) trials were conducted are given in the instancenumbers array. Further columns of evals are generated according to instance_multipliers.

@property
evals_appended =

Is this abandoned?

like the evals property-attribute but here instances with the same ID are aggregated (appended).

The aggregation appends trials with the same instance ID in the order of their appearance.

>>> import warnings
>>> import cocopp
>>> _wl, cocopp.genericsettings.warning_level = cocopp.genericsettings.warning_level, 0
>>> print('load data set'); dsl = cocopp.load('b/2009/bay')  # doctest:+ELLIPSIS
load data set...
>>> cocopp.genericsettings.warning_level = _wl
>>> ds = dsl[99]
>>> warnings.filterwarnings('ignore', message='evals_appended is only recently implemented')
>>> ds.evals_are_appended
False
>>> ds.evals is ds.evals_appended
True
>>> cocopp.genericsettings.appended_evals_minimal_trials = 5  # was 6
>>> ds.evals_are_appended
True
>>> ds.evals is ds.evals_appended
False
>>> ds.evals.shape
(14, 16)
>>> ds.evals_appended.shape
(14, 6)
@property
evals_are_appended =

return True if self.evals_appended consists of appended trials (same instances are appended)

@property
instance_multipliers =

number of repetitions per instance to balance a skewed instance distribution.

The purpose is to give the same weight to all instances irrespectively of their repetitions.
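
Assuming the multipliers are the least common multiple of the per-instance trial counts divided by each count, a hypothetical reconstruction looks like this (requires Python >= 3.9 for math.lcm):

    from collections import Counter
    from math import lcm

    def instance_multipliers_sketch(instancenumbers):
        counts = Counter(instancenumbers)
        m = lcm(*counts.values())
        # one multiplier per distinct instance so that count * multiplier == m
        return {inst: m // c for inst, c in counts.items()}

    # instance_multipliers_sketch([1, 1, 1, 2, 3]) -> {1: 1, 2: 3, 3: 3}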

@property
instancenumbers_balanced =

return instancenumbers extended with balancing_instancenumbers

@property
max_eval =

maximum number of function evaluations over all runs (trials),

return max(self.maxevals)

@property
maxevals =

maxevals per instance data, i.e. the columns of evals[:, 1:].

For class instances of bestalg.BestAlgSet or algportfolio.DataSet, maxevals is a dictionary with maxevals as values and the source file or folder as key.

@property
maxfgevals =

maximum of the weighted f+g sum per instance.

These weighted evaluation numbers are consistent with the numbers in the evals class attribute, unless the weights have been changed after setting _evals.

The values are based on the last entry of the .tdat files, hence they reflect the very last evaluation by the algorithm if isFinalized, and they are computed using the current genericsettings.weight_evaluations_constraints.

Yet to be implemented: for class instances of bestalg.BestAlgSet or algportfolio.DataSet, maxevals is a dictionary with maxevals as values and the source file or folder as key.

@property
nbRuns_raw =

Undocumented

@property
number_of_constraints =

number of constraints of the function/problem the DataSet is based upon.

Remark: this is not used so far and needs to be implemented in the class testbedsettings.SuiteClass(self.suite_name).

@property
suite_name =

Returns a string, with the name of the DataSet's underlying test suite.

@property
target =

target values (np.array) corresponding to ert (which all have finite values)

@property
trial_count_by_instance =

return Counter dict with number of trials (actually) done for each instance

@staticmethod
def _largest_finite_index(ar):

return i such that isfinite(ar[i]) and not isfinite(ar[i+1]),

or i == -1 if not isfinite(ar[0]).

Somewhat tested, but not in use.

The computation takes O(log(len(ar))) time and starts to become faster than where(isfinite(ar))[0][-1] only for len(ar) > 100.
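
A standalone binary-search sketch of the described computation, assuming the finite entries form a prefix of ar (hypothetical helper):

    import numpy as np

    def largest_finite_index_sketch(ar):
        # assumes the finite entries form a prefix of ar
        if not np.isfinite(ar[0]):
            return -1
        lo, hi = 0, len(ar) - 1   # invariant: ar[lo] is finite
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if np.isfinite(ar[mid]):
                lo = mid
            else:
                hi = mid - 1
        return lo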

def __parseHeader(self, header):

Extract data from a header line in an index entry.

def _argsort(self, smallest_target_value=-np.inf):

return index array for a sorted order of trials.

Sorted from best to worst, for unsuccessful runs successively larger target values are queried to determine which is better.

Returned indices range from 1 to self.nbRuns() referring to columns in self.evals.

Target values smaller than smallest_target_value are not considered.

Details: if two runs have the exact same evaluation profile, they are sorted identically, however we could account for final f-values which seems only to make sense for smallest_target_value<=final_target_value.

def _balanced_evals_row(self, evals_row, first_index=0, instance_multipliers=None):

append evaluations to evals_row to achieve a balanced instance distribution.

evals_row can be an integer or must be commensurable to self._evals[i][1:]. first_index is the first index to consider as data in evals_row (like in evals_row = self._evals[i], the first index must be 1).

If self.instance_multipliers is None the return value is evals_row or the numpy view self._evals[evals_row, 1:]. Parameter instance_multipliers only serves to avoid performance side effects from repeated property invocation.

def _complement_data(self, step=10**0.2, final_target=1e-08):

insert a line for each target value, never used (detEvals(targets) does the job on the fly)

def _cut_data(self):

attributes target, evals, and ert are truncated to target values not much smaller than defined in attribute precision (typically 1e-8). Attribute maxevals is recomputed for columns that reach the final target precision. Note that in the bi-objective case the attribute precision does not exist.

def _data_differ(self, ds):

return a list of targets for which ds differs from self

in the leading columns of the _evals attribute.

def _detEvals2(self, targets):

Determine the number of evaluations to reach target values.

Parameters
    targets (seq or float): target precisions
Returns
    list of len(targets) values, each being an array of nbRuns FEs values
def _detMaxEvals(self, final_target=None):

computes for each data column of _evals the (maximal) evaluation until final_target was reached, or self.maxevals otherwise.

def _evals_appended_compute(self):

create evals-array with appended instances.

The evals_appended array mimics independent restarts.

Only append if the number of remaining trials is at least genericsettings.appended_evals_minimal_trials. Hence a standard 2009 dataset which has the instances 3 * [1,2,3,4,5] remains unchanged by default.

Only append if bool(testbedsettings.current_testbed.instances_are_uniform) is True.

def _evals_with_simulated_restarts(self, evals_list, samplesize, randintfirst, randintrest, bootstrap):

return simulated runtimes for each 1D-array in evals_list.

See evals_with_simulated_restarts

def _number_of_better_runs(self, target, ref_eval):

return the number of self.evals(target) that are smaller

(i.e. better) than ref_eval, where equality counts 1/2.

target may be a scalar or an iterable of targets.

def _old_plot(self, **kwargs):

plot data from evals attribute.

**kwargs is passed to matplotlib's loglog.

TODO: seems outdated on 19/8/2016 ("np.isfinite" was "isfinite" hence raising an error)

def _update_evals_balanced(self):

update attribute _evals_balanced if necessary.

The first columns of _evals_balanced equal to those of _evals and further columns are added according to instance_multipliers to balance uneven repetitions over different instances.

def _WIP_number_of_better_runs(self, refalg_dataset, target):

return the number of self.evals([target]) that are better

than the min(refalg_dataset.evals([target])), where equality counts 1/2.

TODO: handle the case when evals is nan using f-values

_attributes =

Undocumented

_ert =

Undocumented

_ert_nb_of_data =

Undocumented

_evals =

_evals are the central data and later accessed via the evals property. Each line _evals[i] has a (target) function value in _evals[i][0] and the function evaluation for which this target was reached the first time in trials 1,... in _evals[i][1:].

_evals_appended =

Undocumented

_evals_balanced =

Undocumented

_evals_balanced_raw_data_columns =

Undocumented

_extra_attr: list =

Undocumented

_lasttdatfilelines =

Undocumented

_maxevals =

Undocumented

_maxevals_appended =

Undocumented

_target =

Undocumented

@property
_budget_estimates =

return OrderedDict of sum(maxevals) for each (raw data) instance.

This was implemented but never used.

@property
_instance_repetitions =

return the number of runs that repeated a previous instance.

That is, 0 if all instance number ids are unique, and >= 1 otherwise.
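
In terms of counting, this amounts to (hypothetical one-liner):

    def instance_repetitions_sketch(instancenumbers):
        # number of trials beyond the first one for each distinct instance id
        return len(instancenumbers) - len(set(instancenumbers))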

@property
_need_balancing =

return True if gs.balance_instances and self.instance_multipliers are > 1