class DataSet(object):
Known subclasses: cocopp.algportfolio.DataSet, cocopp.bestalg.BestAlgSet
Constructor: DataSet(header, comment, data, indexfile)
Unit element for the COCO post-processing.
An instance of this class is created from one unit element of
experimental data. One unit element would correspond to data for a
given algorithm (a given algId and a comment line) and a given
function and dimension (funcId and dim).
Class attributes:
- funcId -- function Id (integer)
- dim -- dimension (integer)
- indexFiles -- associated index files (list of strings)
- dataFiles -- associated data files (list of strings)
- comment -- comment for the setting (string)
- targetFuncValue -- final target function value (float), might be missing
- precision -- final ftarget - fopt (float), data with target[idat] < precision are optional and not relevant.
- algId -- algorithm name (string)
- evals -- data aligned by function values (2-D array, list of data rows [f_val, eval_run1, eval_run2,...]); caveat: in a portfolio, data rows can have different lengths
- funvals -- data aligned by function evaluations (2-D array)
- maxevals -- maximum number of function evaluations (array)
- maxfgevals -- maximum (i.e. last) weighted sum of evaluations+constraints_evals per instance (array)
- finalfunvals -- final function values (array)
- readmaxevals -- maximum number of function evaluations read from index file (array)
- readfinalFminusFtarget -- final function values - ftarget read from index file (array)
- pickleFile -- associated pickle file name (string)
- target -- == evals[:, 0], target function values attained (array)
- suite_name -- name of the test suite like "bbob" or "bbob-biobj"
- ert -- ert for reaching the target values in target (array)
- instancenumbers -- list of numbers corresponding to the instances of the test function considered (list of int)
- isFinalized -- list of bool for if runs were properly finalized
evals and funvals are arrays of data collected from N data sets.
Both have the same format: zero-th column is the value on which the
data of a row is aligned, the N subsequent columns are either the
numbers of function evaluations for evals or function values for
funvals.
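The layout described above can be illustrated with a small hand-made array. This is only a sketch with made-up numbers (not data from a real experiment); it assumes, as stated further below in this page, that unsuccessful trials are marked with np.nan.

```python
import numpy as np

# Hypothetical toy data: 3 target rows, 2 trials (columns 1 and 2).
evals = np.array([
    [1e+1, 12.0, 15.0],      # both trials reached the target f-value 10
    [1e-2, 80.0, 95.0],
    [1e-8, 210.0, np.nan],   # trial 2 never reached 1e-8
])

targets = evals[:, 0]    # zero-th column: the value the row is aligned on
runtimes = evals[:, 1:]  # one column per trial: number of f-evaluations
n_success = np.sum(np.isfinite(runtimes), axis=1)  # successes per target
```

A funvals array would have the same shape convention, with evaluation counts in column 0 and function values in the remaining columns.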
A short example:
>>> from __future__ import print_function
>>> import sys
>>> import os
>>> import urllib
>>> import tarfile
>>> import cocopp
>>> cocopp.genericsettings.verbose = False # ensure to make doctests work
>>> def setup(infoFile):
...     if not os.path.exists(infoFile):
...         filename = cocopp.archives.bbob.get_one('2009/BIPOP-CMA-ES_hansen')
...
>>> infoFile = os.path.join(cocopp.archives.bbob.local_data_path, 'BIPOP-CMA-ES', '')
>>> print('get'); setup(infoFile) # doctest:+ELLIPSIS
get...
>>> dslist = cocopp.load(infoFile)
  Data consistent according to consistency_check() in pproc.DataSet
>>> print(dslist)  # doctest:+ELLIPSIS
[DataSet(BIPOP-CMA-ES on f2 2-D), ..., DataSet(BIPOP-CMA-ES on f2 40-D)]
>>> type(dslist)
<class 'cocopp.pproc.DataSetList'>
>>> len(dslist)
6
>>> ds = dslist[3]  # a single data set of type DataSet
>>> ds
DataSet(BIPOP-CMA-ES on f2 10-D)
>>> for d in dir(ds): print(d)  # doctest:+ELLIPSIS
_DataSet__parseHeader
...
algId
algs
bootstrap_sample_size
budget_effective_estimates
comment
...
dim
ert
evals
evals_appended
evals_are_appended
evals_with_simulated_restarts
finalfunvals
funcId
funvals
...
info
info_str
instance_index_lists
instance_multipliers
instancenumbers
isBiobjective
isFinalized
mMaxEvals
max_eval
maxevals
maxfgevals
median_evals
nbRuns
nbRuns_raw
number_of_constraints
pickle
plot
plot_funvals
precision
readfinalFminusFtarget
readmaxevals
reference_values
splitByTrials
success_ratio
successes_by_instance
suite_name
target
trial_count_by_instance
>>> all(ds.evals[:, 0] == ds.target)  # first column of ds.evals is the "target" f-value
True
>>> # investigate row 0,10,20,... and of the result columns 0,5,6, index 0 is ftarget
>>> ev = ds.evals[0::10, (0,5,6)]  # doctest:+ELLIPSIS
>>> assert 3.98107170e+07 <= ev[0][0] <= 3.98107171e+07
>>> assert ev[0][1] == 1
>>> assert ev[0][2] == 1
>>> assert 6.07000000e+03 <= ev[-1][-1] <= 6.07000001e+03
>>> # show last row, same columns
>>> ev = ds.evals[-1,(0,5,6)]  # doctest:+ELLIPSIS
>>> assert ev[0] == 1e-8
>>> assert 5.67600000e+03 <= ev[1] <= 5.67600001e+03
>>> ds.info()  # prints similar data more nicely formatted
Algorithm: BIPOP-CMA-ES
Function ID: 2
Dimension DIM = 10
Number of trials: 15
Final target Df: 1e-08
min / max number of evals per trial: 5676 / 6346
   evals/DIM:  best     15%     50%     85%     max |  ERT/DIM  nsucc
  ---Df---|-----------------------------------------|----------------
  1.0e+03 |     102     126     170     205     235 |    164.2  15
  1.0e+01 |     278     306     364     457     480 |    374.5  15
  1.0e-01 |     402     445     497     522     536 |    490.8  15
  1.0e-03 |     480     516     529     554     567 |    532.8  15
  1.0e-05 |     513     546     563     584     593 |    562.5  15
  1.0e-08 |     568     594     611     628     635 |    609.6  15
>>> import numpy as np
>>> idx = list(range(0, 50, 10)) + [-1]
>>> # get ERT (expected running time) for some targets
>>> t = np.array([idx, ds.target[idx], ds.ert[idx]]).T  # doctest:+ELLIPSIS
>>> assert t[0][0] == 0
>>> assert t[0][2] == 1
>>> assert t[-1][-2] == 1e-8
>>> assert 6.09626666e+03 <= t[-1][-1] <= 6.09626667e+03
Note that the load of a data set depends on the set of instances specified in testbedsettings' TestBed class (or its children) (None means all instances are read in):
>>> import sys
>>> import os
>>> import urllib
>>> import tarfile
>>> import cocopp
>>> cocopp.genericsettings.verbose = False # ensure to make doctests work
>>> infoFile = os.path.join(cocopp.archives.bbob.local_data_path, 'BIPOP-CMA-ES', '')
>>> if not os.path.exists(infoFile):
...     filename = cocopp.archives.bbob.get_one('bbob/2009/BIPOP-CMA-ES_hansen')
...
>>> dslist = cocopp.load(infoFile)
  Data consistent according to consistency_check() in pproc.DataSet
>>> dslist[2].instancenumbers
[1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5]
>>> dslist[2].evals[-1]  # doctest:+ELLIPSIS
array([...
>>> assert (dslist[2].evals[-1])[0] == 1.0e-8
>>> assert 2.01200000e+03 <= (dslist[2].evals[-1])[-1] <= 2.01200001e+03
>>> # because testbedsettings.GECCOBBOBTestbed.settings['instancesOfInterest'] was None
>>> cocopp.testbedsettings.GECCOBBOBTestbed.settings['instancesOfInterest'] = [1, 3]
>>> cocopp.config.config('bbob') # make sure that settings are used
>>> dslist2 = cocopp.load(infoFile)
  Data consistent according to consistency_check() in pproc.DataSet
>>> dslist2[2].instancenumbers
[1, 1, 1, 3, 3, 3]
>>> dslist2[2].evals[-1]  # doctest:+ELLIPSIS
array([...
>>> assert (dslist2[2].evals[-1])[0] == 1.0e-8
>>> assert 2.20700000e+03 <= (dslist2[2].evals[-1])[-1] <= 2.20700001e+03
>>> # set things back to cause no troubles elsewhere:
>>> cocopp.testbedsettings.GECCOBBOBTestbed.settings['instancesOfInterest'] = None
>>> cocopp.config.config('bbob') # make sure that settings are used
Method | __eq__ | Compare indexEntry instances. |
Method | __init__ | Instantiate a DataSet. |
Method | __ne__ | Undocumented |
Method | __repr__ | Undocumented |
Method | bootstrap_sample_size | return minimum size not smaller than sample_size such that modulo self.nbRuns() == 0 |
Method | computeERTfromEvals | Sets the attributes ert and target from the attribute evals. |
Method | consistency_check | checks consistency of data set according to - number of instances - instances used |
Method | create | Returns a dictionary of the instances. |
Method | create | Returns a dictionary of the instances and their count. |
Method | det | Determine the average number of f-evals for each target in targets list. |
Method | det | Determine the expected running time (ERT) to reach target values. The value is numpy.inf, if the target was never reached. |
Method | detEvals | return len(targets) data rows self.evals[i, 1:]. |
Method | det | return result of detEvals for each instance individually |
Method | det | return the number of successful runs for each target. |
Method | detSuccessRates | return a np.array with the success rate for each target in targets, easiest target first. |
Method | evals_with_simulated_restarts | Return a len(targets) list of samplesize "simulated runtimes" |
Method | generate | Determine the running lengths for reaching the target values. |
Method | get | Undocumented |
Method | info | print text info to stdout |
Method | info_str | return print info as string |
Method | instance_index_lists | return OrderedDict of index lists for each instance. |
Method | isBiobjective | Undocumented |
Method | median_evals | return median for each row in self.evals, unsuccessful runs count. |
Method | mMaxEvals | Returns the maximum number of function evaluations over all runs (trials), obsolete and replaced by attribute max_eval |
Method | nbRuns | Returns the number of runs depending on genericsettings.balance_instances. |
Method | pickle | Save this instance to a pickle file. |
Method | plot | plot all data from evals attribute and the median. |
Method | plot_funvals | plot data of funvals attribute, versatile |
Method | splitByTrials | Splits the post-processed data arrays by trials. |
Method | successes_by_instance | return OrderedDict with number of successes for each instance |
Instance Variable | algs | Undocumented |
Instance Variable | comment | Undocumented |
Instance Variable | dataFiles | Undocumented |
Instance Variable | funvals | Undocumented |
Instance Variable | indexFiles | Undocumented |
Instance Variable | instancenumbers | Undocumented |
Instance Variable | isFinalized | Undocumented |
Instance Variable | pickleFile | Undocumented |
Instance Variable | readfinalFminusFtarget | Undocumented |
Instance Variable | readmaxevals | maxevals as read from the info files |
Instance Variable | reference_values | Undocumented |
Instance Variable | success_ratio | Undocumented |
Property | budget_effective_estimates | return OrderedDict of sum(maxevals) / max(1, #successes) |
Property | ert | expected runtimes for the targets in target. |
Property | evals | evals contains the central data, number of evaluations. |
Property | evals_appended | Is this abandoned? |
Property | evals_are_appended | return True if self.evals_appended consist of appended trials (same instances are appended) |
Property | instance_multipliers | number of repetitions per instance to balance a skewed instance distribution. |
Property | instancenumbers_balanced | return instancenumbers extended with balancing_instancenumbers |
Property | max_eval | maximum number of function evaluations over all runs (trials) |
Property | maxevals | maxevals per instance data, i.e. the columns of evals[:, 1:]. |
Property | maxfgevals | maximum of the weighted f+g sum per instance. |
Property | nbRuns_raw | Undocumented |
Property | number_of_constraints | number of constraints of the function/problem the DataSet is based upon. |
Property | suite_name | Returns a string, with the name of the DataSet's underlying test suite. |
Property | target | target values (np.array) corresponding to ert (which all have finite values) |
Property | trial_count_by_instance | return Counter dict with number of trials (actually) done for each instance |
Static Method | _largest | return i such that isfinite(ar[i]) and not isfinite(ar[i+1]), |
Method | __parseHeader | Extract data from a header line in an index entry. |
Method | _argsort | return index array for a sorted order of trials. |
Method | _balanced | append evaluations to evals_row to achieve a balanced instance distribution. |
Method | _complement | insert a line for each target value, never used (detEvals(targets) does the job on the fly) |
Method | _cut | attributes target, evals, and ert are truncated to target values not much smaller than defined in attribute precision (typically 1e-8). Attribute maxevals is recomputed for columns that reach the final target precision... |
Method | _data | return a list of targets for which ds differs from self |
Method | _det | Determine the number of evaluations to reach target values. |
Method | _det | computes for each data column of _evals the (maximal) evaluation until final_target was reached, or self.maxevals otherwise. |
Method | _evals | create evals-array with appended instances. |
Method | _evals | return simulated runtimes for each 1D-array in evals_list. |
Method | _number | return the number of self.evals(target) that are smaller |
Method | _old | plot data from evals attribute. |
Method | _update | update attribute _evals_balanced if necessary. |
Method | _ | return the number of self.evals([target]) that are better |
Class Variable | _attributes | Undocumented |
Instance Variable | _ert | Undocumented |
Instance Variable | _ert | Undocumented |
Instance Variable | _evals | _evals are the central data and later accessed via the evals property. Each line _evals[i] has a (target) function value in _evals[i][0] and the function evaluation for which this target was reached the first time in trials 1,... |
Instance Variable | _evals | Undocumented |
Instance Variable | _evals | Undocumented |
Instance Variable | _evals | Undocumented |
Instance Variable | _extra | Undocumented |
Instance Variable | _lasttdatfilelines | Undocumented |
Instance Variable | _maxevals | Undocumented |
Instance Variable | _maxevals | Undocumented |
Instance Variable | _target | Undocumented |
Property | _budget | return OrderedDict of sum(maxevals) for each (raw data) instance. |
Property | _instance | return the number of runs that repeated a previous instance. |
Property | _need | return True if gs.balance_instances and self.instance_multipliers are >1 |
Instantiate a DataSet.
The first three input arguments correspond to three consecutive lines of an index file (.info extension).
Parameters | |
header (string) | information of the experiment |
comment (string) | more information on the experiment |
data (string) | information on the runs of the experiment |
indexfile (string) | name of the file from which the information comes |
return minimum size not smaller than sample_size such that modulo self.nbRuns() == 0
Returns a dictionary of the instances.
The key is the instance id, the value is a list of indices.
Returns a dictionary of the instances and their count.
The keys are instance ids and the values are the number of repetitions of each instance.
Determine the average number of f-evals for each target in targets list.
The average is weighted correcting for imbalanced trial instances.
If a target is not reached within trial itrial, self.maxevals[itrial] contributes to the average.
Equal to sum(evals(target)) / nbruns. If ERT is finite this equals ERT * psucc == (sum(evals) / ntrials / psucc) * psucc, where ERT, psucc, and evals are functions of target.
Details: this should be the same as the precomputed ert
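The identity between the average number of evaluations and ERT * psucc can be checked on a toy row. The sketch below uses made-up numbers and ignores the instance weighting mentioned above (all trials count equally); the variable names are illustrative only.

```python
import numpy as np

# Hypothetical data for one target: np.nan marks an unsuccessful trial.
evals_row = np.array([120.0, np.nan, 300.0, np.nan])
maxevals = np.array([500.0, 400.0, 600.0, 450.0])

success = np.isfinite(evals_row)
# Unsuccessful trials contribute their maxevals to the sum.
spent = np.where(success, evals_row, maxevals)

ntrials = len(evals_row)
average_evals = spent.sum() / ntrials
psucc = success.sum() / ntrials
ert = spent.sum() / success.sum()  # == (sum(evals) / ntrials) / psucc
```

Here average_evals is 317.5, psucc is 0.5 and ert is 635.0, so ert * psucc reproduces the average.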
Determine the expected running time (ERT) to reach target values. The value is numpy.inf, if the target was never reached.
Details: uses attribute self.ert.
Parameters | |
targets (list) | target function values of interest |
Returns | |
list of expected running times (# f-evals) for the respective targets. |
return len(targets) data rows self.evals[i, 1:].
If bootstrap, the "data rows" are len(self.evals[i, 1:]) values drawn with replacement from self.evals[i, 1:]. This may be useful to estimate variances (at some point).
Rows have the closest but not a larger target such that self.evals[i, 0] <= target and self.evals[i - 1, 0] > target, or in the "limit" cases the first data line or a line np.array(self.nbRuns() * [np.nan]).
Makes by default a copy of the data, however this might change in future.
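The row selection rule above can be sketched as follows. This is a toy re-implementation under the assumption that column 0 of the array is sorted from easy (large) to difficult (small) targets; rows_for_targets is a hypothetical helper name, not part of cocopp.

```python
import numpy as np

def rows_for_targets(evals, targets):
    """Return one data row evals[i, 1:] per target (toy sketch)."""
    n_runs = evals.shape[1] - 1
    rows = []
    for t in targets:
        # rows whose target is not larger than t; the first one is the closest
        idx = np.nonzero(evals[:, 0] <= t)[0]
        if len(idx):
            rows.append(evals[idx[0], 1:].copy())
        else:  # t is more difficult than any recorded target
            rows.append(np.array(n_runs * [np.nan]))
    return rows

# Hypothetical toy data, column 0 holds the targets.
evals = np.array([[1e+1, 12.0, 15.0],
                  [1e-2, 80.0, 95.0],
                  [1e-8, 210.0, np.nan]])
rows = rows_for_targets(evals, [100.0, 1.0, 1e-9])
```

The target 100.0 maps to the first data line, 1.0 maps to the row of target 1e-2, and 1e-9 yields a row of np.nan.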
return result of detEvals for each instance individually in an OrderedDict whose keys are all elements of instancenumbers.
raw_data=True means no instance balancing/repetitions.
See detEvals for further keyword arguments.
return the number of successful runs for each target.
Unless bool(raw_values) is True, the number of runs is for each instance expanded to the least common multiplier if genericsettings.balance_instances, hence the success events are not necessarily independent in this case.
Details: if raw_values is an int, only the first raw_values columns of the data set are used. If raw_values is True, all data without any balancing repetitions are used.
See also detSuccessRates
return a np.array with the success rate for each target in targets, easiest target first.
If genericsettings.balance_instances, the rate is weighted such that each instance has the same weight independently of how often it was repeated.
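The effect of the weighting can be seen on a small made-up example. The sketch below assumes that giving each instance the same weight amounts to averaging the per-instance success rates; the data and variable names are hypothetical.

```python
import numpy as np

# Hypothetical trials for one target: instance 1 was run three times,
# instance 2 only once.
instancenumbers = np.array([1, 1, 1, 2])
success = np.array([True, True, False, False])

raw_rate = success.mean()  # every trial has the same weight
per_instance = [success[instancenumbers == inst].mean()
                for inst in np.unique(instancenumbers)]
balanced_rate = float(np.mean(per_instance))  # every instance has the same weight
```

The raw rate is 0.5, whereas the balanced rate is (2/3 + 0) / 2 = 1/3, because the single failed trial on instance 2 weighs as much as all three trials on instance 1.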
Return a len(targets) list of samplesize "simulated runtimes" with an interface similar to detEvals.
samplesize is by default the smallest multiple of nbRuns() that is not smaller than 15.
bootstrap is passed to detEvals such that the simulated runs use a bootstrapped subset. This will increase the variance from repeated evals_with_simulated_restarts calls. This may become useful to measure dispersion of runtime distributions.
An instance, when given, restricts the data to this instance. The default samplesize may not be appropriate in this case.
np.sort(np.concatenate(return_value)) provides the combined sorted ECDF data over all targets which may be plotted as a step function (missing the last step).
Unsuccessful data are represented as np.nan.
Simulated restarts are used for unsuccessful runs. The usage of detEvals or evals_with_simulated_restarts should be largely interchangeable, while the latter has a "success" rate of either 0 or 1.
- For targets where all runs were successful, samplesize=nbRuns() is sufficient (and preferable) if the random sampling is derandomized.
- A single successful running length is computed by adding
uniformly randomly chosen running lengths until the first time a
successful one is chosen. In case of no successful run the
result is
TODO: if samplesize >> nbRuns and nsuccesses is large, the data representation becomes somewhat inefficient.
TODO: it may be useful to make the samplesize dependent on the number of successes and supply the multipliers max(samplesizes) / samplesizes.
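The simulated restart procedure described above can be sketched in a few lines. This is not the cocopp implementation: simulated_runtime is a hypothetical helper, and returning np.nan when no trial was successful is an assumption (the text above does not state the value).

```python
import numpy as np

def simulated_runtime(evals_row, maxevals, rng):
    """Draw trials uniformly at random until a successful one is drawn."""
    success = np.isfinite(evals_row)
    if not success.any():
        return np.nan  # assumption: value for "no successful run"
    total = 0.0
    while True:
        i = rng.integers(len(evals_row))  # uniformly chosen trial index
        if success[i]:
            return total + evals_row[i]
        total += maxevals[i]  # an unsuccessful trial costs its full budget

rng = np.random.default_rng(1)
evals_row = np.array([100.0, np.nan])  # hypothetical: one success, one failure
maxevals = np.array([100.0, 500.0])
runtime = simulated_runtime(evals_row, maxevals, rng)
```

With these numbers every simulated runtime has the form 100 + k * 500 for some k >= 0, that is, k failed restarts followed by the successful run.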
Determine the running lengths for reaching the target values.
Parameters | |
targets (list) | target function values of interest |
Returns | |
dict of arrays, one array for each target. Each array is copied from attribute evals of DataSetList: the first element is a target function value smaller than or equal to the element of targets considered, and the subsequent elements are the corresponding numbers of function evaluations. |
return OrderedDict of index lists for each instance.
means no instance balancing, otherwise the indices
refer to instancenumbers_balanced whose first indices are the same as in instancenumbers.
The index starts with 0 conforming with instancenumbers, detEvals and others. However in the evals array, column 0 contains f-values and the instance indices start with 1.
return median for each row in self.evals, unsuccessful runs count.
If target_values is not None compute the median evaluations to reach the given target values.
Return np.nan if the median run was unsuccessful.
If append_instances and self.evals_are_appended, append all instances from the same instance numbers as if the algorithm was restarted. self.evals_are_appended is True if the resulting number of (unique) instances is at least genericsettings.appended_evals_minimal_trials and if testbedsettings.current_testbed.instances_are_uniform.
Details: copies the evals attribute and sets nan to inf in order to get the median with nan values in the sorting.
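The nan-to-inf trick from the details above can be sketched with plain numpy. The data are made up, and mapping an infinite median back to np.nan is how this sketch expresses "the median run was unsuccessful".

```python
import numpy as np

# Hypothetical evals array: column 0 holds targets, columns 1.. hold trials.
evals = np.array([[1e+1, 12.0, 15.0, np.nan],
                  [1e-8, 210.0, np.nan, np.nan]])

runtimes = evals[:, 1:].copy()         # copy, the original stays untouched
runtimes[np.isnan(runtimes)] = np.inf  # unsuccessful runs sort last
medians = np.median(runtimes, axis=1)
medians[np.isinf(medians)] = np.nan    # median run was unsuccessful
```

For the first row the median is 15.0 (the unsuccessful trial still counts), for the second row it is np.nan since two of three trials failed.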
Returns the maximum number of function evaluations over all runs (trials),
obsolete and replaced by attribute max_eval
Save this instance to a pickle file.
Saves this instance to a (by default gzipped) pickle file. If not specified by argument outputdir, the location of the pickle is given by the location of the first index file associated to this instance.
This method will overwrite existing files.
plot all data from evals attribute and the median.
Plotted are Delta f-value vs evaluations. The sort for the color heatmap is based on the final performance.
color_map is a list or generator with self.nbRuns() entries and used as iter(color_map). The maps can be generated with the
attributes of module
. Default is brg
between 0 and 0.5, like, 0.5, self.nbRuns())).
is updated with plot_formats
and passed to
(for convenience).
plot data of funvals attribute, versatile
TODO: seems outdated on 19/8/2016 and 05/2019 (would fail as it was using "isfinite" instead of "np.isfinite" and is not called from anywhere)
Splits the post-processed data arrays by trials.
Parameters | |
whichdata (string) | either 'evals' or 'funvals' determines the output |
Returns | |
this method returns dictionaries of arrays, the key of the dictionaries being the instance id, the value being a smaller post-processed data array corresponding to the instance id. If whichdata is 'evals' then the array contains function evaluations (1st column is alignment targets). Else if whichdata is 'funvals' then the output data contains function values (1st column is alignment budgets). Otherwise this method returns a tuple of these two arrays in this order. |
return OrderedDict of sum(maxevals) / max(1, #successes) for each instance. This is similar to the budget of the within-trial restarted algorithm and also equals the within-instance ERT for the most difficult target self.precision when #successes > 0.
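The formula sum(maxevals) / max(1, #successes) per instance can be written out as below. This is a sketch with made-up data, not the property's actual code; final_row stands for the evaluations to reach the most difficult target, with np.nan for failed trials.

```python
from collections import OrderedDict
import numpy as np

# Hypothetical data: two instances with two trials each.
instancenumbers = [1, 1, 2, 2]
maxevals = np.array([500.0, 400.0, 600.0, 450.0])
final_row = np.array([120.0, np.nan, np.nan, np.nan])

estimates = OrderedDict()
for inst in sorted(set(instancenumbers)):
    idx = [i for i, n in enumerate(instancenumbers) if n == inst]
    n_succ = int(np.sum(np.isfinite(final_row[idx])))
    # max(1, ...) keeps the estimate finite when no trial succeeded
    estimates[inst] = maxevals[idx].sum() / max(1, n_succ)
```

Instance 1 has one success and yields 900.0; instance 2 has no success and yields the plain budget sum 1050.0.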
expected runtimes for the targets in target
"Expected runtime" here means the average number of function evaluations to reach or surpass the given target for the first time.
Details: The values are (pre-)computed using computeERTfromEvals.
Depending on genericsettings.balance_instances, the average is weighted to make up for unbalanced problem instance occurrences.
evals contains the central data, number of evaluations.
evals is a 2D numpy.array or a list of 1D numpy.array.
Each row i, evals[i], provides a (target) function value in evals[i][0] and the function evaluations at which this target was reached for the first time in trial j=1,2,... in evals[i][j]. The corresponding maximum number of evaluations for trial j can be accessed via attribute maxevals[j-1]. A practical (and numerically efficient) assignment is current_evals = evals[i][1:] which makes maxevals structurally identical.
Details: portfolio datasets can have rows with different lengths. Otherwise, the number of columns in evals depends on genericsettings.balance_instances. The instance numbers on which the first len(instancenumbers) trials were conducted are given in the instancenumbers array. Further columns of evals are generated according to instance_multipliers.
Is this abandoned?
like the evals property-attribute but here instances with the same ID are aggregated (appended).
The aggregation appends trials with the same instance ID in the order of their appearance.
>>> import warnings
>>> import cocopp
>>> _wl, cocopp.genericsettings.warning_level = cocopp.genericsettings.warning_level, 0
>>> print('load data set'); dsl = cocopp.load('b/2009/bay')  # doctest:+ELLIPSIS
load data set...
>>> cocopp.genericsettings.warning_level = _wl
>>> ds = dsl[99]
>>> warnings.filterwarnings('ignore', message='evals_appended is only recently implemented')
>>> ds.evals_are_appended
False
>>> ds.evals is ds.evals_appended
True
>>> cocopp.genericsettings.appended_evals_minimal_trials = 5  # was 6
>>> ds.evals_are_appended
True
>>> ds.evals is ds.evals_appended
False
>>> ds.evals.shape
(14, 16)
>>> ds.evals_appended.shape
(14, 6)
number of repetitions per instance to balance a skewed instance distribution.
The purpose is to give the same weight to all instances irrespectively of their repetitions.
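One way to obtain such repetition numbers is via the least common multiple of the instance counts. The following is only a sketch of that idea under this assumption; the actual return type and computation of the property may differ.

```python
from collections import Counter
from functools import reduce
from math import gcd

# Hypothetical skewed distribution: instance 1 three times, 2 twice, 3 once.
instancenumbers = [1, 1, 1, 2, 2, 3]

counts = Counter(instancenumbers)
# least common multiple of all counts
lcm = reduce(lambda a, b: a * b // gcd(a, b), counts.values())
# repeating each instance's trials this often gives every instance lcm trials
multipliers = {inst: lcm // count for inst, count in counts.items()}
```

Here the least common multiple is 6, so the trials of instances 1, 2 and 3 are repeated 2, 3 and 6 times respectively and each instance ends up with the same weight.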
maxevals per instance data, i.e. the columns of evals[:, 1:].
For class instances of bestalg.BestAlgSet or algportfolio.DataSet, maxevals is a dictionary with maxevals as values and the source file or folder as key.
maximum of the weighted f+g sum per instance.
These weighted evaluation numbers are consistent with the numbers in the evals class attribute, unless the weights have been changed after setting _evals.
The values are based on the last entry of the .tdat files, hence
they reflect the very last evaluation by the algorithm if
, and they are computed using the current
Yet to be implemented: for class instances of bestalg.BestAlgSet or algportfolio.DataSet, maxevals is a dictionary with maxevals as values and the source file or folder as key.
number of constraints of the function/problem the DataSet is based upon.
Remark: this is never used so far and needs to be implemented in the class testbedsettings.SuiteClass(self.suite_name).
return i such that isfinite(ar[i]) and not isfinite(ar[i+1]), or i == -1 if not isfinite(ar[0]).
Somewhat tested, but not in use.
The computation takes O(log(len(ar))) time and starts to become faster than where(isfinite(ar))[0][-1] only for len(ar) > 100.
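A logarithmic-time search of this kind can be written as a bisection. The sketch below assumes that all finite entries precede all non-finite entries; largest_finite_index is an illustrative name, and returning len(ar) - 1 when every entry is finite is a choice of this sketch.

```python
import numpy as np

def largest_finite_index(ar):
    """Bisection for the last finite entry of ar (toy sketch)."""
    lo, hi = -1, len(ar)  # ar[lo] is finite (or lo == -1), ar[hi] is not (or hi == len)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if np.isfinite(ar[mid]):
            lo = mid
        else:
            hi = mid
    return lo

ar = np.array([3.0, 7.0, 9.0, np.nan, np.inf])
i = largest_finite_index(ar)
```

For this array the result agrees with the linear-time expression np.where(np.isfinite(ar))[0][-1].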
return index array for a sorted order of trials.
Sorted from best to worst, for unsuccessful runs successively larger target values are queried to determine which is better.
Returned indices range from 1 to self.nbRuns() referring to columns in self.evals.
Target values smaller than smallest_target_value are not considered.
Details: if two runs have the exact same evaluation profile, they are sorted identically, however we could account for final f-values which seems only to make sense for smallest_target_value<=final_target_value.
append evaluations to evals_row to achieve a balanced instance distribution.
evals_row can be an integer or must be commensurable to self._evals[i][1:]. first_index is the first index to consider as data in evals_row (like in evals_row = self._evals[i], the first index must be 1).
If self.instance_multipliers is None the return value is
or the numpy view self._evals[evals_row, 1:].
Parameter instance_multipliers only serves to avoid performance side effects from repeated property invocation.
Determine the number of evaluations to reach target values.
Parameters | |
targets (seq or float) | target precisions |
Returns | |
list of len(targets) values, each being an array of nbRuns FEs values |
computes for each data column of _evals the (maximal) evaluation until final_target was reached, or self.maxevals otherwise.
create evals-array with appended instances.
The evals_appended array mimics independent restarts.
Only append if the number of remaining trials is at least genericsettings.appended_evals_minimal_trials. Hence a standard 2009 dataset which has the instances 3 * [1,2,3,4,5] remains unchanged by default.
Only append if bool(testbedsettings.current_testbed.instances_are_uniform) is True.
return simulated runtimes for each 1D-array in evals_list
return the number of self.evals(target) that are smaller (i.e. better) than ref_eval, where equality counts 1/2.
target may be a scalar or an iterable of targets.
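The counting rule with ties weighted by 1/2 is a one-liner in numpy. The numbers below are made up for a single target; np.nan (an unsuccessful trial) compares neither smaller nor equal and therefore contributes nothing.

```python
import numpy as np

evals_row = np.array([100.0, 250.0, 250.0, np.nan])  # hypothetical evals(target)
ref_eval = 250.0                                     # hypothetical reference

n_better = np.sum(evals_row < ref_eval) + 0.5 * np.sum(evals_row == ref_eval)
```

One trial is strictly better and two trials tie, which gives 1 + 2 * 0.5 = 2.0.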
plot data from evals attribute.
is passed to matplotlib.loglog
TODO: seems outdated on 19/8/2016 ("np.isfinite" was "isfinite" hence raising an error)
update attribute _evals_balanced if necessary.
The first columns of _evals_balanced are equal to those of _evals and further columns are added according to instance_multipliers to balance uneven repetitions over different instances.
return the number of self.evals([target]) that are better
than the min(refalg_dataset.evals([target])), where equality counts 1/2.
TODO: handle the case when evals is nan using of f-values
_evals are the central data and later accessed via the evals
property. Each line _evals[i] has a (target) function value
in _evals[i][0] and the function evaluation for which this
target was reached the first time in trials 1,... in
return OrderedDict of sum(maxevals) for each (raw data) instance.
This was implemented but never used.
return the number of runs that repeated a previous instance.
That is, 0 if all instance number ids are unique, and >= 1 otherwise.