Bootstrapping and statistics routines.
Function fix_data_number -- Obsolete and subject to removal. Use instead np.asarray(data)[randint_derandomized(0, len(data), ndata)] or [data[i] for i in randint_derandomized(0, len(data), ndata)].
Function sp1 -- sp1(data, maxvalue=Inf, issuccessful=None) computes a mean value over successful entries in data divided by the success rate, the so-called SP1.
Function sp -- sp(data, maxvalue=Inf, issuccessful=None, allowinf=True) computes the sum of function evaluations over all runs divided by the number of successes, the so-called success performance, which estimates the average runtime aRT.
Function drawSP_from_dataset -- returns ``(percentiles, all_sampled_values_sorted)`` of simulated runlengths to reach ``ftarget`` based on a ``DataSet`` class instance.
Function drawSP_from_dataset_new -- new implementation, old interface (which should also change at some point).
Function drawSP -- Returns the percentiles of the bootstrapped distribution of 'simulated' running lengths of successful runs.
Function randint_derandomized -- return a numpy array of derandomized random integers.
Function simulated_evals -- Obsolete: see DataSet.evals_with_simulated_restarts instead.
Function draw -- Generates the empirical bootstrap distribution from a sample.
Function prctile -- Computes percentiles based on data with linear interpolation.
Function randint -- Undocumented.
Function ranksum_statistic -- Returns the U test statistic of the rank-sum (Mann-Whitney-Wilcoxon) test.
Function zprob -- Returns the area under the normal curve 'to the left of' the given z value.
Function ranksumtest -- Calculates the rank sum statistics for the two input data sets x and y and returns z and p.
Function rankdata -- Ranks the data in a, dealing with ties appropriately.
Function significancetest -- Compute the rank-sum test between two data sets.
Function significance_all_best_vs_other -- returns, for each target, the ranksumtest result (z, p) of the best algorithm against the others, together with the index list of the best algorithm.
Function fastsort -- Sort an array and provide the argsort.
Function sliding_window_data -- smooths data with a sliding window; width is an absolute number, the resulting data has the same length as the original data, and the window width varies between width/2 at the borders and width in the middle.
Function equals_approximately -- Undocumented.
Function in_approximately -- return True if ``a`` equals approximately any of the elements in ``list_``.
Class Evals -- Undocumented.
Function _has_len -- Undocumented.
Function _randint_derandomized_generator -- the generator for randint_derandomized.
def _has_len(thing):
Undocumented
def fix_data_number(data, ndata=15, last_elements_randomized=True, warn=False):

Obsolete and subject to removal. Use instead np.asarray(data)[randint_derandomized(0, len(data), ndata)] or [data[i] for i in randint_derandomized(0, len(data), ndata)].

Return a copy of the data vector adjusted to length ndata, or data itself if it already has that length.

Ensures that the result has length ndata.

>>> from cocopp.toolsstats import fix_data_number
>>> data = [1,2,4]
>>> assert len(fix_data_number(data, 1)) == 1
>>> assert len(fix_data_number(data, 3)) == 3
>>> assert len(fix_data_number(data, 4)) == 4
>>> assert len(fix_data_number(data, 14)) == 14
>>> assert fix_data_number(data, 14)[2] == data[2]

See also data[randint_derandomized(0, len(data), ndata)], which should do pretty much the same, only slightly more randomized.

Parameters:
  data -- a (row)-vector
def sp1(data, maxvalue=np.Inf, issuccessful=None):

sp1(data, maxvalue=Inf, issuccessful=None) computes a mean value over successful entries in data divided by success rate, the so-called SP1

Input:
data -- array contains, e.g., number of function
evaluations to reach the target value
maxvalue -- number; if issuccessful is not provided, data[i]
is defined successful if it is strictly smaller than maxvalue
issuccessful -- None or array of the same length as data. Entry
i in data is defined successful if issuccessful[i] is True or non-zero
Returns: (SP1, success_rate, nb_of_successful_entries), where
SP1 is the mean over successful entries in data divided by the success rate. SP1 equals np.Inf when the success rate is zero.
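
A minimal usage sketch based on the documented return values; with the default maxvalue, entries equal to np.Inf count as unsuccessful (the numbers below follow directly from the formula above):

    import numpy as np
    from cocopp.toolsstats import sp1

    data = [120., 80., 100., np.inf, np.inf]   # two of five runs unsuccessful
    value, success_rate, n_succ = sp1(data)    # maxvalue defaults to np.Inf
    # mean over the three successful entries is 100, success rate is 3/5,
    # hence SP1 = 100 / 0.6
    assert n_succ == 3
    assert np.isclose(success_rate, 0.6) and np.isclose(value, 100 / 0.6)
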
def sp(data, maxvalue=np.Inf, issuccessful=None, allowinf=True):

sp(data, maxvalue=Inf, issuccessful=None, allowinf=True) computes the sum of the function evaluations over all runs divided by the number of successes, the so-called success performance, which estimates the average runtime aRT.

Input:
data -- array contains, e.g., number of function
evaluations to reach the target value
maxvalue -- number; if issuccessful is not provided, data[i]
is defined successful if it is strictly smaller than maxvalue
issuccessful -- None or array of the same length as data. Entry
i in data is defined successful if issuccessful[i] is True or non-zero
allowinf -- If False, replace inf output (in case of no success)
with the sum of function evaluations.
Returns: (SP, success_rate, nb_of_successful_entries), where SP is the sum
over all (successful and unsuccessful) entries in data divided by the number of successes.
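
A usage sketch under the aRT interpretation above (sum of all evaluations divided by the number of successes); the issuccessful argument marks which entries stem from successful runs:

    import numpy as np
    from cocopp.toolsstats import sp

    evals = [120., 80., 100., 500., 500.]      # evaluations spent in five runs
    issuccessful = [1, 1, 1, 0, 0]             # the last two runs failed
    art, success_rate, n_succ = sp(evals, issuccessful=issuccessful)
    assert n_succ == 3
    assert np.isclose(art, sum(evals) / 3.)    # (120 + 80 + 100 + 500 + 500) / 3
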
def drawSP_from_dataset(data_set, ftarget, percentiles, samplesize=genericsettings.simulated_runlength_bootstrap_sample_size):

returns (percentiles, all_sampled_values_sorted) of simulated runlengths to reach ftarget based on a DataSet class instance, specifically:

evals = data_set.detEvals([ftarget])[0] # likely to be 15 "data points"
idx_nan = np.isnan(evals)  # nan == did not reach ftarget
return drawSP(evals[~idx_nan], data_set.maxevals[idx_nan], percentiles, samplesize)

The expected value of all_sampled_values_sorted is the average runtime aRT, as obtained by data_set.detERT([ftarget])[0].

def drawSP_from_dataset_new(data_set, ftarget, dummy, samplesize=genericsettings.simulated_runlength_bootstrap_sample_size):

new implementation, old interface (which should also change at some point)

returns (None, evals), that is, no percentiles, only the data=runtimes=evals

def drawSP(runlengths_succ, runlengths_unsucc, percentiles, samplesize=genericsettings.simulated_runlength_bootstrap_sample_size, derandomized=True):

Returns the percentiles of the bootstrapped distribution of 'simulated' running lengths of successful runs.

Input:
  • runlengths_succ -- array of running lengths of successful runs
  • runlengths_unsucc -- array of running lengths of unsuccessful
    runs
Return:
(percentiles, all_sampled_values_sorted)
Details:
A single successful running length is computed by adding uniformly randomly chosen running lengths until the first time a successful one is chosen. In case of no successful run an exception is raised.

This implementation is deprecated and replaced by simulated_evals. The latter is also deprecated; see DataSet.evals_with_simulated_restarts instead.

See also: simulated_evals.
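
A toy call illustrating the interface (the function is deprecated, as noted above; the small samplesize and the run-length values are chosen only for illustration):

    import numpy as np
    from cocopp.toolsstats import drawSP

    np.random.seed(1)
    runlengths_succ = np.array([100., 200., 150.])   # successful runs
    runlengths_unsucc = np.array([1000., 1000.])     # budgets of unsuccessful runs
    percentiles, sampled = drawSP(runlengths_succ, runlengths_unsucc,
                                  [10, 50, 90], samplesize=100)
    assert len(percentiles) == 3   # one value per requested percentile
    # sampled contains the simulated run lengths, sorted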

def randint_derandomized(low, high=None, size=None):

return a numpy array of derandomized random integers.

The interface is the same as for numpy.randint; however, the default value for size is high - low and each "random" integer is guaranteed to appear exactly once in each chunk of size high - low. (That is, by default a permutation is returned.)

As for numpy.randint, the value range is [low, high-1] or [0, low-1] if high is None.

>>> import numpy as np
>>> from cocopp.toolsstats import randint_derandomized
>>> np.random.seed(1)
>>> list(randint_derandomized(0, 4, 6))
[3, 2, 0, 1, 0, 2]

A typical use case is indexing of data like:

[data[i] for i in randint_derandomized(0, len(data), ndata)]
# or almost equivalently
np.asarray(data)[randint_derandomized(0, len(data), ndata)]
def _randint_derandomized_generator(low, high=None, size=None):
the generator for randint_derandomized
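
A minimal sketch of what such a generator can look like (illustrative only, not the actual implementation): integers from [low, high) are yielded in freshly shuffled blocks until size values have been produced.

    import numpy as np

    def randint_derandomized_sketch(low, high=None, size=None):
        """Illustrative generator: each block of high - low values is a permutation."""
        if high is None:
            low, high = 0, low
        if size is None:
            size = high - low
        produced = 0
        while produced < size:
            for value in np.random.permutation(np.arange(low, high)):
                if produced >= size:
                    return
                yield int(value)
                produced += 1
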
def simulated_evals(evals, nfails, samplesize=genericsettings.simulated_runlength_bootstrap_sample_size, randint=randint_derandomized):

Obsolete: see DataSet.evals_with_simulated_restarts instead.

Return samplesize "simulated" run lengths (#evaluations), sorted.

Input:
  • evals -- array of evaluations
  • nfails -- only the last nfails evaluations come from
    unsuccessful runs
  • randint -- random integer index function of the first simulated run
Return:
all_sampled_runlengths_sorted

Example:

>>> from cocopp import set_seed
>>> from cocopp.toolsstats import simulated_evals
>>> set_seed(4)
>>> evals_succ = [1]  # only one evaluation in the successful trial
>>> evals_unsucc = [2, 4, 2, 6, 100]
>>> simulated_evals(np.hstack([evals_succ, evals_unsucc]),
...                 len(evals_unsucc), 13)  # doctest: +ELLIPSIS
[1, 1, 3, ...
Details:
A single successful running length is computed by adding uniformly randomly chosen running lengths until the first time a successful one is chosen. In case of no successful run an exception is raised.
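
A sketch of the resampling step described above for a single simulated run length (illustrative code, not the library implementation; it assumes the first len(evals) - nfails entries of evals belong to successful runs and that at least one run was successful):

    import numpy as np

    def one_simulated_runlength(evals, nfails):
        """Sum of uniformly drawn run lengths until a successful run is drawn."""
        evals = np.asarray(evals, dtype=float)
        n_succ = len(evals) - nfails
        total = 0.0
        while True:
            i = np.random.randint(len(evals))   # pick one run uniformly at random
            total += evals[i]
            if i < n_succ:                      # a successful run ends the restart chain
                return total
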
def draw(data, percentiles, samplesize=1000.0, func=sp1, args=()):
Generates the empirical bootstrap distribution from a sample.

Input:
  - *data* -- a sequence of data values
  - *percentiles* -- a single scalar value or a sequence of
    percentiles to be computed from the bootstrapped distribution.
  - *func* -- function that computes the statistics as
    func(data,*args) or func(data,*args)[0], by default toolsstats.sp1
  - *args* -- arguments to func, the zero-th element of args is
    expected to be a sequence of boolean giving the success status
    of the associated data value. This specialization of the draw
    procedure is due to the interface of the performance computation
    methods sp1 and sp.
  - *samplesize* -- number of bootstrap samples drawn, default is 1e3;
    for more reliable values choose rather 1e4.
    Runtime is linear in samplesize (roughly 0.2s for samplesize=1000).

Return:
    (prctiles, all_samplesize_bootstrapped_values_sorted)

Example:
    >> from cocopp import toolsstats
    >> import numpy as np
    >> data = np.random.randn(22)
    >> res = toolsstats.draw(data, (10, 50, 90), samplesize=1e4)
    >> print(res[0])

.. note::
   NaN-values are also bootstrapped, but disregarded for the 
   calculation of percentiles which can lead to somewhat
   unexpected results.
def prctile(x, arrprctiles, issorted=False, ignore_nan=True):
Computes percentile based on data with linear interpolation

:keyword sequence x: (list, array) of data values
:keyword arrprctiles: percentiles to be calculated. Values beyond the
                      interval [0, 100] also return the respective
                      extreme value in x.
:type arrprctiles: scalar or sequence
:keyword issorted: indicates whether x is already sorted
:Return:
    sequence of percentile values in data according to argument
    prctiles

.. note::
    np.Inf and -np.Inf are handled as regular values; np.NaN and None
    entries are simply disregarded (with the default ignore_nan=True).
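
A small usage sketch; the median is independent of the particular linear-interpolation convention, so it serves as a safe check (values of other percentiles depend on the convention, which is not spelled out here):

    from cocopp.toolsstats import prctile

    # the median of 1, 2, 3, 4 is 2.5 under any linear-interpolation convention
    assert prctile([4., 1., 3., 2.], [50])[0] == 2.5
    # NaN entries are disregarded by default (ignore_nan=True)
    assert prctile([1., float('nan'), 3.], [50])[0] == 2.0
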
def randint(upper, n):
Undocumented
def ranksum_statistic(N1, N2):

Returns the U test statistic of the rank-sum (Mann-Whitney-Wilcoxon) test.

See http://en.wikipedia.org/wiki/Mann%E2%80%93Whitney_U (direct method for small sample sizes).
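
A sketch of the direct method for small samples (illustrative only; whether ranksum_statistic counts 'greater than' or 'less than' pairs is a convention not fixed here): the U statistic of the first sample counts, over all pairs, how often its value beats the other sample's value, with ties counted as 1/2.

    def u_statistic_sketch(sample1, sample2):
        """Mann-Whitney U of sample1 by direct pair counting (illustration)."""
        u = 0.0
        for a in sample1:
            for b in sample2:
                if a > b:
                    u += 1.0
                elif a == b:
                    u += 0.5
        return u

    # the two U statistics always sum to the number of pairs
    x, y = [1, 4, 5], [2, 3, 6]
    assert u_statistic_sketch(x, y) + u_statistic_sketch(y, x) == len(x) * len(y)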

def zprob(z):

Returns the area under the normal curve 'to the left of' the given z value.

http://www.nmr.mgh.harvard.edu/Neural_Systems_Group/gary/python.html

Thus:

  • for z<0, zprob(z) = 1-tail probability
  • for z>0, 1.0-zprob(z) = 1-tail probability
  • for any z, 2.0*(1.0-zprob(abs(z))) = 2-tail probability

Adapted from z.c in Gary Perlman's |Stat. Can handle multiple dimensions.

Usage: zprob(z), where z is a z-value
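
In other words, zprob behaves like the standard normal cumulative distribution function; a quick cross-check against scipy (assuming scipy is installed and the polynomial approximation is accurate to a few decimal places):

    from scipy.stats import norm
    from cocopp.toolsstats import zprob

    for z in (-1.96, 0.0, 1.0, 2.5):
        assert abs(zprob(z) - norm.cdf(z)) < 1e-4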

def ranksumtest(x, y):

Calculates the rank sum statistics for the two input data sets x and y and returns z and p.

This method returns a slightly different two-tailed p-value than scipy.stats.ranksumtest; this difference remains to be investigated.

Returns: z-value for first data set x and two-tailed p-value

def rankdata(a):

Ranks the data in a, dealing with ties appropriately.

Equal values are assigned a rank that is the average of the ranks that would have been otherwise assigned to all of the values within that set. Ranks begin at 1, not 0.

Example:
In [15]: stats.rankdata([0, 2, 2, 3])
Out[15]: array([ 1. ,  2.5,  2.5,  4. ])
Parameters:
  • a : array -- this array is first flattened.
Returns:
An array of length equal to the size of a, containing rank scores.
def significancetest(entry0, entry1, targets):

Compute the rank-sum test between two data sets.

For a given target function value, the performances of two algorithms are compared. The result of a significance test is computed on the number of function evaluations for reaching the target or, if not available, the function values for the smallest budget in an unsuccessful trial.

Known bugs: this is not a fair comparison, because the successful trials could be very long.

Parameters:
  entry0 -- DataSet; data set 0
  entry1 -- DataSet; data set 1
  targets -- list of target function values
Returns:
  list of (z, p) for each target function value in the input argument
  targets; z and p are the values returned by the ranksumtest method.
def significance_all_best_vs_other(datasets, targets, best_alg_idx=None):
Returns a list of (z, p) tuples, one for each target value in targets (each being the ranksumtest result of the best algorithm against the others), together with the index list of the best algorithm.
Parameters:
  datasets -- a list of DataSet instances from different algorithms, otherwise on the same function and dimension (which is not necessarily checked)
  targets -- a list of target values
  best_alg_idx -- for each target, the best algorithm to be tested against the others
def fastsort(a):

Sort an array and provide the argsort.

Parameters:
a : array
Returns:
(sorted array, indices into the original array)
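
A usage sketch, assuming the documented return order (sorted values first, argsort indices second):

    import numpy as np
    from cocopp.toolsstats import fastsort

    a = np.array([3., 1., 2.])
    sorted_a, idx = fastsort(a)
    assert np.all(sorted_a == np.sort(a))
    assert np.all(a[idx] == sorted_a)   # idx indexes into the original array
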
def sliding_window_data(data, width=2, operator=np.median, number_of_stats=0, only_finite_data=True):

width is an absolute number; the resulting data has the same length as the original data, and the effective window width varies between width/2 at the borders and width in the middle.

Return (smoothed_data, stats), where stats is a list with elements [index_in_data, 2_10_25_50_75_90_98_percentile_of_window_at_i]
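
A sketch of the smoothing part only (illustrative, not the library code; border handling and the stats output are simplified away): each output value is the operator applied to a window of roughly width values around the index, clipped at the borders.

    import numpy as np

    def sliding_window_sketch(data, width=2, operator=np.median):
        """Illustrative smoothed copy of data with the same length as data."""
        data = np.asarray(data, dtype=float)
        half = max(int(width) // 2, 1)
        return np.array([operator(data[max(0, i - half):i + half])
                         for i in range(len(data))])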

def equals_approximately(a, b, abs=1e-11, rel=1e-11):
Undocumented
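
The function itself is undocumented; a typical combined absolute/relative tolerance test looks like the sketch below (the actual rule used by equals_approximately may differ in detail):

    def equals_approximately_sketch(a, b, abs_tol=1e-11, rel_tol=1e-11):
        """Illustrative only: True if a and b agree within abs_tol or rel_tol."""
        return abs(a - b) <= abs_tol or abs(a - b) <= rel_tol * max(abs(a), abs(b))
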
def in_approximately(a, list_, abs=1e-11, rel=1e-11):
return True if ``a`` equals approximately any of the elements
in ``list_``, in short

    return any([equals_approximately(a, b) for b in list_])