Function definitions

Introduction

This is an online-friendly presentation of the bbob functions, copied from the BBOB 2009 function document (Finck et al. 2009). You may cite this work in a scientific context as

Steffen Finck, Nikolaus Hansen, Raymond Ros, and Anne Auger. Real-Parameter Black-Box Optimization Benchmarking 2009: Noiseless Functions Definitions. Technical report, RR-6829. INRIA, 2009. https://inria.hal.science/inria-00362633v2/document

@techreport{bbob2019,
    author = {Finck, Steffen and Hansen, Nikolaus and Ros, Raymond and Auger, Anne},
    title = {Real-Parameter Black-Box Optimization Benchmarking 2009: Noiseless Functions Definitions},
    institution = {INRIA},
    year = {2009},
    number = {RR-6829},
    note = {Updated version as of February 2019},
    url = {https://inria.hal.science/inria-00362633v2/document}
}

In the following, 24 noise-free real-parameter single-objective benchmark functions are presented¹. Our intention behind the selection of benchmark functions was to evaluate the performance of algorithms with regard to typical difficulties which we believe occur in continuous domain search. We hope that the function collection reflects, at least to a certain extent and with a few exceptions, a more difficult portion of the problem distribution that will be seen in practice (easy functions are evidently of lesser interest).

We prefer benchmark functions that are comprehensible such that algorithm behaviours can be understood in the topological context. In this way, a desired search behaviour can be pictured and deficiencies of algorithms can be profoundly analysed. Last but not least, this can eventually lead to a systematic improvement of algorithms.

All benchmark functions are scalable with the dimension. Most functions have no specific value of their optimal solution (they are randomly shifted in x-space). All functions have an artificially chosen optimal function value (they are randomly shifted in f-space). Consequently, for each function different instances can be generated: for each instance the randomly chosen values are drawn anew². Apart from the first subgroup, the benchmarks are non-separable. Other specific properties are discussed in Function Properties.

General setup

Search Space:

All functions are defined and can be evaluated over \mathcal{R}^{D}, while the actual search domain is given as [-5,5]^{D}.

Location of the optimal \mathbf{x}^\mathrm{opt} and of f_\mathrm{opt}=f(\mathbf{x^\mathrm{opt}}):

All functions have their global optimum in [-5,5]^{D}. The majority of functions has the global optimum in [-4,4]^{D} and for many of them \mathbf{x}^\mathrm{opt} is drawn uniformly from this compact set. The value for f_\mathrm{opt} is drawn from a Cauchy distributed random variable, with zero median and with roughly 50% of the values between -100 and 100. The value is rounded to two decimal places and set to \pm1000 if its absolute value exceeds 1000. In the function definitions a transformed variable vector \mathbf{z} is often used instead of the argument \mathbf{x}. The vector \mathbf{z} has its optimum in \mathbf{z^\mathrm{opt}}=\mathbf{0}, if not stated otherwise.

Boundary Handling:

On some functions a penalty boundary handling is applied as given with f^{{}}_\mathrm{pen} (see Symbols and definitions).

Linear Transformations:

Linear transformations of the search space are applied to derive non-separable functions from separable ones and to control the conditioning of the function.

Non-Linear Transformations and Symmetry Breaking:

In order to make relatively simple but well-understood functions less regular, on some functions non-linear transformations are applied in x- or f-space. Both transformations T_\mathrm{\hspace*{-0.01em}osz}:\mathcal{R}^n\to\mathcal{R}^n, n\in\{1,D\}, and T_\mathrm{asy}:\mathcal{R}^D\to\mathcal{R}^D are defined coordinate-wise (see Symbols and definitions). They are smooth and have, coordinate-wise, a strictly positive derivative. They are shown in Figure 1. T_\mathrm{osz} is oscillating about the identity, where the oscillation is scale invariant w.r.t. the origin. T_\mathrm{asy} is the identity for negative values. When T_\mathrm{asy} is applied, a portion of 1/2^D of the search space remains untransformed.

Figure 1: T_\mathrm{osz} (blue) and D-th coordinate of T_\mathrm{asy} for \beta = 0.1, 0.2, 0.5 (green)

Symbols and definitions

Used symbols and definitions of, e.g., auxiliary functions are given in the following. Vectors are typeset in bold and refer to column vectors.

\otimes indicates element-wise multiplication of two D-dimensional vectors, \otimes:\mathcal{R}^D\times\mathcal{R}^D\to\mathcal{R}^D, (\mathbf{x},\mathbf{y})\mapsto\mathrm{{diag}}(\mathbf{x})\times\mathbf{y}=(x_i\times y_i)_{i=1,\dots,D}

\|.\| denotes the Euclidean norm, \|\mathbf{x}\|^2=\sum_i x_i^2.

[.] denotes the nearest integer value

\mathbf{0} =(0,\dots,0)^{\mathrm{T}} all zero vector

\mathbf{1} =(1,\dots,1)^{\mathrm{T}} all one vector

\Lambda^{\!\alpha} is a diagonal matrix in D dimensions with the ith diagonal element as \lambda_{ii} = \alpha^{\frac{1}{2}\frac{i-1}{D-1}}, for i=1,\dots,D.

f^{{}}_\mathrm{pen} :\mathcal{R}^D\to\mathcal{R}, \mathbf{x}\mapsto\sum_{i=1}^D\max(0,|x_i| - 5)^2

\mathbf{1}_-^+ a D-dimensional vector with entries of -1 or 1 with equal probability independently drawn.

\mathbf{Q}, \mathbf{R} orthogonal (rotation) matrices. For a given function and dimension, a different realization of \mathbf{Q} and of \mathbf{R} is used for each instantiation of the function. Orthogonal matrices are generated from standard normally distributed entries by Gram-Schmidt orthonormalization. Columns and rows of an orthogonal matrix form an orthonormal basis.

\mathbf{R} see \mathbf{Q}

T^{{\beta}}_\mathrm{asy} :\mathcal{R}^D\to\mathcal{R}^D, x_i\mapsto \begin{cases} x_i^{1+\beta \frac{i-1}{D-1}\sqrt{x_i}} & \text{~if~} x_i>0\\ x_i & \text{~otherwise~} \end{cases}, for i=1,\dots,D. See Figure 1.

T_\mathrm{\hspace*{-0.01em}osz} :\mathcal{R}^n\to\mathcal{R}^n, for any positive integer n (n=1 and n=D are used in the following), maps element-wise x\mapsto\mathrm{{sign}}(x)\exp\left(\hat{x} + 0.049\left(\sin(c_1 \hat{x}) + \sin(c_2 \hat{x})\right)\right) with \hat{x}= \begin{cases} \log(|x|) & \text{if~} x\not=0 \\ 0 & \text{otherwise} \end{cases}, \mathrm{{sign}}(x) = \begin{cases} -1 & \text{if~} x < 0 \\ 0 & \text{if~} x = 0 \\ 1 & \text{otherwise} \end{cases}, c_1 = \begin{cases} 10 & \text{if~} x > 0\\ 5.5 & \text{otherwise} \end{cases} and c_2 = \begin{cases} 7.9 & \text{if~} x > 0\\ 3.1 & \text{otherwise} \end{cases}. See Figure 1.

\mathbf{x}^\mathrm{opt} optimal solution vector, such that f(\mathbf{x^\mathrm{opt}}) is minimal.
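
For illustration, the transformations and auxiliary symbols above can be sketched in Python/NumPy as follows. This is not the official COCO/BBOB implementation; the function names (t_osz, t_asy, lambda_alpha, f_pen, random_rotation) are chosen here for readability, and later sketches in this document reuse them.

import numpy as np

def t_osz(x):
    """Oscillation transformation T_osz, applied element-wise."""
    x = np.asarray(x, dtype=float)
    xhat = np.zeros_like(x)
    nonzero = x != 0
    xhat[nonzero] = np.log(np.abs(x[nonzero]))
    c1 = np.where(x > 0, 10.0, 5.5)
    c2 = np.where(x > 0, 7.9, 3.1)
    return np.sign(x) * np.exp(xhat + 0.049 * (np.sin(c1 * xhat) + np.sin(c2 * xhat)))

def t_asy(x, beta):
    """Symmetry-breaking transformation T_asy^beta; identity for x_i <= 0."""
    x = np.asarray(x, dtype=float)
    D = len(x)
    i = np.arange(D)                          # 0-based index, plays the role of (i-1) above
    xpos = np.where(x > 0, x, 0.0)
    powered = xpos ** (1.0 + beta * i / (D - 1) * np.sqrt(xpos))
    return np.where(x > 0, powered, x)

def lambda_alpha(alpha, D):
    """Diagonal matrix Lambda^alpha with lambda_ii = alpha^{(1/2)(i-1)/(D-1)}."""
    i = np.arange(D)
    return np.diag(alpha ** (0.5 * i / (D - 1)))

def f_pen(x):
    """Boundary penalty sum_i max(0, |x_i| - 5)^2."""
    return float(np.sum(np.maximum(0.0, np.abs(x) - 5.0) ** 2))

def random_rotation(D, rng):
    """Orthogonal matrix from standard normal entries; the QR factorization
    yields an orthonormal basis, serving the same purpose as the Gram-Schmidt
    orthonormalization described above."""
    Q, _ = np.linalg.qr(rng.standard_normal((D, D)))
    return Q

# Example: rng = np.random.default_rng(1); R = random_rotation(10, rng)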

Separable functions

f1: Sphere function

f_{1}(\mathbf{x}) = \|\mathbf{z}\|^2 + f_\mathrm{opt}

  • \mathbf{z}= \mathbf{x}- \mathbf{x^\mathrm{opt}}

Properties:

Presumably the easiest continuous domain search problem, given the volume of the searched solution set is small (i.e. where pure Monte Carlo random search is too expensive).

  • unimodal

  • highly symmetric, in particular rotationally invariant, scale invariant

Information gained from this function:

  • What is the optimal convergence rate of an algorithm?

f2: Ellipsoidal function

f_{2}(\mathbf{x}) = \sum_{i = 1}^{D} 10^{6\frac{i-1}{D-1}}\,z_i^2 + f_\mathrm{opt}

  • \mathbf{z}= T_\mathrm{\hspace*{-0.01em}osz}(\mathbf{x}- \mathbf{x^\mathrm{opt}})

Properties:

Globally quadratic and ill-conditioned function with smooth local irregularities.

  • unimodal

  • conditioning is about 10^6

Information gained from this function:

  • In comparison to f10: Is separability exploited?
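
A minimal sketch of f2, reusing t_osz from the sketch under Symbols and definitions; the instance-specific values x_opt and f_opt are taken as given inputs (illustrative only, not the official implementation):

def f2_ellipsoidal(x, x_opt, f_opt):
    # z = T_osz(x - x_opt); weights 10^{6(i-1)/(D-1)} give a conditioning of about 10^6.
    x = np.asarray(x, dtype=float)
    D = len(x)
    z = t_osz(x - x_opt)
    weights = 10.0 ** (6.0 * np.arange(D) / (D - 1))
    return float(np.dot(weights, z ** 2)) + f_opt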

f3: Rastrigin function

f_{3}(\mathbf{x}) = 10 \left(D- \sum_{i = 1}^{D}\cos(2\pi z_i)\right) + \|\mathbf{z}\|^2 + f_\mathrm{opt}

  • \mathbf{z}= \Lambda^{\!10}T^{{0.2}}_{\mathrm{asy}}(T_\mathrm{\hspace*{-0.01em}osz}(\mathbf{x}-\mathbf{x^\mathrm{opt}}))

Properties:

Highly multimodal function with a comparatively regular structure for the placement of the optima. The transformations T_\mathrm{asy} and T_\mathrm{\hspace*{-0.01em}osz} alleviate the symmetry and regularity of the original Rastrigin function.

  • roughly 10^D local optima

  • conditioning is about 10

Information gained from this function:

  • In comparison to f15: Is separability exploited?
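
A corresponding sketch of f3, again reusing the helpers from the Symbols and definitions sketch (illustrative only):

def f3_rastrigin(x, x_opt, f_opt):
    # z = Lambda^10 T_asy^{0.2}(T_osz(x - x_opt))
    x = np.asarray(x, dtype=float)
    D = len(x)
    z = lambda_alpha(10, D) @ t_asy(t_osz(x - x_opt), 0.2)
    return float(10.0 * (D - np.sum(np.cos(2.0 * np.pi * z))) + np.sum(z ** 2)) + f_opt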

f4: Büche-Rastrigin function

f_{4}(\mathbf{x}) = 10 \left(D- \sum_{i = 1}^{D}\cos(2\pi z_i)\right) + \sum_{i = 1}^{D}z_i^2 + 100\,f_{\mathrm{pen}}(\mathbf{x}) + f_\mathrm{opt}

  • z_i = s_i\,T_\mathrm{\hspace*{-0.01em}osz}(x_i - x_i^\mathrm{opt}) \quad \text{for} \; i = 1 \ldots D

  • s_i = \begin{cases} 10\times10^{\frac{1}{2}\,\frac{i-1}{D-1}} & \text{~if~} z_i>0 \text{~and~} i = 1,3,5,\ldots \\ 10^{\frac{1}{2}\frac{i-1}{D-1}} & \text{~otherwise~} \end{cases} for i= 1,\dots,D

Properties:

Highly multimodal function with a structured but highly asymmetric placement of the optima. Constructed as a deceptive function for symmetrically distributed search operators.

  • roughly 10^D local optima, conditioning is about 10, skew factor is about 10 in x-space and 100 in f-space

Information gained from this function:

  • In comparison to f3: What is the effect of asymmetry?

f5: Linear slope

f_{5}(\mathbf{x}) = \sum_{i = 1}^{D} 5\,|s_i| - s_i z_i + f_\mathrm{opt}

  • z_i = x_i if x_i^\mathrm{opt}x_i < 5^2 and z_i = x_i^\mathrm{opt} otherwise, for i=1,\dots,D. That is, if x_i exceeds x_i^\mathrm{opt} it will be mapped back into the domain and the function appears to be constant in this direction.

  • s_i = \mathrm{{sign}}\left(x_i^\mathrm{opt}\right)\, 10^{\frac{i-1}{D-1}} for i=1,\dots,D.

  • \mathbf{x^\mathrm{opt}}= \mathbf{z^\mathrm{opt}}= 5\times\mathbf{1_-^+}

Properties:

Purely linear function testing whether the search can go outside the initial convex hull of solutions right into the domain boundary.

  • \mathbf{x}^\mathrm{opt} is on the domain boundary

Information gained from this function:

  • Can the search go outside the initial convex hull of solutions into the domain boundary? Can the step size be increased accordingly?
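
A sketch of f5, assuming x_opt is a vector with entries \pm5 as defined above (illustrative only):

def f5_linear_slope(x, x_opt, f_opt):
    # z_i is clipped to x_opt_i once x_i moves past the optimum in coordinate i.
    x = np.asarray(x, dtype=float)
    x_opt = np.asarray(x_opt, dtype=float)
    D = len(x)
    s = np.sign(x_opt) * 10.0 ** (np.arange(D) / (D - 1))
    z = np.where(x_opt * x < 25.0, x, x_opt)
    return float(np.sum(5.0 * np.abs(s) - s * z)) + f_opt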

Functions with low or moderate conditioning

f6: Attractive sector function

f_{6}(\mathbf{x}) = T_\mathrm{\hspace*{-0.01em}osz}\left(\sum_{i = 1}^{D} (s_i z_i)^2\right)^{0.9} + f_\mathrm{opt}

  • \mathbf{z}= \mathbf{Q}\Lambda^{\!10}\mathbf{R}(\mathbf{x}- \mathbf{x^\mathrm{opt}})

  • s_i = \begin{cases} 10^2& \text{if~} z_i\times x_i^\mathrm{opt}> 0\\ 1 & \text{otherwise} \end{cases}

Properties:

Highly asymmetric function, where only one “hypercone” (with angular base area) with a volume of roughly {1}/{2^D} yields low function values. The optimum is located at the tip of this cone.

  • unimodal

Information gained from this function:

  • In comparison to f1: What is the effect of a highly asymmetric landscape?

f7: Step ellipsoidal function

f_{7}(\mathbf{x}) = 0.1 \max\left(|\hat{z}_1|/10^4,\, \sum_{i = 1}^{D} 10^{2\frac{i-1}{D-1}} z_i^2\right) + f_{\mathrm{pen}}(\mathbf{x}) + f_\mathrm{opt}

  • \hat{\mathbf{z}} = \Lambda^{\!10}\mathbf{R}(\mathbf{x}- \mathbf{x^\mathrm{opt}})

  • \tilde{z}_i = \begin{cases} \lfloor0.5+\hat{z}_i\rfloor & \text{if~} \left|\hat{z}_i\right| > 0.5 \\ {\lfloor0.5+10\,\hat{z}_i\rfloor}/{10} & \text{otherwise} \end{cases} for i=1,\dots,D, which denotes the rounding procedure in order to produce the plateaus.

  • \mathbf{z}= \mathbf{Q}\tilde{\mathbf{z}}

Properties:

The function consists of many plateaus of different sizes. Apart from a small area close to the global optimum, the gradient is zero almost everywhere.

  • unimodal, non-separable, conditioning is about 100

Information gained from this function:

  • Does the search get stuck on plateaus?
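
A sketch of f7; the orthogonal matrices Q and R are instance-specific and passed in here as arguments, e.g. from random_rotation in the earlier sketch (illustrative only):

def f7_step_ellipsoid(x, x_opt, f_opt, Q, R):
    x = np.asarray(x, dtype=float)
    D = len(x)
    z_hat = lambda_alpha(10, D) @ R @ (x - x_opt)
    z_tilde = np.where(np.abs(z_hat) > 0.5,
                       np.floor(0.5 + z_hat),
                       np.floor(0.5 + 10.0 * z_hat) / 10.0)   # rounding produces the plateaus
    z = Q @ z_tilde
    weights = 10.0 ** (2.0 * np.arange(D) / (D - 1))
    return 0.1 * max(abs(z_hat[0]) / 1e4, float(np.dot(weights, z ** 2))) + f_pen(x) + f_opt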

f8: Rosenbrock function, original

f_{8}(\mathbf{x}) = \sum_{i = 1}^{D-1} \left( 100\,\left(z_i^2 - z_{i+1}\right)^2 + (z_i-1)^2 \right) + f_\mathrm{opt}

  • \mathbf{z}= \max\!\left(1,\frac{\sqrt{D}}{8}\right)(\mathbf{x}- \mathbf{x^\mathrm{opt}}) + 1

  • \mathbf{z^\mathrm{opt}}=\mathbf{1}

Properties:

So-called banana function due to its 2-D contour lines as a bent ridge (or valley) (Rosenbrock 1960). In the beginning, the prominent first term of the function definition attracts to the point \mathbf{z}=\mathbf{0}. Then, a long bending valley needs to be followed to reach the global optimum. The ridge changes its orientation D-1 times. Exceptionally, here \mathbf{x^\mathrm{opt}}\in[-3,3]^D.

  • tri-band dependency structure, in larger dimensions the function has a local optimum with an attraction volume of about 25%

Information gained from this function:

  • Can the search follow a long path with D-1 changes in the direction?
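
A sketch of f8 (illustrative only):

def f8_rosenbrock(x, x_opt, f_opt):
    # z = max(1, sqrt(D)/8) * (x - x_opt) + 1, so that z_opt = 1.
    x = np.asarray(x, dtype=float)
    D = len(x)
    z = max(1.0, np.sqrt(D) / 8.0) * (x - x_opt) + 1.0
    return float(np.sum(100.0 * (z[:-1] ** 2 - z[1:]) ** 2 + (z[:-1] - 1.0) ** 2)) + f_opt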

f9: Rosenbrock function, rotated

f_{9}(\mathbf{x}) = \sum_{i = 1}^{D-1} \left( 100\,\left(z_i^2 - z_{i+1}\right)^2 + (z_i-1)^2 \right) + f_\mathrm{opt}

  • \mathbf{z}= \max\left(1,\frac{\sqrt{D}}{8}\right)\mathbf{R}\mathbf{x}+ \mathbf{1}/2

  • \mathbf{z^\mathrm{opt}}=\mathbf{1}

Properties:

Rotated version of the previously defined Rosenbrock function.

Information gained from this function:

  • In comparison to f8: Can the search follow a long path with D-1 changes in the direction without exploiting partial separability?

Functions with high conditioning and unimodal

f10: Ellipsoidal function

f_{10}(\mathbf{x}) = \sum_{i = 1}^{D} 10^{6\frac{i-1}{D-1}}z_i^2 + f_\mathrm{opt}

  • \mathbf{z}= T_\mathrm{\hspace*{-0.01em}osz}(\mathbf{R}(\mathbf{x}- \mathbf{x^\mathrm{opt}}))

Properties:

Globally quadratic ill-conditioned function with smooth local irregularities, non-separable counterpart to f_2.

  • unimodal, conditioning is 10^6

Information gained from this function:

  • In comparison to f2: What is the effect of rotation (non-separability)?

f11: Discus function

f_{11}(\mathbf{x}) = 10^6 z_1^2 + \sum_{i = 2}^{D} z_i^2 + f_\mathrm{opt}

  • \mathbf{z}= T_\mathrm{\hspace*{-0.01em}osz}(\mathbf{R}(\mathbf{x}- \mathbf{x^\mathrm{opt}}))

Properties:

Globally quadratic function with local irregularities. A single direction in search space is a thousand times more sensitive than all others.

  • conditioning is about 10^6

Information gained from this function:

  • In comparison to f1: What is the effect of constraint-like penalization?

f12: Bent cigar function

f_{12}(\mathbf{x}) = z_1^2 + 10^6\sum_{i = 2}^{D} z_i^2 + f_\mathrm{opt}

  • \mathbf{z}= \mathbf{R}\,T^{{0.5}}_\mathrm{asy}(\mathbf{R}(\mathbf{x}- \mathbf{x^\mathrm{opt}}))

Properties:

A ridge defined as \sum_{i=2}^{D} z_i^2 =0 needs to be followed. The ridge is smooth but very narrow. Due to T^{{1/2}}_\mathrm{asy} the overall shape deviates remarkably from being quadratic.

  • conditioning is about 10^6, rotated, unimodal

Information gained from this function:

  • Can the search continuously change its search direction?

f13: Sharp ridge function

f_{13}(\mathbf{x}) = z_1^2 + 100\,\sqrt{\sum_{i = 2}^{D} z_i^2} + f_\mathrm{opt}

  • \mathbf{z}= \mathbf{Q}\Lambda^{\!10}\mathbf{R}(\mathbf{x}- \mathbf{x^\mathrm{opt}})

Properties:

As for the Bent Cigar function, a ridge defined as \sum_{i=2}^D z_i^2 = 0 must be followed. The ridge is non-differentiable and the gradient is constant when the ridge is approached from any given point. Following the gradient becomes ineffective close to the ridge where the ridge needs to be followed in z_1-direction to its optimum. The necessary change in “search behavior” close to the ridge is difficult to diagnose, because the gradient towards the ridge does not flatten out.

Information gained from this function:

  • In comparison to f12: What is the effect of a non-smooth, non-differentiable ridge?

f14: Different powers function

f_{14}(\mathbf{x}) = \sqrt{\sum_{i = 1}^{D}|z_i|^{2+4\frac{i - 1}{D- 1}}} + f_\mathrm{opt}

  • \mathbf{z}= \mathbf{R}(\mathbf{x}- \mathbf{x^\mathrm{opt}})

Properties:

Due to the different exponents the sensitivities of the z_i-variables become more and more different when approaching the optimum.
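
A sketch of f14 with the rotation matrix R passed in (illustrative only):

def f14_different_powers(x, x_opt, f_opt, R):
    # z = R (x - x_opt); the exponents grow from 2 to 6 across coordinates.
    x = np.asarray(x, dtype=float)
    D = len(x)
    z = R @ (x - x_opt)
    exponents = 2.0 + 4.0 * np.arange(D) / (D - 1)
    return float(np.sqrt(np.sum(np.abs(z) ** exponents))) + f_opt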

Multi-modal functions with adequate global structure

f15: Rastrigin function

f_{15}(\mathbf{x}) = 10 \left(D- \sum_{i = 1}^{D}\cos(2\pi z_i)\right) + \|\mathbf{z}\|^2 + f_\mathrm{opt}

  • \mathbf{z}= \mathbf{R}\Lambda^{\!10}\mathbf{Q}\,T^{{0.2}}_\mathrm{asy}(T_\mathrm{\hspace*{-0.01em}osz}(\mathbf{R}(\mathbf{x}-\mathbf{x^\mathrm{opt}})))

Properties:

Prototypical highly multimodal function which has originally a very regular and symmetric structure for the placement of the optima. The transformations T^{{}}_\mathrm{asy} and T_\mathrm{\hspace*{-0.01em}osz} alleviate the symmetry and regularity of the original Rastrigin function.

  • non-separable, less regular counterpart of f_3

  • roughly 10^D local optima

  • conditioning is about 10

Information gained from this function:

  • In comparison to f3: What is the effect of non-separability for a highly multimodal function?

f16: Weierstrass function

f_{16}(\mathbf{x}) = 10 \left( \frac{1}{D} \sum_{i = 1}^{D}\sum_{k = 0}^{11} 1/2^k \cos(2\pi3^k(z_i + 1/2)) - f_0 \right)^3 + \frac{10}{D}f_{\mathrm{pen}}(\mathbf{x}) + f_\mathrm{opt}

  • \mathbf{z}= \mathbf{R}\Lambda^{\!1/100}\mathbf{Q}\,T_\mathrm{\hspace*{-0.01em}osz}(\mathbf{R}(\mathbf{x}-\mathbf{x^\mathrm{opt}}))

  • f_0 = \sum_{k = 0}^{11} 1/2^k \cos(2\pi3^k 1/2)

Properties:

Highly rugged and moderately repetitive landscape, where the global optimum is not unique.

  • the term \sum_k 1/2^k \cos(2\pi3^k\dots) introduces the ruggedness, where lower frequencies have a larger weight 1/2^k.

  • rotated, locally irregular, non-unique global optimum

Information gained from this function:

  • In comparison to f17: Does ruggedness or a repetitive landscape deter the search behavior?
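
A sketch of f16, reusing t_osz, lambda_alpha and f_pen from the earlier sketch; Q and R are the instance-specific rotations (illustrative only):

def f16_weierstrass(x, x_opt, f_opt, Q, R):
    # z = R Lambda^{1/100} Q T_osz(R (x - x_opt))
    x = np.asarray(x, dtype=float)
    D = len(x)
    z = R @ lambda_alpha(0.01, D) @ Q @ t_osz(R @ (x - x_opt))
    k = np.arange(12)
    ak, bk = 0.5 ** k, 3.0 ** k
    f0 = np.sum(ak * np.cos(2.0 * np.pi * bk * 0.5))
    inner = np.sum(ak * np.cos(2.0 * np.pi * np.outer(z + 0.5, bk)), axis=1)  # one sum per coordinate
    return float(10.0 * (np.mean(inner) - f0) ** 3) + 10.0 / D * f_pen(x) + f_opt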

f17: Schaffers F7 function

f_{17}(\mathbf{x}) = \left(\frac{1}{D- 1}\sum_{i = 1}^{D- 1} \sqrt{s_i} + \sqrt{s_i} \sin^2\!\left(50\,s_i^{1/5}\right)\right)^2 + 10\,f_{\mathrm{pen}}(\mathbf{x}) + f_\mathrm{opt}

  • \mathbf{z}= \Lambda^{\!10}\mathbf{Q}\,T^{{0.5}}_\mathrm{asy}(\mathbf{R}(\mathbf{x}-\mathbf{x^\mathrm{opt}}))

  • s_i = \sqrt{z_i^2 + z_{i+1}^2} for i=1,\dots,D-1

Properties:

A highly multimodal function where frequency and amplitude of the modulation vary.

  • asymmetric, rotated

  • conditioning is low

f18: Schaffers F7 function, moderately ill-conditioned

f_{18}(\mathbf{x}) = \left(\frac{1}{D- 1}\sum_{i = 1}^{D- 1} \sqrt{s_i} + \sqrt{s_i} \sin^2\!\left(50\,s_i^{1/5}\right)\right)^2 + 10\,f_{\mathrm{pen}}(\mathbf{x}) + f_\mathrm{opt}

  • \mathbf{z}= \Lambda^{\!1000}\mathbf{Q}\,T^{{0.5}}_\mathrm{asy}(\mathbf{R}(\mathbf{x}-\mathbf{x^\mathrm{opt}}))

  • s_i = \sqrt{z_i^2 + z_{i+1}^2} for i=1,\dots,D-1

Properties:

Moderately ill-conditioned counterpart to f_{17}.

  • conditioning of about 1000

Information gained from this function:

  • In comparison to f17: What is the effect of ill-conditioning?

f19: Composite Griewank-Rosenbrock function F8F2

f_{19}(\mathbf{x}) = \frac{10}{D-1} \sum_{i=1}^{D-1} \left( \frac{s_i}{4000} - \cos(s_i) \right) + 10 + f_\mathrm{opt}

  • \mathbf{z}= \max\!\left(1,\frac{\sqrt{D}}{8}\right)\mathbf{R}\mathbf{x}+ 0.5

  • s_i = 100\,(z_i^2 - z_{i+1})^2 + (z_i-1)^2 for i=1,\dots,D-1

  • \mathbf{z^\mathrm{opt}}=\mathbf{1}

Properties:

Resembling the Rosenbrock function in a highly multimodal way.

Multi-modal functions with weak global structure

f20: Schwefel function

f_{20}(\mathbf{x}) = - \frac{1}{100D} \sum_{i = 1}^{D}z_i\sin(\sqrt{|z_i|}) + 4.189828872724339 + 100f_{\mathrm{pen}}(\mathbf{z}/100) + f_\mathrm{opt}

  • \hat{\mathbf{x}} = 2\times\mathbf{1_-^+}\otimes\mathbf{x}

  • \hat{z}_{1} = \hat{x}_{1},\quad \hat{z}_{i+1} = \hat{x}_{i+1} + 0.25 \left({\hat{x}_{i}} - 2|x_i^{\text{opt}}| \right),\quad \text{for } i = 1, \ldots, D - 1

  • \mathbf{z}= 100 (\Lambda^{10} (\hat{\mathbf{z}} - 2\left|\mathbf{x^\mathrm{opt}}\right|) + 2\left|\mathbf{x^\mathrm{opt}}\right|)

  • \mathbf{x^\mathrm{opt}}= 4.2096874633/2 \;\mathbf{1_-^+}, where \mathbf{1}_-^+ is the same realization as above

Properties:

The most prominent 2^D minima are located comparatively close to the corners of the unpenalized search area (Schwefel 1981).

  • the penalization is essential, as otherwise more and better minima occur further away from the search space origin
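
A sketch of f20; as stated above, x_opt carries the random signs \mathbf{1}_-^+, and lambda_alpha and f_pen come from the earlier sketch (illustrative only):

def f20_schwefel(x, x_opt, f_opt):
    # x_opt = (4.2096874633 / 2) * vector of random signs
    x = np.asarray(x, dtype=float)
    x_opt = np.asarray(x_opt, dtype=float)
    D = len(x)
    x_hat = 2.0 * np.sign(x_opt) * x
    z_hat = x_hat.copy()
    z_hat[1:] = x_hat[1:] + 0.25 * (x_hat[:-1] - 2.0 * np.abs(x_opt[:-1]))
    z = 100.0 * (lambda_alpha(10, D) @ (z_hat - 2.0 * np.abs(x_opt)) + 2.0 * np.abs(x_opt))
    result = -np.sum(z * np.sin(np.sqrt(np.abs(z)))) / (100.0 * D)
    return float(result) + 4.189828872724339 + 100.0 * f_pen(z / 100.0) + f_opt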

f21: Gallagher’s Gaussian 101-me peaks function

f_{21}(\mathbf{x}) = T_\mathrm{\hspace*{-0.01em}osz}\left( 10 - \max_{i=1}^{101} w_i \exp\left(-\frac{1}{2D}\, (\mathbf{x}-\mathbf{y}_i)^{\mathrm{T}}\mathbf{R}^{\mathrm{T}} \mathbf{C}_i \mathbf{R}(\mathbf{x}-\mathbf{y}_i) \right) \right)^2 + f_{\mathrm{pen}}(\mathbf{x}) + f_\mathrm{opt}

  • w_i = \begin{cases} 1.1 + 8 \times\dfrac{i-2}{99} & \text{for~} i=2,\dots,101 \\ 10 & \text{for~} i = 1 \end{cases}, three optima have a value larger than 9

  • \mathbf{C}_i=\Lambda^{\!\alpha_i}/\alpha_i^{1/4} where \Lambda^{\!\alpha_i} is defined as usual (see Symbols and definitions), but with randomly permuted diagonal elements. For i=2,\dots,101, \alpha_i is drawn uniformly randomly from the set \left\{1000^{2\frac{j}{99}} ~|~ j = 0,\dots,99\right\} without replacement, and \alpha_i=1000 for i=1.

  • the local optima \mathbf{y}_i are uniformly drawn from the domain [-5,5]^D for i=2,\dots,101 and \mathbf{y}_{1}\in[-4,4]^D. The global optimum is at \mathbf{x^\mathrm{opt}}=\mathbf{y}_1.

Properties:

The function consists of 101 optima with position and height being unrelated and randomly chosen (different for each instantiation of the function), based on (Gallagher and Yuan 2006).

  • the conditioning around the global optimum is about 30

Information gained from this function:

  • Is the search effective without any global structure?
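
A sketch of f21, split into the instance setup (weights, matrices C_i and peak locations y_i) and the evaluation; t_osz, lambda_alpha and f_pen come from the earlier sketch, and rng is a NumPy random generator (illustrative only):

def make_gallagher_peaks(D, rng, n_peaks=101):
    w = np.empty(n_peaks)
    w[0] = 10.0
    w[1:] = 1.1 + 8.0 * np.arange(n_peaks - 1) / (n_peaks - 2)
    # alpha_1 = 1000; the remaining alphas are a random draw without replacement
    # from {1000^(2j/99) : j = 0,...,99}, i.e. a permutation of that set.
    alphas = 1000.0 ** (2.0 * rng.permutation(n_peaks - 1) / (n_peaks - 2))
    alphas = np.concatenate(([1000.0], alphas))
    C = [np.diag(rng.permutation(np.diag(lambda_alpha(a, D)))) / a ** 0.25 for a in alphas]
    y = rng.uniform(-5.0, 5.0, (n_peaks, D))
    y[0] = rng.uniform(-4.0, 4.0, D)        # y_1 holds the global optimum x_opt
    return w, C, y

def f21_gallagher(x, f_opt, R, w, C, y):
    x = np.asarray(x, dtype=float)
    D = len(x)
    values = [w_i * np.exp(-0.5 / D * (x - y_i) @ R.T @ C_i @ R @ (x - y_i))
              for w_i, C_i, y_i in zip(w, C, y)]
    t = t_osz(np.array([10.0 - max(values)]))[0]
    return float(t ** 2) + f_pen(x) + f_opt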

f22: Gallagher’s Gaussian 21-hi peaks function

f_{22}(\mathbf{x}) = T_\mathrm{\hspace*{-0.01em}osz}\left( 10 - \max_{i=1}^{21} w_i \exp\left(-\frac{1}{2D}\, (\mathbf{x}-\mathbf{y}_i)^{\mathrm{T}}\mathbf{R}^{\mathrm{T}} \mathbf{C}_i \mathbf{R}(\mathbf{x}-\mathbf{y}_i) \right) \right)^2 + f_{\mathrm{pen}}(\mathbf{x}) + f_\mathrm{opt}

  • w_i = \begin{cases} 1.1 + 8 \times\dfrac{i-2}{19} & \text{for~} i=2,\dots,21 \\ 10 & \text{for~} i = 1 \end{cases}, two optima have a value larger than 9

  • \mathbf{C}_i=\Lambda^{\!\alpha_i}/\alpha_i^{1/4} where \Lambda^{\!\alpha_i} is defined as usual (see Symbols and definitions), but with randomly permuted diagonal elements. For i=2,\dots,21, \alpha_i is drawn uniformly randomly from the set \left\{1000^{2\frac{j}{19}} ~|~ j = 0,\dots,19\right\} without replacement, and \alpha_i=1000^2 for i=1.

  • the local optima \mathbf{y}_i are uniformly drawn from the domain [-4.9,4.9]^D for i=2,\dots,21 and \mathbf{y}_{1}\in [-3.92,3.92]^D. The global optimum is at \mathbf{x^\mathrm{opt}}=\mathbf{y}_1.

Properties:

The function consists of 21 optima with position and height being unrelated and randomly chosen (different for each instantiation of the function), based on (Gallagher and Yuan 2006).

  • the conditioning around the global optimum is about 1000

Information gained from this function:

  • In comparison to f21: What is the effect of higher conditioning?

f23: Katsuura function

f_{23}(\mathbf{x}) = \frac{10}{D^{2}} \prod_{i=1}^D\left(1 + i\sum_{j=1}^{32} \frac{\left|2^j z_i - [2^j z_i]\right|}{2^j} \right)^{10/D^{1.2}} - \frac{10}{D^{2}} + f_{\mathrm{pen}}(\mathbf{x}) + f_{\mathrm{opt}}

  • \mathbf{z}= \mathbf{Q}\,\Lambda^{\!100}\mathbf{R}(\mathbf{x}-\mathbf{x^\mathrm{opt}})

Properties:

Highly rugged and highly repetitive function with more than 10^D global optima, based on the idea in (Katsuura 1991).

f24: Lunacek bi-Rastrigin function

f_{24}(\mathbf{x}) = \mathrm{min}\left(\sum_{i = 1}^{D}(\hat{x}_i - \mu_0)^2, d\,D+ s\sum_{i = 1}^{D}(\hat{x}_i - \mu_1)^2\right) + 10\left(D- \sum_{i=1}^D\cos(2\pi z_i)\right) + 10^4\,f_{\mathrm{pen}}(\mathbf{x}) + f_{\mathrm{opt}}

  • \hat{\mathbf{x}} = 2\,\mathrm{{sign}}(\mathbf{x^\mathrm{opt}})\otimes\mathbf{x}, \mathbf{x^\mathrm{opt}}= \frac{\mu_0}{2} \mathbf{1_-^+}

  • \mathbf{z}= \mathbf{Q}\Lambda^{\!100}\mathbf{R}(\hat{\mathbf{x}}-\mu_0\,\mathbf{1})

  • \mu_0 = 2.5, \mu_1 = -\sqrt{\dfrac{\mu_0^2-d}{s}}, s = 1 - \dfrac{1}{2\sqrt{D+20}-8.2}, d=1

Properties:

Highly multimodal function based on (Lunacek, Whitley, and Sutton 2008) with two funnels around \frac{\mu_0}{2}\mathbf{1_-^+} and \frac{\mu_1}{2}\mathbf{1_-^+} being superimposed by the cosine. Presumably different approaches need to be used for “selecting the funnel” and for searching the highly multimodal function “within” the funnel. The function was constructed to be deceptive for evolutionary algorithms with large population size.

  • the funnel of the local optimum at \frac{\mu_1}{2}\mathbf{1_-^+} has roughly 70\% of the search space volume within [-5,5]^D.

Information gained from this function:

  • Can the search behavior be local on the global scale but global on a local scale?
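
A sketch of f24, reusing lambda_alpha and f_pen from the earlier sketch; Q, R, x_opt and f_opt are the instance-specific inputs (illustrative only):

def f24_lunacek_bi_rastrigin(x, x_opt, f_opt, Q, R):
    # x_opt = (mu_0 / 2) * vector of random signs
    x = np.asarray(x, dtype=float)
    D = len(x)
    mu0, d = 2.5, 1.0
    s = 1.0 - 1.0 / (2.0 * np.sqrt(D + 20.0) - 8.2)
    mu1 = -np.sqrt((mu0 ** 2 - d) / s)
    x_hat = 2.0 * np.sign(x_opt) * x
    z = Q @ lambda_alpha(100, D) @ R @ (x_hat - mu0)
    sphere0 = np.sum((x_hat - mu0) ** 2)
    sphere1 = d * D + s * np.sum((x_hat - mu1) ** 2)
    rastrigin = 10.0 * (D - np.sum(np.cos(2.0 * np.pi * z)))
    return float(min(sphere0, sphere1) + rastrigin) + 1e4 * f_pen(x) + f_opt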

Function properties

Deceptive Functions

All “deceptive” functions provide, beyond their deceptivity, a “structure” that can be exploited to solve them in a reasonable procedure.

Ill-Conditioning

Ill-conditioning is a typical challenge in real-parameter optimization and, besides multimodality, probably the most common one. Conditioning of a function can be rigorously formalized in the case of convex quadratic functions, f({\mathbf{x}}) =\frac12 {\mathbf{x}}^{T} {\mathbf{H}}{\mathbf{x}} where {\mathbf{H}} is a symmetric positive definite matrix, as the condition number of the Hessian matrix {\mathbf{H}}. Since contour lines associated to a convex quadratic function are ellipsoids, the condition number corresponds to the square of the ratio between the largest axis of the ellipsoid and the shortest axis. For more general functions, conditioning loosely refers to the square of the ratio between the largest and the smallest direction of a contour line. The testbed contains ill-conditioned functions with a typical conditioning of 10^6. We believe this is a realistic requirement, while we have seen practical problems with conditioning as large as 10^{10}.
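
As a concrete example, ignoring the T_\mathrm{\hspace*{-0.01em}osz} irregularities, the Ellipsoidal function f2 is convex quadratic with Hessian \mathbf{H} = \mathrm{diag}\left(2\cdot 10^{6\frac{i-1}{D-1}}\right)_{i=1,\dots,D}, hence its condition number is 10^6, while the ratio between the longest and the shortest axis of its contour ellipsoids is \sqrt{10^6} = 1000.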

Regularity

Functions from simple formulas are often highly regular. We have used a non-linear transformation, T_\mathrm{\hspace*{-0.01em}osz}, in order to introduce small, smooth but clearly visible irregularities. Furthermore, the testbed contains a few highly irregular functions.

Separability

In general, separable functions pose an essentially different search problem to solve, because the search process can be reduced to D one-dimensional search procedures. Consequently, non-separable problems must be considered much more difficult and most benchmark functions are designed to be non-separable. The typical well-established technique to generate non-separable functions from separable ones is the application of a rotation matrix \mathbf{R}.

Symmetry

Stochastic search procedures often rely on Gaussian distributions to generate new solutions and it has been argued that symmetric benchmark functions could be in favor of these operators. To avoid a bias in favor of highly symmetric operators we have used a symmetry breaking transformation, T^{{}}_\mathrm{asy}. We have also included some highly asymmetric functions.

Target function value to reach

The typical target function value for all functions is {f_\mathrm{opt}}+{10^{-8}}. On many functions a value of {f_\mathrm{opt}}+1 is not very difficult to reach, but the difficulty versus function value is not uniform for all functions. These properties are not intrinsic, that is {f_\mathrm{opt}}+{10^{-8}} is not intrinsically “very good”. The value mainly reflects a scalar multiplier in the function definition.

References

Finck, Steffen, Nikolaus Hansen, Raymond Ros, and Anne Auger. 2009. “Real-Parameter Black-Box Optimization Benchmarking 2009: Noiseless Functions Definitions.” RR-6829. INRIA. https://inria.hal.science/inria-00362633v2/document.
Gallagher, Marcus, and Bo Yuan. 2006. “A General-Purpose Tunable Landscape Generator.” IEEE Transactions on Evolutionary Computation 10 (5): 590–603. https://doi.org/10.1109/TEVC.2005.863628.
Hansen, Nikolaus, Anne Auger, Dimo Brockhoff, and Tea Tušar. 2022. “Anytime Performance Assessment in Blackbox Optimization Benchmarking.” IEEE Transactions on Evolutionary Computation 26 (6): 1293–1305. https://doi.org/10.1109/TEVC.2022.3210897.
Hansen, Nikolaus, Anne Auger, Steffen Finck, and Raymond Ros. 2009. “Real-Parameter Black-Box Optimization Benchmarking 2009: Experimental Setup.” RR-6828. INRIA. https://inria.hal.science/inria-00362649v2/document.
Hansen, Nikolaus, Tea Tušar, Olaf Mersmann, Anne Auger, and Dimo Brockhoff. 2016. “COCO: The Experimental Procedure.” arXiv Preprint arXiv:1603.08776. https://arxiv.org/abs/1603.08776.
Katsuura, Hidefumi. 1991. “Continuous Nowhere-Differentiable Functions – an Application of Contraction Mappings.” The American Mathematical Monthly 98 (5): 411–16. https://doi.org/10.1080/00029890.1991.12000778.
Lunacek, Monte, Darrell Whitley, and Andrew Sutton. 2008. “The Impact of Global Structure on Search.” In Proceedings of the International Conference on Parallel Problem Solving from Nature (PPSN X), 5199:498–507. Lecture Notes in Computer Science. Springer. https://doi.org/10.1007/978-3-540-87700-4_50.
Rosenbrock, H. H. 1960. “An Automatic Method for Finding the Greatest or Least Value of a Function.” The Computer Journal 3 (3): 175–84. https://doi.org/10.1093/comjnl/3.3.175.
Schwefel, Hans-Paul. 1981. Numerical Optimization of Computer Models. John Wiley & Sons, Inc.

Footnotes

  1. For our experimental setup see (Hansen et al. 2016, 2009) and for our performance assessment methodology see (Hansen et al. 2022).

  2. The implementation provides an instance ID as input, such that a set of uniquely specified instances can be explicitly chosen.