.. title:: COCO: The Experimental Procedure $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ COCO: The Experimental Procedure $$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$$ .. the next two lines are necessary in LaTeX. They will be automatically replaced to put away the \chapter level as ^^^ and let the "current" level become \section. .. CHAPTERTITLE .. CHAPTERUNDERLINE .. | .. | .. .. sectnum:: :depth: 3 .. .. contents:: Table of Contents .. | .. | .. raw:: html See also: ArXiv e-prints, arXiv:1603.08776, 2016. .. raw:: latex % \tableofcontents is automatic with sphinx and moved behind abstract by swap...py \begin{abstract} We present a budget-free experimental setup and procedure for benchmarking numerical optimization algorithms in a black-box scenario. This procedure can be applied with the COCO_ benchmarking platform. We describe initialization of and input to the algorithm and touch upon the relevance of termination and restarts. .. We finally reconsider parameter tuning and the concept of recommendations for benchmarking with COCO_. .. raw:: latex \end{abstract} \newpage .. _2009: http://www.sigevo.org/gecco-2009/workshops.html#bbob .. _2010: http://www.sigevo.org/gecco-2010/workshops.html#bbob .. _2012: http://www.sigevo.org/gecco-2012/workshops.html#bbob .. _BBOB-2009: https://numbbo.github.io/ppdata-archive/bbob/2009/ .. _BBOB-2010: https://numbbo.github.io/ppdata-archive/bbob/2010/ .. _BBOB-2012: https://numbbo.github.io/ppdata-archive/bbob/2012/ .. _GECCO-2012: http://www.sigevo.org/gecco-2012/ .. _COCO: https://github.com/numbbo/coco .. _COCOold: https://web.archive.org/web/20200812230823/https://coco.gforge.inria.fr/ .. |f| replace:: :math:`f` .. |coco_problem_get_dimension| replace:: ``coco_problem_get_dimension`` .. _coco_problem_get_dimension: http://numbbo.github.io/coco-doc/C/coco_8h.html#a0dabf3e4f5630d08077530a1341f13ab .. |coco_problem_get_largest_values_of_interest| replace:: ``coco_problem_get_largest_values_of_interest`` .. _coco_problem_get_largest_values_of_interest: http://numbbo.github.io/coco-doc/C/coco_8h.html#a29c89e039494ae8b4f8e520cba1eb154 .. |coco_problem_get_smallest_values_of_interest| replace:: ``coco_problem_get_smallest_values_of_interest`` .. _coco_problem_get_smallest_values_of_interest: http://numbbo.github.io/coco-doc/C/coco_8h.html#a4ea6c067adfa866b0179329fe9b7c458 .. |coco_problem_get_initial_solution| replace:: ``coco_problem_get_initial_solution`` .. _coco_problem_get_initial_solution: http://numbbo.github.io/coco-doc/C/coco_8h.html#ac5a44845acfadd7c5cccb9900a566b32 .. |coco_problem_final_target_hit| replace:: ``coco_problem_final_target_hit`` .. _coco_problem_final_target_hit: http://numbbo.github.io/coco-doc/C/coco_8h.html#a1164d85fd641ca48046b943344ae9069 .. |coco_problem_get_number_of_objectives| replace:: ``coco_problem_get_number_of_objectives`` .. _coco_problem_get_number_of_objectives: http://numbbo.github.io/coco-doc/C/coco_8h.html#ab0d1fcc7f592c283f1e67cde2afeb60a .. |coco_problem_get_number_of_constraints| replace:: ``coco_problem_get_number_of_constraints`` .. _coco_problem_get_number_of_constraints: http://numbbo.github.io/coco-doc/C/coco_8h.html#ad5c7b0889170a105671a14c8383fbb22 .. |coco_evaluate_function| replace:: ``coco_evaluate_function`` .. _coco_evaluate_function: http://numbbo.github.io/coco-doc/C/coco_8h.html#aabbc02b57084ab069c37e1c27426b95c .. |coco_evaluate_constraint| replace:: ``coco_evaluate_constraint`` .. _coco_evaluate_constraint: http://numbbo.github.io/coco-doc/C/coco_8h.html#ab5cce904e394349ec1be1bcdc35967fa .. 
|coco_problem_t| replace:: ``coco_problem_t`` .. _coco_problem_t: http://numbbo.github.io/coco-doc/C/coco_8h.html#a408ba01b98c78bf5be3df36562d99478 .. |coco_recommend_solution| replace:: ``coco_recommend_solution`` .. _coco_recommend_solution: http://numbbo.github.io/coco-doc/C/coco_8h.html#afd76a19eddd49fb78c22563390437df2 .. |coco_problem_get_evaluations(const coco_problem_t * problem)| replace:: ``coco_problem_get_evaluations(const coco_problem_t * problem)`` .. _coco_problem_get_evaluations(const coco_problem_t * problem): http://numbbo.github.io/coco-doc/C/coco_8h.html#a6ad88cdba2ffd15847346d594974067f .. ################################################################################# .. ################################################################################# .. ################################################################################# Introduction ============ Based on [HAN2009]_ and [HAN2010]_, we describe a comparatively simple experimental setup for *black-box optimization benchmarking*. We recommend using this procedure within the COCO_ platform [HAN2016co]_. [#]_ Our **central measure of performance**, to which the experimental procedure is adapted, is the number of calls to the objective function to reach a certain solution quality (function value or :math:`f`-value or indicator value), also denoted as runtime. Terminology ----------- *function* We talk about an objective *function* |f| as a parametrized mapping :math:`\mathbb{R}^n\to\mathbb{R}^m` with scalable input space, that is, :math:`n` is not (yet) determined, and usually :math:`m\in\{1,2\}`. Functions are parametrized such that different *instances* of the "same" function are available, e.g. translated or shifted versions. *problem* We talk about a *problem*, |coco_problem_t|_, as a specific *function instance* on which the optimization algorithm is run. Specifically, a problem can be described as the triple ``(dimension, function, instance)``. A problem can be evaluated and returns an :math:`f`-value or -vector. In the context of performance assessment, a target :math:`f`- or indicator-value is attached to each problem. That is, a target value is added to the above triple to define a single problem in this case. *runtime* We define *runtime*, or *run-length* [HOO1998]_, as the *number of evaluations* conducted on a given problem, also referred to as the number of *function* evaluations. Our central performance measure is the runtime until a given target value is hit [HAN2016perf]_. *suite* A test- or benchmark-suite is a collection of problems, typically between twenty and a hundred, where the number of objectives :math:`m` is fixed. .. compare also the `COCO read me`_. .. _`COCO read me`: https://github.com/numbbo/coco/blob/master/README.md .. [#] The COCO_ platform provides several (single and bi-objective) *test suites* with a collection of black-box optimization problems of different dimensions to be minimized. COCO_ automatically collects the relevant data to display the performance results after post-processing is applied. Conducting the Experiment ========================= The optimization algorithm to be benchmarked is run once on each problem of the given test suite. On each problem, the very same algorithm with the same parameter setting, the same initialization procedure, the same budget, the same termination and/or restart criteria, etc., is used. There is no prescribed minimal or maximal allowed budget; the benchmarking setup is *budget-free*. The longer the experiment, the more data are available to assess the performance accurately. See also Section :ref:`sec:budget`.
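The following sketch illustrates this procedure with the C interface of COCO_. It assumes the suite and observer constructors used in the COCO_ example experiment (``coco_suite``, ``coco_observer``, ``coco_suite_get_next_problem``), called here with default (empty) options; ``my_solver`` is a hypothetical placeholder for the benchmarked algorithm.

.. code-block:: c

   /* Minimal experiment loop (sketch): the same solver, with the same
    * settings, is run exactly once on every problem of the suite.
    * Assumes the suite/observer API of the COCO example experiment;
    * my_solver is a placeholder for the benchmarked algorithm. */
   #include "coco.h"

   void my_solver(coco_problem_t *problem);  /* hypothetical, see next section */

   int main(void) {
     coco_suite_t *suite = coco_suite("bbob", "", "");       /* suite selection */
     coco_observer_t *observer = coco_observer("bbob", "");  /* data logging */
     coco_problem_t *problem;

     /* iterate over all (dimension, function, instance) triples of the suite */
     while ((problem = coco_suite_get_next_problem(suite, observer)) != NULL) {
       my_solver(problem);  /* identical algorithm and parameter setting each time */
     }

     coco_observer_free(observer);
     coco_suite_free(suite);
     return 0;
   }

Everything specific to the benchmark (problem selection, data collection) is handled by the suite and the observer; the solver itself only ever sees one problem at a time.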
.. _sec:input: Initialization and Input to the Algorithm ----------------------------------------- An algorithm can use the following input information from each problem. At any time: *Input and output dimensions* as a defining interface to the problem, specifically: - The search space (input) dimension via |coco_problem_get_dimension|_, - The number of objectives via |coco_problem_get_number_of_objectives|_, which is the "output" dimension of |coco_evaluate_function|_. All functions of a single benchmark suite have the same number of objectives, currently either one or two. - The number of constraints via |coco_problem_get_number_of_constraints|_, which is the "output" dimension of |coco_evaluate_constraint|_. *All* problems of a single benchmark suite have either no constraints, or one or more constraints. *Search domain of interest* defined from |coco_problem_get_largest_values_of_interest|_ and |coco_problem_get_smallest_values_of_interest|_. The optimum (or each extremal solution of the Pareto set) lies within the search domain of interest. If the optimizer operates on a bounded domain only, the domain of interest can be interpreted as lower and upper bounds. *Feasible (initial) solution* provided by |coco_problem_get_initial_solution|_. The initial state of the optimization algorithm and its parameters shall only be based on these input values. The initial algorithm setting is considered as part of the algorithm and must therefore follow the same procedure for all problems of the suite. The problem identifier or the positioning of the problem in the suite or any (other) known characteristics of the problem are not allowed as input to the algorithm, see also Section :ref:`sec:tuning`. During an optimization run, the following (new) information is available to the algorithm: #. The result, i.e., the :math:`f`-value(s) from evaluating the problem at a given search point via |coco_evaluate_function|_. #. The result from evaluating the constraints of the problem at a given search point via |coco_evaluate_constraint|_. #. The result of |coco_problem_final_target_hit|_, which can be used to terminate a run conclusively without changing the performance assessment in any way. Currently, if the number of objectives :math:`m > 1`, this function always returns zero. The number of evaluations of the problem and/or constraints is the search cost, also referred to as *runtime*, and is used for the performance assessment of the algorithm. [#]_ .. .. [#] Note, however, that the Pareto set in the bi-objective case is not always guaranteed to lie in its entirety within the region of interest. .. [#] |coco_problem_get_evaluations(const coco_problem_t * problem)|_ is a convenience function that returns the number of evaluations done on ``problem``. Because this information is available to the optimization algorithm anyway, this convenience function may be used as well.
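As an illustration, the following sketch shows a solver that restricts itself to the inputs listed above: dimension, number of objectives, domain of interest, and the provided initial solution, together with the values returned by the evaluation function. The uniform random sampling, the budget, and the use of ``rand()`` are assumptions made only to keep the sketch self-contained; they are not part of the prescribed procedure, and the exact signature of ``coco_problem_get_initial_solution`` should be checked against the API documentation.

.. code-block:: c

   /* Sketch of an algorithm that uses only the permitted inputs of a problem.
    * The search strategy (uniform random sampling in the domain of interest,
    * started from the provided initial solution) is purely illustrative. */
   #include <stdlib.h>
   #include "coco.h"

   void my_solver(coco_problem_t *problem) {
     size_t n = coco_problem_get_dimension(problem);
     size_t m = coco_problem_get_number_of_objectives(problem);
     const double *lower = coco_problem_get_smallest_values_of_interest(problem);
     const double *upper = coco_problem_get_largest_values_of_interest(problem);
     double *x = malloc(n * sizeof(double));
     double *y = malloc(m * sizeof(double));
     size_t budget = 10000 * n;  /* any budget is eligible; the setup is budget-free */

     coco_problem_get_initial_solution(problem, x);  /* feasible initial solution */
     coco_evaluate_function(problem, x, y);          /* counts as one evaluation */

     for (size_t i = 1; i < budget; i++) {
       /* terminate conclusively once the final target is hit */
       if (coco_problem_final_target_hit(problem))
         break;
       for (size_t j = 0; j < n; j++)  /* sample uniformly in the domain of interest */
         x[j] = lower[j] + (upper[j] - lower[j]) * rand() / (double) RAND_MAX;
       coco_evaluate_function(problem, x, y);
     }
     free(x);
     free(y);
   }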
.. _sec:budget: Budget, Termination Criteria, and Restarts ------------------------------------------ Algorithms and/or setups with any budget of function evaluations are eligible; the benchmarking setup is *budget-free*. We consider termination criteria to be part of the benchmarked algorithm, and their choice is a relevant part of the algorithm design. On the one hand, allowing a larger number of function evaluations increases the chance to find solutions with better quality. On the other hand, a timely termination of stagnating runs can improve the performance, as the remaining evaluations can be used more effectively. To exploit a large(r) number of function evaluations effectively, we encourage the use of **independent restarts** [#]_, in particular for algorithms which terminate naturally within a comparatively small budget. Independent restarts are a natural way to approach difficult optimization problems and do not change the central performance measure used in COCO_ (hence the setup remains budget-free); they do, however, improve the reliability, comparability [#]_, precision, and "visibility" of the measured results. Moreover, any **multistart procedure** (which relies on an interim termination of the algorithm) is encouraged. Multistarts need not be independent: they can feature a parameter sweep (e.g., increasing population size [HAR1999]_ [AUG2005]_), can be based on the outcome of the previous starts, and/or can feature a systematic change of the initial conditions for the algorithm. After a multistart procedure has been established, a recommended procedure is to use a budget proportional to the dimension, :math:`k\times n`, and to run repeated experiments with increasing :math:`k`, e.g. :math:`3, 10, 30, 100, 300,\dots`, which is a good compromise between the availability of the latest results and the computational overhead. An algorithm can be conclusively terminated if |coco_problem_final_target_hit|_ returns 1. [#]_ This saves CPU cycles without affecting the performance assessment, because there is no target left to hit. .. [#] The COCO_ platform provides example code implementing independent restarts. .. [#] Algorithms are only comparable up to the smallest budget given to any of them. .. [#] For the ``bbob-biobj`` suite this is however currently never the case. .. |j| replace:: :math:`j` .. For example, using a fast algorithm with a small success probability, say 5% (or 1%), chances are that not a single one of 15 trials is successful. With 10 (or 90) independent restarts, the success probability will increase to 40% and the performance will become visible. At least four to five (here out of 15) successful trials are desirable to accomplish a stable performance measurement. This reasoning remains valid for any target function value (different values are considered in the evaluation). .. Restarts either from a previous solution, or with a different parameter setup, for example with different (increasing) population sizes, might be considered, as it has been applied quite successfully [Auger:2005a]_ [Harik:1999]_. .. Choosing different setups mimics what might be done in practice. All restart mechanisms are finally considered as part of the algorithm under consideration. .. The easiest functions of BBOB can be solved in less than :math:`10 D` function evaluations, while on the most difficult functions a budget of more than :math:`1000 D^2` function evaluations to reach the final :math:`f_\mathrm{target} = f_\mathrm{opt} + 10^{-8}` is expected.
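A minimal sketch of such an independent restart wrapper, under the above budget convention, is given below. The inner solver ``my_solver_with_remaining_budget`` and the multiplier ``k`` are hypothetical; only ``coco_problem_get_evaluations`` and ``coco_problem_final_target_hit`` are part of the COCO_ interface.

.. code-block:: c

   /* Sketch of independent restarts within an overall budget of k * n
    * evaluations. The whole experiment can then be repeated with
    * increasing k, e.g. k = 3, 10, 30, 100, 300, ... */
   #include "coco.h"

   /* hypothetical inner solver that respects a remaining budget */
   void my_solver_with_remaining_budget(coco_problem_t *problem, size_t remaining_budget);

   void my_solver_restarted(coco_problem_t *problem, size_t k) {
     size_t n = coco_problem_get_dimension(problem);
     size_t budget = k * n;

     while (coco_problem_get_evaluations(problem) < budget) {
       if (coco_problem_final_target_hit(problem))
         break;  /* conclusive termination, no target left to hit */
       /* each restart is independent: nothing is carried over apart from
          the number of evaluations already spent on this problem */
       my_solver_with_remaining_budget(problem,
           budget - coco_problem_get_evaluations(problem));
     }
   }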
.. _sec:tuning: Parameter Setting and Tuning of Algorithms ========================================== .. The algorithm and the used parameter setting for the algorithm should be described thoroughly. Any tuning of algorithm parameters to the test suite should be described and *the approximate overall number of tested parameter settings or algorithm variants and the approximate overall invested budget should be given*. The only recommended tuning procedure is the verification that **termination conditions** of the algorithm are suited to the given testbed and, if necessary, the tuning of termination parameters. [#]_ Too early or too late termination can be identified and adjusted comparatively easily. This is also a useful prerequisite for allowing restarts to become more effective. On all functions the very same parameter setting must be used (which might well depend on the dimensionality, see Section :ref:`sec:input`). That is, the *a priori* use of function-dependent parameter settings is prohibited (since 2012). The function ID or any function characteristics (like separability, multi-modality, ...) cannot be considered as an input parameter to the algorithm. On the other hand, benchmarking different parameter settings as "different algorithms" on the entire test suite is encouraged. .. [#] For example, in the single-objective case, care should be taken to apply termination conditions that make it possible to hit the final target on the most basic functions, like the sphere function :math:`f_1`, that is, on the problems 0, 360, 720, 1080, 1440, and 1800 of the ``bbob`` suite. In our experience, numerical optimization software frequently terminates too early by default, while evolutionary computation software often terminates too late by default. .. In order to combine different parameter settings within a single algorithm, one can use multiple runs with different parameters (for example restarts, see also Section :ref:`sec:budget`), or probing techniques to identify problem-wise the appropriate parameters online. The underlying assumption in this experimental setup is that also in practice we do not know in advance whether the algorithm will face :math:`f_1` or :math:`f_2`, a unimodal or a multimodal function... therefore we cannot adjust algorithm parameters *a priori* [#]_. .. In contrast to most other function properties, the property of having noise can usually be verified easily. Therefore, for noisy functions a *second* testbed has been defined. The two testbeds can be approached *a priori* with different parameter settings or different algorithms. .. .. # [Auger:2005a] A Auger and N Hansen. A restart CMA evolution strategy with increasing population size. In *Proceedings of the IEEE Congress on Evolutionary Computation (CEC 2005)*, pages 1769–1776. IEEE Press, 2005. .. .. _sec:recommendations: Recommendations =============== The performance assessment is based on a set of evaluation counts associated with the :math:`f`-value or -vector of a solution. By default, each evaluation count is associated with the respectively *evaluated* solution and hence its :math:`f`-value. In the single-objective case, the solution associated *with the current (last) evaluation* can be changed by calling |coco_recommend_solution|_, thereby associating the :math:`f`-value of the *recommended* solution (instead of the *evaluated* solution) with the current evaluation count. A recommendation is best viewed as the *currently best known approximation* of the optimum [#]_ delivered by the optimization algorithm, or as the currently most desirable return value of the algorithm. Recommendations allow the algorithm to explore solutions without affecting the performance assessment. For example, a surrogate-based algorithm can explore (i.e. evaluate) an arbitrarily bad solution, update the surrogate model and then recommend the (new) model optimizer.
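The following sketch outlines how such a surrogate-based step could use |coco_recommend_solution|_ in the single-objective case. The routines ``update_surrogate`` and ``surrogate_optimum`` are hypothetical placeholders for the algorithm's own model machinery; only the two ``coco_*`` calls belong to the COCO_ interface.

.. code-block:: c

   /* Sketch of using recommendations (single-objective case): the algorithm
    * may evaluate arbitrarily bad, purely exploratory points, as long as it
    * afterwards recommends its current best approximation of the optimum. */
   #include "coco.h"

   void update_surrogate(const double *x, double f);  /* hypothetical model update */
   void surrogate_optimum(double *x);                 /* hypothetical: writes the model optimizer */

   void surrogate_step(coco_problem_t *problem, double *x_explore, double *x_best) {
     double y[1];  /* single objective assumed */

     coco_evaluate_function(problem, x_explore, y);  /* exploratory evaluation */
     update_surrogate(x_explore, y[0]);
     surrogate_optimum(x_best);

     /* associate the recommended solution, not the evaluated one, with the
        current evaluation count used in the performance assessment */
     coco_recommend_solution(problem, x_best);
   }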
On non-noisy suites it is neither necessary nor advantageous to recommend the same solution repeatedly. .. On non-noisy suites the last evaluation changes the assessment only if the :math:`f`-value is better than all :math:`f`-values from previous evaluations. .. [#] In the noisy scenario, a small number of the most current solutions will be taken into account [HAN2016perf]_. In the multi-objective scenario, the recommendation option is not available, because an archive of non-dominated solutions presumes that all solutions are evaluated. Time Complexity Experiment ========================== In order to get a rough measurement of the time complexity of the algorithm, the wall-clock or CPU time should be measured when running the algorithm on the benchmark suite. The chosen setup should reflect a "realistic average scenario". [#]_ The *time divided by the number of function evaluations shall be presented separately for each dimension*. The chosen setup, coding language, compiler and computational architecture for conducting these experiments should be given. .. The :file:`exampletiming.*` code template is provided to run this experiment. For CPU-inexpensive algorithms the timing might mainly reflect the time spent in function :math:`fgeneric`. .. [#] The example experiment code provides the timing output measured over all problems of a single dimension by default. It can also be used to make a record of the same timing experiment with "pure random search", which can serve as additional baseline data. Previously, on the ``bbob`` test suite, only the first instance of the Rosenbrock function :math:`f_8` had been used for this experiment, that is, the suite indices 105, 465, 825, 1185, 1545, and 1905.
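One possible way to record such a timing experiment is sketched below, reusing the suite loop from the earlier sketch. The per-dimension bookkeeping, the maximal dimension of 40, and the printed format are illustrative assumptions, not part of the prescribed procedure.

.. code-block:: c

   /* Sketch of a timing measurement: CPU time per function evaluation,
    * recorded and reported separately for each dimension. */
   #include <stdio.h>
   #include <time.h>
   #include "coco.h"

   void my_solver(coco_problem_t *problem);  /* hypothetical benchmarked algorithm */

   void timing_experiment(void) {
     coco_suite_t *suite = coco_suite("bbob", "", "");
     coco_observer_t *observer = coco_observer("bbob", "");
     coco_problem_t *problem;
     double cpu_seconds[41] = {0.0};  /* indexed by dimension, up to 40 (assumption) */
     size_t evaluations[41] = {0};

     while ((problem = coco_suite_get_next_problem(suite, observer)) != NULL) {
       size_t n = coco_problem_get_dimension(problem);
       clock_t start = clock();
       my_solver(problem);
       cpu_seconds[n] += (double) (clock() - start) / CLOCKS_PER_SEC;
       evaluations[n] += coco_problem_get_evaluations(problem);
     }
     for (size_t n = 0; n <= 40; n++)
       if (evaluations[n] > 0)
         printf("dimension %zu: %.2e seconds per evaluation\n",
                n, cpu_seconds[n] / (double) evaluations[n]);

     coco_observer_free(observer);
     coco_suite_free(suite);
   }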
.. raw:: html
Acknowledgments

.. raw:: latex \section*{Acknowledgments} The authors would like to thank Raymond Ros, Steffen Finck, Marc Schoenauer, Petr Posik and Dejan Tušar for their many invaluable contributions to this work. This work was supported by the grant ANR-12-MONU-0009 (NumBBO) of the French National Research Agency. .. ############################# References ################################### .. raw:: html

References

.. [AUG2005] A. Auger and N. Hansen. A restart CMA evolution strategy with increasing population size. In *Proceedings of the IEEE Congress on Evolutionary Computation (CEC 2005)*, pages 1769--1776. IEEE Press, 2005. .. .. [Auger:2005b] A. Auger and N. Hansen. Performance evaluation of an advanced local search evolutionary algorithm. In *Proceedings of the IEEE Congress on Evolutionary Computation (CEC 2005)*, pages 1777-1784, 2005. .. .. [Auger:2009] A. Auger and R. Ros. Benchmarking the pure random search on the BBOB-2009 testbed. In F. Rothlauf, editor, *GECCO (Companion)*, pages 2479-2484. ACM, 2009. .. .. [Efron:1993] B. Efron and R. Tibshirani. *An introduction to the bootstrap.* Chapman & Hall/CRC, 1993. .. [HAN2016perf] N. Hansen, A. Auger, D. Brockhoff, D. Tušar, T. Tušar. `COCO: Performance Assessment`__. *ArXiv e-prints*, `arXiv:1605.03560`__, 2016. __ http://numbbo.github.io/coco-doc/perf-assessment __ http://arxiv.org/abs/1605.03560 .. [HAN2009] N. Hansen, A. Auger, S. Finck, and R. Ros. Real-Parameter Black-Box Optimization Benchmarking 2009: Experimental Setup, *Inria Research Report* RR-6828 http://hal.inria.fr/inria-00362649/en, 2009. .. [HAN2010] N. Hansen, A. Auger, S. Finck, and R. Ros. Real-Parameter Black-Box Optimization Benchmarking 2010: Experimental Setup, *Inria Research Report* RR-7215 http://hal.inria.fr/inria-00362649/en, 2010. .. [HAN2016co] N. Hansen, A. Auger, O. Mersmann, T. Tušar, D. Brockhoff. `COCO: A Platform for Comparing Continuous Optimizers in a Black-Box Setting`__, *ArXiv e-prints*, `arXiv:1603.08785`__, 2016. .. __: http://numbbo.github.io/coco-doc/ .. __: http://arxiv.org/abs/1603.08785 .. [HAR1999] G.R. Harik and F.G. Lobo. A parameter-less genetic algorithm. In *Proceedings of the Genetic and Evolutionary Computation Conference (GECCO)*, volume 1, pages 258-265. ACM, 1999. .. [HOO1998] H.H. Hoos and T. Stützle. Evaluating Las Vegas algorithms: pitfalls and remedies. In *Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence (UAI-98)*, pages 238-245, 1998. .. .. [PRI1997] K. Price. Differential evolution vs. the functions of the second ICEO. In Proceedings of the IEEE International Congress on Evolutionary Computation, pages 153--157, 1997. .. ############################## END Document #######################################