Online and offline archiving of COCO data.

create and get are the main functions for creating and retrieving online and local offline archives. Local archives can be listed via ArchivesLocal (experimental/beta); online archives that have already been used are listed in ArchivesKnown.
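
For example, to see which archives are already known on this machine (the output depends on the local setup):

>>> import cocopp
>>> cocopp.archiving.ArchivesLocal()  # doctest:+SKIP  # lists locally created archives
>>> cocopp.archiving.ArchivesKnown()  # doctest:+SKIP  # lists remote archives used before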

An online archive class defines, and is defined by, a source URL containing an archive definition file and the archived data.

get('all') returns all "officially" archived data as given in a folder hierarchy (this may be abandoned in future). Derived classes "point" to subfolders in the folder tree and "contain" all archived data from a single test suite. For example, get('bbob') returns the archived data list for the bbob testbed.
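
For example (a small sketch, assuming the returned archive provides the find method for substring matching; 'bfgs' is a hypothetical search string):

>>> bbob = cocopp.archiving.get('bbob')  # doctest:+SKIP
>>> bbob.find('bfgs')  # doctest:+SKIP  # names of matching data sets, no data are downloaded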

How to Create an Online Archive

First, we prepare the datasets. A dataset is a (tar-)zipped file containing a full experiment from a single algorithm. The first ten-or-so characters of the filename should be readable and informative. Datasets can reside in an arbitrary subfolder structure, but the folders should contain no further (ambiguous) files, so that an archive can be created from the archive root folder.
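
A hypothetical folder layout that satisfies these constraints:

my-archive-root-folder/
    ALG-one_on_bbob.tgz
    more-data/
        ALG-two_on_bbob.tgz
        ALG-three_on_bbob.tgz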

Second, we create the new archive with create,

>>> import cocopp
>>> from cocopp import archiving
>>> local_path = './my-archive-root-folder'
>>> archiving.create(local_path)  # doctest:+SKIP

thereby creating an archive definition file in the given folder. The created archive can be re-instantiated with cocopp.archiving.get and all data can be processed with cocopp.main, like

>>> my_archive = archiving.get(local_path)  # doctest:+SKIP
>>> cocopp.main(my_archive.get_all(''))  # doctest:+SKIP

We may want to check the archive size beforehand, like

>>> len(my_archive)  # doctest:+SKIP

as an archive may contain hundreds of data sets. In that case, we can choose a subset to process (see the help of main and/or of the archive instance).
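
For example, assuming the archive contains data sets whose names match the hypothetical substring 'ALG-one' and that get_all accepts such a filtering substring (as it accepts '' above):

>>> cocopp.main(my_archive.get_all('ALG-one'))  # doctest:+SKIP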

Third, we put a mirror of the archive online, for example (server name and target path are schematic):

rsync -zauv my-archives/unique-name/ my-server:/var/www/my-coco-online-archives/a-name
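
To try out the mirror before publishing it, we may serve the archive root folder locally, for example with python -m http.server, and point get to it (a sketch; the port is arbitrary):

>>> test_archive = archiving.get('http://localhost:8000')  # doctest:+SKIP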

Now, everyone can use the archive on the fly like

>>> remote_def = 'http://my-coco-online-archives/a-name'
>>> remote_archive = cocopp.archiving.get(remote_def)  # doctest:+SKIP

just as a local archive. Archive data are downloaded only on demand. All data can be made available offline (which may take a long time) with:

>>> remote_archive.get_all('')  # doctest:+SKIP

Remote archives that have been used once can be listed via ArchivesKnown (experimental/beta).

Details: a definition file contains a list of all contained datasets by path/filename, their sha256 hash, and optionally their approximate size. Datasets are (tar-)zipped files containing a full experiment from a single algorithm.
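
The triplets can be inspected with read_definition_file; the shown entry is made up:

>>> from cocopp.archiving import read_definition_file
>>> read_definition_file(local_path)[0]  # doctest:+SKIP
('more-data/ALG-two_on_bbob.tgz', '9a3bf2e5d01c...', 1473424)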

Function get: return a data archive COCODataArchive.
Function read_definition_file: return definition triple list.
Function create: create a definition file for an existing local "archive" of data.
Class COCODataArchive: data archive based on an archive definition file.
Class COCOBBOBDataArchive: list of archived data for the 'bbob' test suite.
Class COCOBBOBNoisyDataArchive: this class "contains" archived data for the 'bbob-noisy' suite.
Class COCOBBOBBiobjDataArchive: this class "contains" archived data for the 'bbob-biobj' suite.
Class ListOfArchives: list of URLs or path names to COCO data archives available to this user.
Class OfficialArchives: overdesigned class to connect URLs, names, and classes of "official" archives.
Class ArchivesLocal: COCO data archives somewhere local on this machine.
Class ArchivesKnown: known (and already used) remote COCO data archives.
Class RemoteListOfArchives: elements of this list can be used directly in cocopp.archiving.get.
Function _abs_path: return an (OS-dependent) user-expanded path.
Function _makedirs: undocumented.
Function _make_backup: back up the file with an added time stamp if it exists, otherwise do nothing.
Function _url_to_folder_name: return a path within the default archive location.
Function _is_url: undocumented.
Function _definition_file_to_read: return absolute path for a sound definition file name.
Function _definition_file_to_write: return absolute path to a possibly non-existing definition file name.
Function _hash: compute hash of file file_name.
Function _str_to_list: try to return a non-string iterable in either case.
Function _move_official_local_data: move "official" archives folder to the generic standardized location once and for all.
Function _repr_definitions: undocumented.
Function _url_add: add ('_url_', url) to the definition file in folder.
Function _download_definitions: download definition file and sync url into it.
Function _get_remote: return remote data archive as a COCODataArchive instance.
Class _ArchivesOfficial: superseded by OfficialArchives.
def _abs_path(path, *args):

return an (OS-dependent) user-expanded path.

os.path.abspath takes care of using the right os.path.sep.

def _makedirs(path, error_ok=True):
Undocumented (presumably creates the folder path, tolerating errors such as an already existing folder when error_ok is True).
def _make_backup(fullname):
back up the file with an added time stamp if it exists; otherwise do nothing.
def _url_to_folder_name(url):
return a path within the default archive location
def _is_url(s):
Undocumented (presumably returns whether s looks like a URL).
def _definition_file_to_read(local_path_or_definition_file):

return absolute path for a sound definition file name.

The file or path may or may not exist.

def _definition_file_to_write(local_path_or_filename, filename=None):

return absolute path to a possibly non-existing definition file name.

Creates a backup if the file exists. Does not create the file or folders when they do not exist.

Details: if filename is None, tries to guess whether the first argument already includes the filename. If it seems necessary, default_definition_filename is appended.

def _hash(file_name, hash_function=hashlib.sha256):
compute hash of file file_name
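
A minimal sketch of such a helper, reading the file in chunks to bound memory use (not necessarily the actual implementation):

import hashlib

def file_hash_sketch(file_name, hash_function=hashlib.sha256):
    # hypothetical re-implementation: hash the file content in 1 MiB chunks
    h = hash_function()
    with open(file_name, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 20), b''):
            h.update(chunk)
    return h.hexdigest()
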
def _str_to_list(str_or_list):
try to return a non-string iterable in either case
def _move_official_local_data():
move "official" archives folder to the generic standardized location once and for all
def _repr_definitions(list_):
Undocumented (presumably returns a string representation of a definitions list).
def _url_add(folder, url):

add ('_url_', url) to the definition file in folder.

This function is idempotent; however, different URLs may coexist in the list.

def _download_definitions(url, target_folder):
download definition file and sync url into it
def _get_remote(url, target_folder=None, redownload=False):

return remote data archive as a COCODataArchive instance.

If necessary, the archive is "created" by downloading the definition file from url to target_folder which doesn't need to exist.

Details: The target folder name is by default derived from the url and created within default_archive_location == ~/.cocopp/data-archives.
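
For example (a sketch using the private helper; the exact folder name is an implementation detail):

>>> from cocopp.archiving import _url_to_folder_name
>>> _url_to_folder_name('http://my-coco-online-archives/a-name')  # doctest:+SKIP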

def get(url_or_folder=None):

return a data archive COCODataArchive.

url_or_folder must be a URL or a folder, either of which must contain an archive definition file named coco_archive_definition.txt. Use create to create this file if necessary.

When a URL is given, the archive may already exist locally from previous calls of get. In that case, get(url).update() updates the definition file and returns the updated archive. Only the definition file is updated; no data are downloaded before they are requested. The updated class instance re-downloads requested data when the saved hash disagrees with the computed hash. If COCODataArchive.update is not called on a new instance of the archive, an error message may be shown when the instance tries to use outdated local data; the data can then be deleted manually as specified in the shown message.

Remotely retrieved archive definitions are registered with ArchivesKnown and cocopp.archiving.ArchivesKnown() will show a list.

>>> import cocopp
>>> url = 'http://lq-cma.gforge.inria.fr/data-archives/lq-gecco2019'
>>> arch = cocopp.archiving.get(url).update()  # downloads a 0.4KB definition file
>>> len(arch)
4
>>> assert arch.remote_data_path == url

See cocopp.archives for "officially" available archives.

def read_definition_file(local_path_or_definition_file):
return definition triple list
def create(local_path):

create a definition file for an existing local "archive" of data.

The archive in local_path must have been prepared such that it contains only (tar-g-)zipped data set files, one file for each data set / algorithm, within an otherwise arbitrary folder structure (it is possible and for large archives often desirable to create and maintain sub-archives within folders of an archive). Choose the name of the zip files carefully as they become the displayed algorithm names.

If a definition file already exists it is backed up and replaced.

The "created" archive is registered with ArchivesLocal serving as a user-owned machine-wide memory. cocopp.archiving.ArchivesLocal() shows the list.

>>> from cocopp import archiving
>>> # folder containing the data we want to become known in the archive:
>>> local_path = 'my-archives/my-first-archive'
>>>
>>> my_archive = archiving.create(local_path)  # doctest:+SKIP
>>> same_archive = archiving.get(local_path)  # doctest:+SKIP

An archive definition file is a list of (relative file name, hash, and (optionally) file size) triplets.
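
A made-up example of such contents (file names, hashes, and sizes are fabricated for illustration):

[('ALG-one_on_bbob.tgz', '2ea6c2f8f931...', 1077),
 ('more-data/ALG-two_on_bbob.tgz', '9a3bf2e5d01c...', 1473424)]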

Assumes that local_path points to a complete and sane archive or a definition file to be generated at the root of this archive.

In itself this is not particularly useful, because we could just as well load or use the zip files directly instead of archiving them first and then accessing the data through the archive class within Python.

However, if the data are put online together with the definition file, everyone can locally re-create this archive via get and use the returned COCODataArchive without downloading any data immediately, but only "on demand".
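
For example, assuming the archive's get method retrieves a single data set by a matching substring ('ALG-one' is again hypothetical):

>>> remote_archive.get('ALG-one')  # doctest:+SKIP  # downloads only this data set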
