module documentation

Online and offline archiving of COCO data.

create and get are the main functions for creating and retrieving online and local offline archives. Local archives can be listed via ArchivesLocal (experimental/beta); already-used online archives are listed in ArchivesKnown.

An online archive class defines, and is defined by, a source URL containing an archive definition file and the archived data.

get('all') returns all "officially" archived data as given in a folder hierarchy (this may be abandoned in the future). Derived classes "point" to subfolders in the folder tree and "contain" all archived data from a single test suite. For example, get('bbob') returns the archived data list for the bbob testbed.

How to Create an Online Archive

First, we prepare the datasets. A dataset is a (tar-)zipped file containing a full experiment from a single algorithm. The first ten-or-so characters of the filename should be readable and informative. Datasets can reside in an arbitrary subfolder structure, but the folders should contain no further (ambiguous) files, so that an archive can be created from the archive root folder.
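The layout described above can be sketched in plain Python. The folder and dataset names below are made up for illustration; only the constraint matters that each folder eventually contains nothing but (tar-)zipped dataset files:

```python
import os
import tarfile
import tempfile

# hypothetical archive root; names are illustrative, not prescribed
root = tempfile.mkdtemp(prefix='my-archive-root-')

for subfolder, name in [('2019/cmaes', 'CMA-ES_on_bbob.tgz'),
                        ('2019/random', 'RandomSearch_on_bbob.tgz')]:
    folder = os.path.join(root, subfolder)
    os.makedirs(folder, exist_ok=True)
    # each dataset is an archive of one full experiment; here just a stub file
    data_file = os.path.join(folder, 'experiment-data.txt')
    with open(data_file, 'w') as f:
        f.write('placeholder for experiment data\n')
    with tarfile.open(os.path.join(folder, name), 'w:gz') as tar:
        tar.add(data_file, arcname='experiment-data.txt')
    os.remove(data_file)  # leave only the zipped dataset in the folder

# the archive root now contains only (tar-g-)zipped dataset files
for dirpath, _, filenames in sorted(os.walk(root)):
    for fn in sorted(filenames):
        print(os.path.relpath(os.path.join(dirpath, fn), root))
```

Such a folder can then be passed to archiving.create as shown next.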

Second, we create the new archive with create,

>>> import cocopp
>>> from cocopp import archiving
>>> local_path = './my-archive-root-folder'
>>> archiving.create(local_path)  # doctest:+SKIP

thereby creating an archive definition file in the given folder. The created archive can be re-instantiated with cocopp.archiving.get and all data can be processed with cocopp.main, like

>>> my_archive = archiving.get(local_path)  # doctest:+SKIP
>>> cocopp.main(my_archive.get_all(''))  # doctest:+SKIP

We may want to check the archive size beforehand, like

>>> len(my_archive)  # doctest:+SKIP

as an archive may contain hundreds of data sets. If necessary, we can choose a subset to process (see the help of main and/or of the archive instance).
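As a sketch of subset selection, assuming the archive instance behaves like a list of dataset names (as the class descriptions below suggest), ordinary Python filtering suffices; the names here are made up:

```python
# illustrative dataset names, not from a real archive
all_names = ['2019/cmaes/CMA-ES_on_bbob.tgz',
             '2019/random/RandomSearch_on_bbob.tgz',
             '2019/cmaes/lq-CMA-ES_on_bbob.tgz']

# pick only the CMA-ES variants, e.g. by substring match
subset = [name for name in all_names if 'CMA-ES' in name]
print(len(subset))  # number of selected data sets
```

With a real archive instance, such a subset could then be handed to cocopp.main; the exact accepted argument forms depend on cocopp (see the help of main).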

Third, we put a mirror of the archive online, like:

rsync -zauv my-archives/unique-name/ http://my-coco-online-archives/a-name

Now, everyone can use the archive on the fly like

>>> remote_def = 'http://my-coco-online-archives/a-name'
>>> remote_archive = cocopp.archiving.get(remote_def)  # doctest:+SKIP

just as with a local archive. Archive data are downloaded only on demand. All data can be made available offline (which may take a long time) with:

>>> remote_archive.get_all('')  # doctest:+SKIP

Remote archives that have been used once can be listed via ArchivesKnown (experimental/beta).

Details: a definition file lists all contained datasets by path/filename, together with a sha256 hash and optionally their approximate size. Datasets are (tar-)zipped files, each containing a full experiment from a single algorithm.
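As an illustration of these triplets, the sketch below builds and re-parses such a list; the exact on-disk syntax of the definition file is an assumption here, and the hashes and sizes are placeholders:

```python
import ast

# placeholder triplets: (path/filename, sha256 hash, approximate size);
# the hashes below are fake ('0'*64, '1'*64), for illustration only
definition_text = repr([
    ('2019/cmaes/CMA-ES_on_bbob.tgz', '0' * 64, 13000),
    ('2019/random/RandomSearch_on_bbob.tgz', '1' * 64, 12000),
])

# a Python-literal definition in this form can be recovered safely
triples = ast.literal_eval(definition_text)
print(len(triples), triples[0][0])
```

read_definition_file (below) is the module's own way to obtain such a triple list from a local path or definition file.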

Class ArchivesKnown Known (and already used) remote COCO data archives.
Class ArchivesLocal COCO data archives somewhere local on this machine.
Class COCOBBOBBiobjDataArchive This class "contains" archived data for the 'bbob-biobj' suite.
Class COCOBBOBDataArchive list of archived data for the 'bbob' test suite.
Class COCOBBOBNoisyDataArchive This class "contains" archived data for the 'bbob-noisy' suite.
Class COCODataArchive Data archive based on an archive definition file.
Class ListOfArchives List of URLs or path names to COCO data archives available to this user.
Class OfficialArchives overdesigned class to connect URLs, names, and classes of "official" archives.
Class RemoteListOfArchives Elements of this list can be used directly with cocopp.archiving.get.
Function create create a definition file for an existing local "archive" of data.
Function get return a data archive COCODataArchive.
Function read_definition_file return definition triple list
Variable __author__ Undocumented
Variable backup_last_filename Undocumented
Variable coco_url Undocumented
Variable coco_urls Undocumented
Variable cocopp_home Undocumented
Variable default_archive_location Undocumented
Variable default_definition_filename Undocumented
Variable listing_file_extension Undocumented
Variable listing_file_start Undocumented
Variable official_archives Undocumented
Class _old_ArchivesOfficial superseded by OfficialArchives
Function _abs_path return a (OS-dependent) user-expanded path.
Function _definition_file_to_read return absolute path for sound definition file name.
Function _definition_file_to_write return absolute path to a possibly non-existing definition file name.
Function _download_definitions download definition file and sync url into it
Function _get_remote return remote data archive as COCODataArchive instance.
Function _hash compute hash of file file_name
Function _is_url Undocumented
Function _make_backup backup file with added time stamp if it exists, otherwise do nothing.
Function _makedirs Undocumented
Function _old_move_official_local_data move "official" archives folder to the generic standardized location once and for all
Function _repr_definitions Undocumented
Function _str_to_list try to return a non-string iterable in either case
Function _url_add add ('_url_', url) to the definition file in folder.
Function _url_to_folder_name return a path within the default archive location
def create(local_path):

create a definition file for an existing local "archive" of data.

The archive in local_path must have been prepared such that it contains only (tar-g-)zipped data set files, one file for each data set / algorithm, within an otherwise arbitrary folder structure (it is possible and for large archives often desirable to create and maintain sub-archives within folders of an archive). Choose the name of the zip files carefully as they become the displayed algorithm names.

If a definition file already exists it is backed up and replaced.

The "created" archive is registered with ArchivesLocal serving as a user-owned machine-wide memory. cocopp.archiving.ArchivesLocal() shows the list.

>>> from cocopp import archiving
>>> # folder containing the data we want to become known in the archive:
>>> local_path = 'my-archives/my-first-archive'
>>>
>>> my_archive = archiving.create(local_path)  # doctest:+SKIP
>>> same_archive = archiving.get(local_path)  # doctest:+SKIP

An archive definition file is a list of (relative file name, hash, and optionally file size) triplets.

Assumes that local_path points to a complete and sane archive or a definition file to be generated at the root of this archive.

In itself this is not particularly useful, because we can also directly load or use the zip files instead of archiving them first and accessing the data then from the archive class within Python.

However, if the data are put online together with the definition file, everyone can locally re-create this archive via get and use the returned COCODataArchive without downloading any data immediately, but only "on demand".

def get(url_or_folder=None):

return a data archive COCODataArchive.

url_or_folder must be a URL or a folder, either of which must contain an archive definition file named coco_archive_definition.txt. Use create to create this file if necessary.

When a URL is given, the archive may already exist locally from previous calls of get. Then get(url).update() updates the definition file and returns the updated archive. Only the definition file is updated; no data are downloaded before they are requested. The updated class instance re-downloads requested data when the saved hash disagrees with the computed hash. With new instances of the archive on which COCODataArchive.update has not been called, an error message may be shown when they try to use outdated local data; the data can then be deleted manually as specified in the shown message.

Remotely retrieved archive definitions are registered with ArchivesKnown and cocopp.archiving.ArchivesKnown() will show a list.

>>> import cocopp
>>> url = 'https://cma-es.github.io/lq-cma/data-archives/lq-gecco2019'
>>> arch = cocopp.archiving.get(url).update()  # downloads a 0.4KB definition file
>>> len(arch)
4
>>> assert arch.remote_data_path.split('//', 1)[1] == url.split('//', 1)[1], (arch.remote_data_path, url)

See cocopp.archives for "officially" available archives.

See also: get_all, get_extended.

def read_definition_file(local_path_or_definition_file):

return definition triple list

__author__: str =

Undocumented

backup_last_filename: str =

Undocumented

coco_url =

Undocumented

coco_urls: list[str] =

Undocumented

cocopp_home =

Undocumented

default_archive_location =

Undocumented

default_definition_filename: str =

Undocumented

listing_file_extension: str =

Undocumented

listing_file_start: str =

Undocumented

official_archives =

Undocumented

def _abs_path(path, *args):

return a (OS-dependent) user-expanded path.

os.path.abspath takes care of using the right os.path.sep.
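A minimal sketch of such a helper, assuming the documented behavior (user expansion plus OS-dependent absolute path); the function name here is illustrative, not the module's private `_abs_path` itself:

```python
import os

def abs_path(path, *args):
    """return a user-expanded absolute path; os.path.abspath
    takes care of using the right os.path.sep"""
    return os.path.abspath(os.path.join(os.path.expanduser(path), *args))

p = abs_path('~', 'my-archives', 'my-first-archive')
print(os.path.isabs(p))
```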

def _definition_file_to_read(local_path_or_definition_file):

return absolute path for sound definition file name.

The file or path may or may not exist.

def _definition_file_to_write(local_path_or_filename, filename=None):

return absolute path to a possibly non-existing definition file name.

Creates a backup if the file exists. Does not create the file or folders when they do not exist.

Details: if filename is None, tries to guess whether the first argument already includes the filename. If it seems necessary, default_definition_filename is appended.

def _download_definitions(url, target_folder):

download definition file and sync url into it

def _get_remote(url, target_folder=None, redownload=False):

return remote data archive as COCODataArchive instance.

If necessary, the archive is "created" by downloading the definition file from url to target_folder which doesn't need to exist.

Details: The target folder name is by default derived from the url and created within default_archive_location == ~/.cocopp/data-archives.

def _hash(file_name, hash_function=hashlib.sha256):

compute hash of file file_name
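A chunked file hash matching the documented signature can be sketched as follows; reading in blocks keeps memory use constant for large dataset files (the helper name is illustrative):

```python
import hashlib
import tempfile

def file_hash(file_name, hash_function=hashlib.sha256):
    """compute the hex digest of the file's contents in 64 KiB chunks"""
    h = hash_function()
    with open(file_name, 'rb') as f:
        for chunk in iter(lambda: f.read(1 << 16), b''):
            h.update(chunk)
    return h.hexdigest()

# small self-check on a temporary file
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b'coco')
    name = f.name
print(file_hash(name))
```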

def _is_url(s):

Undocumented

def _make_backup(fullname):

backup file with added time stamp if it exists, otherwise do nothing.
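A sketch of this behavior, with an assumed naming scheme (the real backup filename format is not specified here):

```python
import os
import shutil
import tempfile
import time

def make_backup(fullname):
    """copy `fullname` to a time-stamped sibling if it exists,
    otherwise do nothing; the naming scheme is an assumption"""
    if os.path.exists(fullname):
        stamp = time.strftime('%Y-%m-%dT%H%M%S')
        shutil.copy2(fullname, fullname + '.' + stamp)

# demonstrate on a throwaway definition file
folder = tempfile.mkdtemp()
target = os.path.join(folder, 'coco_archive_definition.txt')
with open(target, 'w') as f:
    f.write('[]\n')
make_backup(target)
backups = [fn for fn in os.listdir(folder)
           if fn.startswith('coco_archive_definition.txt.')]
print(len(backups))
```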

def _makedirs(path, error_ok=True):

Undocumented

def _old_move_official_local_data():

move "official" archives folder to the generic standardized location once and for all

def _repr_definitions(list_):

Undocumented

def _str_to_list(str_or_list):

try to return a non-string iterable in either case

def _url_add(folder, url):

add ('_url_', url), to the definition file in folder.

This function is idempotent; however, different URLs may be in the list.

def _url_to_folder_name(url):

return a path within the default archive location