class documentation

Data archive based on an archive definition file.

This class is not meant to be instantiated directly. Instead, use cocopp.archiving.get to get a class instance. The class needs an archive definition file to begin with, as created with cocopp.archiving.create.

See cocopp.archives or cocopp.archiving.official_archives for the "official" archives.

This class "is" a list (StrList) of names which are relative file names separated with slashes "/". Each name represents the zipped data from a full archived experiment, benchmarking one algorithm on an entire benchmark suite.

The function create serves to create a new user-defined archive from experiment data which can be loaded with get. Other derived classes define other specific (sub)archives.

Using the class

Calling the class instance (alias to find) helps to extract entries matching one or several substrings, e.g. a year or a method. find_indices returns the respective indices instead of the names. print displays both. For example:

>>> import cocopp
>>> cocopp.archives.bbob.find('bfgs')  # doctest:+SKIP
['2009/BFGS_ros_noiseless.tgz',
 '2012/DE-BFGS_voglis_noiseless.tgz',
 '2012/PSO-BFGS_voglis_noiseless.tgz',
 '2014-others/BFGS-scipy-Baudis.tgz',
 '2014-others/L-BFGS-B-scipy-Baudis.tgz'...

To post-process these data call:

>>> cocopp.main(cocopp.archives.bbob.get_all('bfgs'))  # doctest:+SKIP

Method get downloads a single "matching" data set if necessary and returns the absolute data path which can be used with cocopp.main.

Method index is inherited from list and finds the index of the respective name entry in the archive (exact match only).

cocopp.archives.all contains all experimental data for all test suites.

>>> import cocopp
>>> bbob = cocopp.archives.bbob  # the bbob testbed archive
>>> len(bbob) > 150
True
>>> bbob[:3]  # doctest:+ELLIPSIS,+SKIP,
['2009/...
>>> bbob('2009/bi')[0]  # doctest:+ELLIPSIS,+SKIP,
'...

Get a list with the full pathname of each data set that is already downloaded, and None for each one that is not:

>>> [bbob.get(i, remote=False) for i in range(len(bbob))] # doctest:+ELLIPSIS
[...

Find something more specific:

>>> bbob('auger')[0]  # == bbob.find('auger')[0]  # doctest:+SKIP,
'2009/CMA-ESPLUSSEL_auger_noiseless.tgz'

corresponds to cocopp.main('auger!').

>>> bbob.index('2009/CMA-ESPLUSSEL_auger_noiseless.tgz')  # just list.index
5
>>> data_path = bbob.get(bbob(['au', '2009'])[0], remote=False)
>>> assert data_path is None or str(data_path) == data_path

These commands may download data. To avoid this, the option remote=False is given:

>>> ' '.join(bbob.get(i, remote=False) or '' for i in [2, 13, 33])  # can serve as argument to cocopp.main  # doctest:+ELLIPSIS,+SKIP,
'...
>>> bbob.get_all([2, 13, 33], remote=False).as_string  # is the same  # doctest:+ELLIPSIS,+SKIP,
' ...
>>> ' '.join(bbob.get(name, remote=False) for name in [bbob[2], bbob[13], bbob[33]])  # is the same  # doctest:+ELLIPSIS,+SKIP,
'...
>>> ' '.join(bbob.get(name, remote=False) for name in [
...         '2009/BAYEDA_gallagher_noiseless.tgz',
...         '2009/GA_nicolau_noiseless.tgz',
...         '2010/1komma2mirser_brockhoff_noiseless.tar.gz'])  # is the same  # doctest:+ELLIPSIS,+SKIP,
'...

DONE: join with COCODataArchive, to get there:
- DONE upload definition files to official archives
- DONE? use uploaded definition files (see official_archive_locations in _get_remote)
- DONE? replace usages of derived data classes by get
- DONE remove definition list in code of the root class
- DONE review and join classes without default for local path

Static Method is_archive return True if folder contains a COCO archive definition file
Method __init__ Argument is a local path to the archive.
Method check_hash raise Exception when hashes disagree or file is missing.
Method consistency_check_data basic quick consistency check of downloaded data.
Method consistency_check_read check/compare against definition file on disk
Method contains return True if (the exact) name or path is in the archive
Method full_path return full local path of name or any path, idempotent
Method get return the full data pathname of substr in the archived data.
Method get_all Return a list (StrList) of absolute pathnames,
Method get_extended return a list of valid paths.
Method get_first get the first archived data matching all of substrs.
Method get_found get full entries of the last find
Method get_one deprecated, for backwards compatibility only, use get instead
Method read_definition_file return definition triple list
Method update update definition file, either from remote location or from local data.
Instance Variable local_data_path Undocumented
Instance Variable remote_data_path Undocumented
Property downloaded return list of data set names of locally available data.
Method _download create full local path and download single dataset
Method _hash compute hash of name or path
Method _known_hash return known hash or None
Method _name return supposed name of full_path or name without any checks
Method _name_with_check return name of full_path, idempotent.
Method _url_ return value of _url_ entry in definition_list file or None.
Instance Variable _all Undocumented
Instance Variable _all_dict Undocumented
Instance Variable _checked_consistency Undocumented
Instance Variable _names_found Undocumented
Instance Variable _print Undocumented
Instance Variable _redownload_if_changed Undocumented

Inherited from StrList:

Method __call__ alias to find
Method find return entries that match all substrs.
Method find_indices same as find but returns indices instead of names
Method print print the result of find(*substrs) with indices.
Property as_string return space separated string concatenation surrounded by spaces.
Property found StrList of elements found during the last call to find.
@staticmethod
def is_archive(url_or_folder):

return True if folder contains a COCO archive definition file

def __init__(self, local_path):

Argument is a local path to the archive.

This class is no longer meant to be used directly; use cocopp.archiving.get instead.

local_path is an archive folder containing a definition file, possibly downloaded with get calling _get_remote from a given url. ~ may refer to the user home folder.

Set _all and self from _all without the _url_ entry. This init does not deal with remote logic; it only reads _url_ from the definition file into the remote_data_path attribute.

Details: Set _all_dict which is a (never used) dictionary generated from _all and self and consists of the keys except for '_url_'.

def check_hash(self, name):

raise Exception when hashes disagree or file is missing.

raise RuntimeError if hash is unknown, raise ValueError if hashes disagree
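A minimal sketch of such a hash check, not the library's implementation. The helper names sha256_of_file and check_hash_sketch and the known_hashes dictionary are hypothetical; only the use of SHA-256 (the default hash_function of _hash) and the two exception types are taken from this page.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_hash_sketch(path, known_hashes):
    """Raise if the hash of `path` is unknown or disagrees (hypothetical helper).

    `known_hashes` maps a path to its expected hex digest; in an archive
    this information would come from the definition file.
    """
    expected = known_hashes.get(path)
    if expected is None:
        raise RuntimeError("no known hash for %s" % path)
    actual = sha256_of_file(path)  # raises FileNotFoundError if file is missing
    if actual != expected:
        raise ValueError("hash mismatch for %s" % path)


# Usage on a temporary file, no archive required.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example data")
demo_path = tmp.name
demo_hashes = {demo_path: hashlib.sha256(b"example data").hexdigest()}
check_hash_sketch(demo_path, demo_hashes)  # passes silently
```

Reading the file in chunks keeps memory use constant for large data sets. A missing file surfaces here as the built-in FileNotFoundError, which is one possible reading of "raise Exception when ... file is missing".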

def consistency_check_data(self):

basic quick consistency check of downloaded data.

return (number_of_checked_data, number_of_all_data)

def consistency_check_read(self):

check/compare against definition file on disk

def contains(self, name):

return True if (the exact) name or path is in the archive

def full_path(self, name):

return full local path of name or any path, idempotent

def get(self, substr=None, remote=True):

return the full data pathname of substr in the archived data.

Retrieves the data from remote if necessary.

substr can be a substring that matches one and only one name in the data archive or an integer between 0 and len(self), see find or cocopp.archiving.OfficialArchives for how matching is determined.

Raises a ValueError if substr matches several archive entries or none.

If substr is None (default), the first match of the last call to find* or get* is used like self.found[0].

If remote is True (default), the respective data are downloaded from the remote location if necessary. Otherwise, None is returned for a match whose data are not available locally.
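The "one and only one match" rule can be illustrated without the library. This is a sketch under assumptions: find_sketch and get_name_sketch are hypothetical helpers, matching is taken to be case-insensitive (as the 'bfgs' example on this page suggests), and the download step is left out entirely.

```python
def find_sketch(names, *substrs):
    """Return the names containing all of `substrs`, ignoring case."""
    return [name for name in names
            if all(s.lower() in name.lower() for s in substrs)]


def get_name_sketch(names, substr):
    """Return the single name selected by an index or a substring.

    Raises ValueError when a substring matches no or several names.
    """
    if isinstance(substr, int):
        return names[substr]
    matches = find_sketch(names, substr)
    if len(matches) != 1:
        raise ValueError("%r matches %d names, expected exactly one"
                         % (substr, len(matches)))
    return matches[0]


# Names copied from the examples on this page.
names = ['2009/BFGS_ros_noiseless.tgz',
         '2009/CMA-ESPLUSSEL_auger_noiseless.tgz',
         '2012/DE-BFGS_voglis_noiseless.tgz']

print(get_name_sketch(names, 'auger'))  # selected by substring
print(get_name_sketch(names, 1))        # selected by index
```

In this sketch, 'bfgs' would raise a ValueError because it matches two of the three names.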

def get_all(self, indices=None, remote=True):

Return a list (StrList) of absolute pathnames,

by repeatedly calling get. Elements of the indices list can be an index or a (sub)string that matches one or several names in the archive. If indices is None, the results from the last call to find are used. Data are downloaded if necessary.

See find or cocopp.archiving.OfficialArchives for how matching is determined.

See also get, get_extended.

def get_extended(self, args, remote=True):

return a list of valid paths.

Elements in args may be

- a valid path name,
- a known name from the data archive,
- a uniquely matching substring of such a name,
- a matching substring with added "!", in which case only the first match is taken (calling self.get_first),
- a matching substring with added "*", in which case all matches are taken (calling self.get_all),
- a regular expression containing a * and not ending with ! or *, in which case, for example, "bbob/2017.*cma" matches "bbob/2017/DTS-CMA-ES-Pitra.tgz" among others (in a regular expression "." matches any single character and ".*" matches any number >= 0 of characters).
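The suffix rules above can be sketched as a small dispatcher over a plain list of names. This is an illustration only: resolve_sketch is a hypothetical helper, the demo names other than "bbob/2017/DTS-CMA-ES-Pitra.tgz" are made up for the example, case-insensitive matching and the use of re.search are assumptions, and the "valid path name" case is omitted.

```python
import re


def resolve_sketch(names, arg):
    """Return the names selected by `arg` under the suffix rules above."""
    def matching(substr):
        return [n for n in names if substr.lower() in n.lower()]

    if arg in names:  # a known name is taken as is
        return [arg]
    if arg.endswith('!'):  # first match only
        return matching(arg[:-1])[:1]
    if arg.endswith('*'):  # all matches
        return matching(arg[:-1])
    if '*' in arg:  # regular expression (assumed: searched, ignoring case)
        pattern = re.compile(arg, re.IGNORECASE)
        return [n for n in names if pattern.search(n)]
    found = matching(arg)  # a plain substring must match uniquely
    if len(found) != 1:
        raise ValueError("%r matches %d names" % (arg, len(found)))
    return found


# Illustrative names only.
names = ['bbob/2009/BFGS_ros_noiseless.tgz',
         'bbob/2012/DE-BFGS_voglis_noiseless.tgz',
         'bbob/2017/DTS-CMA-ES-Pitra.tgz']

print(resolve_sketch(names, 'bfgs!'))           # first match only
print(resolve_sketch(names, 'bfgs*'))           # all matches
print(resolve_sketch(names, 'bbob/2017.*cma'))  # regular expression
```

The order of the checks matters: the trailing "!" and "*" are tested before the regular-expression case, so that "bfgs*" is treated as "all substring matches" rather than as a pattern.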

def get_first(self, substrs, remote=True):

get the first archived data matching all of substrs.

substrs is a list of substrings.

get_first(substrs, remote) is a shortcut for:

self.find(*substrs)
if self.found:
    return self.get(self.found[0], remote=remote)
return None
def get_found(self, remote=True):

get full entries of the last find

def get_one(self, *args, **kwargs):

deprecated, for backwards compatibility only, use get instead

def read_definition_file(self):

return definition triple list

def update(self):

update definition file, either from remote location or from local data.

As remote archives may grow or change, a common use case may be:

>>> import cocopp.archiving as ac
>>> url = 'https://cma-es.github.io/lq-cma/data-archives/lq-gecco2019'
>>> arch = ac.get(url).update()  # doctest:+SKIP

For updating a local archive use:

create(self.local_data_path)

Details: to update the local definition file from the local data, rather use create. This will, however, remove a remote URL from the definition, after which the remote and the local archive may differ. create makes a backup of the existing definition file.

local_data_path =

Undocumented

remote_data_path =

Undocumented

@property
downloaded =

return list of data set names of locally available data.

This is only meaningful for a remote archive.

def _download(self, name):

create full local path and download single dataset

def _hash(self, name, hash_function=hashlib.sha256):

compute hash of name or path

def _known_hash(self, name):

return known hash or None

def _name(self, full_path):

return supposed name of full_path or name without any checks

def _name_with_check(self, full_path):

return name of full_path, idempotent.

If full_path is not from the data archive, a warning is issued and path separators are replaced with /.

Check that all names are only once in the data archive:

>>> import cocopp
>>> bbob = cocopp.archives.bbob
>>> for name in bbob:
...     assert bbob.count(name) == 1, "%s counted %d times in data archive" % (name, bbob.count(name))
...     assert len(bbob.find(name)) == 1, "%s found %d times" % (name, len(bbob.find(name)))
def _url_(self, definition_list=None):

return value of _url_ entry in definition_list file or None.

_all =

Undocumented

_all_dict =

Undocumented

_checked_consistency: bool =

Undocumented

_names_found: list =

Undocumented

_print =

Undocumented

_redownload_if_changed =

Undocumented