Online and offline archiving of COCO data.
create and get are the main functions to create and retrieve online and local offline archives. Local archives can be listed via ArchivesLocal (experimental/beta); already used online archives are listed in ArchivesKnown.
An online archive class defines, and is defined by, a source URL containing an archive definition file and the archived data.
get('all') returns all "officially" archived data as given in a folder hierarchy (this may be abandoned in the future). Derived classes "point" to subfolders in the folder tree and "contain" all archived data from a single test suite. For example, get('bbob') returns the archived data list for the bbob testbed.
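A minimal sketch of how this can look in a Python session (only the archive definition is retrieved here, no data sets are downloaded):

>>> import cocopp
>>> bbob = cocopp.archiving.get('bbob')  # doctest:+SKIP
>>> len(bbob)  # number of archived data sets in the 'bbob' archive  # doctest:+SKIP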
How to Create an Online Archive
First, we prepare the datasets. A dataset is a (tar-)zipped file containing a full experiment from a single algorithm. The first ten-or-so characters of the filename should be readable and informative. Datasets can reside in an arbitrary subfolder structure, but the folders should contain no further (ambiguous) files, so that an archive can be created from the archive root folder.
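A hypothetical layout of such an archive root folder could look like this (all folder and file names are purely illustrative):

    my-archive-root-folder/
        ALGO-A/ALGO-A_on_bbob.tgz
        ALGO-B/ALGO-B_on_bbob.tgz
        ALGO-B/ALGO-B-variant_on_bbob.tgz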
Second, we create the new archive with create,
>>> import cocopp
>>> from cocopp import archiving
>>> local_path = './my-archive-root-folder'
>>> archiving.create(local_path)  # doctest:+SKIP
thereby creating an archive definition file in the given folder. The created archive can be re-instantiated with cocopp.archiving.get and all data can be processed with cocopp.main, like
>>> my_archive = archiving.get(local_path)  # doctest:+SKIP
>>> cocopp.main(my_archive.get_all(''))  # doctest:+SKIP
We may want to check beforehand the archive size like
>>> len(my_archive) # doctest:+SKIP
as an archive may contain hundreds of data sets. If necessary, we can choose a subset to process (see the help of main and/or of the archive instance), for example as sketched below.
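A minimal sketch of processing only a subset, assuming that, as with get_all('') above, the argument is matched as a substring against the archived names ('bfgs' is purely illustrative):

>>> subset = my_archive.get_all('bfgs')  # doctest:+SKIP
>>> cocopp.main(subset)  # doctest:+SKIP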
Third, we put a mirror of the archive online, like:
rsync -zauv my-archives/unique-name/ http://my-coco-online-archives/a-name
Now, everyone can use the archive on the fly like
>>> remote_def = 'http://my-coco-online-archives/a-name'
>>> remote_archive = cocopp.archiving.get(remote_def)  # doctest:+SKIP
just as a local archive. Archive data are downloaded only on demand. All data can be made available offline (which might take long) with:
>>> remote_archive.get_all('') # doctest:+SKIP
Remote archives that have been used once can be listed via ArchivesKnown
(experimental/beta).
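A sketch of how to inspect this list (the output depends on which remote archives have already been used on this machine):

>>> from cocopp import archiving
>>> archiving.ArchivesKnown()  # doctest:+SKIP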
Details: a definition file contains a list of all contained datasets by path/filename, a sha256 hash and optionally their approximate size. Datasets are (tar-)zipped files containing a full experiment from a single algorithm.
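Purely for illustration, a single entry of a definition file can be thought of as a triplet like the following (path, hash, and size are made up and the hash is shortened):

>>> entry = ('ALGO-A/ALGO-A_on_bbob.tgz',  # relative path/file name
...          '9d1b3c...',                  # sha256 hash of the file (shortened)
...          2845673)                      # approximate size in bytes (optional)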
Kind     | Name             | Summary
---------|------------------|------------------------------------------------------------------------------
Class    | ArchivesKnown    | Known (and already used) remote COCO data archives.
Class    | ArchivesLocal    | COCO data archives somewhere local on this machine.
Class    |                  | This class "contains" archived data for the 'bbob-biobj' suite.
Class    |                  | list of archived data for the 'bbob' test suite.
Class    |                  | This class "contains" archived data for the 'bbob-noisy' suite.
Class    | COCODataArchive  | Data archive based on an archive definition file.
Class    |                  | List of URLs or path names to COCO data archives available to this user.
Class    | OfficialArchives | overdesigned class to connect URLs, names, and classes of "official" archives.
Class    |                  | Elements of this list can be used directly with cocopp.archiving.get.
Function | create           | create a definition file for an existing local "archive" of data.
Function | get              | return a data archive COCODataArchive.
Function | read             | return definition triple list
Variable | __author__       | Undocumented
Variable | backup           | Undocumented
Variable | coco             | Undocumented
Variable | coco             | Undocumented
Variable | cocopp           | Undocumented
Variable | default          | Undocumented
Variable | default          | Undocumented
Variable | listing          | Undocumented
Variable | listing          | Undocumented
Variable | official         | Undocumented
Class    | _old_            | superseded by OfficialArchives
Function | _abs             | return a (OS-dependent) user-expanded path.
Function | _definition      | return absolute path for sound definition file name.
Function | _definition      | return absolute path to a possibly non-existing definition file name.
Function | _download        | download definition file and sync url into it
Function | _get             | return remote data archive as COCODataArchive instance.
Function | _hash            | compute hash of file file_name
Function | _is              | Undocumented
Function | _make            | backup file with added time stamp if it exists, otherwise do nothing.
Function | _makedirs        | Undocumented
Function | _old             | move "official" archives folder to the generic standardized location once and for all
Function | _repr            | Undocumented
Function | _str             | try to return a non-string iterable in either case
Function | _url             | add ('_url_', url), to the definition file in folder.
Function | _url             | return a path within the default archive location
create a definition file for an existing local "archive" of data.
The archive in local_path
must have been prepared such that it
contains only (tar-g-)zipped data set files, one file for each data
set / algorithm, within an otherwise arbitrary folder structure (it is
possible and for large archives often desirable to create and maintain
sub-archives within folders of an archive). Choose the name of the zip
files carefully as they become the displayed algorithm names.
If a definition file already exists it is backed up and replaced.
The "created" archive is registered with ArchivesLocal
serving as a
user-owned machine-wide memory. cocopp.archiving.ArchivesLocal()
shows the list.
>>> from cocopp import archiving
>>> # folder containing the data we want to become known in the archive:
>>> local_path = 'my-archives/my-first-archive'
>>>
>>> my_archive = archiving.create(local_path)  # doctest:+SKIP
>>> same_archive = archiving.get(local_path)  # doctest:+SKIP
An archive definition file is a list of (relative file name, hash and (optionally) filesize) triplets.
Assumes that local_path
points to a complete and sane archive or
a definition file to be generated at the root of this archive.
In itself this is not particularly useful, because we can also directly load or use the zip files instead of archiving them first and then accessing the data via the archive class within Python.
However, if the data are put online together with the definition file,
everyone can locally re-create this archive via get
and use the
returned COCODataArchive
without downloading any data
immediately, but only "on demand".
return a data archive COCODataArchive.
url_or_folder must be a URL or a folder, either of which must contain an archive definition file named coco_archive_definition.txt. Use create to create this file if necessary.
When a URL is given, the archive may already exist locally from
previous calls of get
. Then, get(url).update() updates the
definition file and returns the updated archive. Only the definition
file is updated, no data are downloaded before they are requested. The
updated class instance re-downloads requested data when the saved hash
disagrees with the computed hash. For new instances of the archive on which COCODataArchive.update has not been called, an error message may be shown when they try to use outdated local data; the data can then be deleted manually as specified in the shown message.
Remotely retrieved archive definitions are registered with ArchivesKnown
and cocopp.archiving.ArchivesKnown() will show a list.
>>> import cocopp
>>> url = 'https://cma-es.github.io/lq-cma/data-archives/lq-gecco2019'
>>> arch = cocopp.archiving.get(url).update()  # downloads a 0.4KB definition file
>>> len(arch)
4
>>> assert arch.remote_data_path.split('//', 1)[1] == url.split('//', 1)[1], (arch.remote_data_path, url)
See cocopp.archives
for "officially" available archives.
See also: get_all, get_extended.
return a (OS-dependent) user-expanded path.
os.path.abspath
takes care of using the right os.path.sep
.
return absolute path to a possibly non-existing definition file name.
Creates a backup if the file exists. Does not create the file or folders when they do not exist.
Details: if filename is None, tries to guess whether the first
argument already includes the filename. If it seems necessary,
default_definition_filename
is appended.
return remote data archive as COCODataArchive
instance.
If necessary, the archive is "created" by downloading the definition file
from url
to target_folder
which doesn't need to exist.
Details: The target folder name is by default derived from the url
and
created within default_archive_location == ~/.cocopp/data-archives.