class COCODataArchive(_td.StrList):
Known subclasses: cocopp.archiving.COCOBBOBBiobjDataArchive, cocopp.archiving.COCOBBOBDataArchive, cocopp.archiving.COCOBBOBNoisyDataArchive
Constructor: COCODataArchive(local_path)
Data archive based on an archive definition file.
This class is not meant to be instantiated directly. Instead, use cocopp.archiving.get to get a class instance. The class needs an archive definition file to begin with, as created with cocopp.archiving.create.
See cocopp.archives or cocopp.archiving.official_archives for the "official" archives.
This class "is" a list (StrList) of names which are relative file names separated with slashes "/". Each name represents the zipped data from a full archived experiment, benchmarking one algorithm on an entire benchmark suite.
The function create serves to create a new user-defined archive from experiment data which can be loaded with get. Other derived classes define other specific (sub)archives.
Using the class
Calling the class instance (alias to find) helps to extract entries matching one or several substrings, e.g. a year or a method. find_indices returns the respective indices instead of the names. print displays both. For example:
>>> import cocopp
>>> cocopp.archives.bbob.find('bfgs')  # doctest:+SKIP
['2009/BFGS_ros_noiseless.tgz',
 '2012/DE-BFGS_voglis_noiseless.tgz',
 '2012/PSO-BFGS_voglis_noiseless.tgz',
 '2014-others/BFGS-scipy-Baudis.tgz',
 '2014-others/L-BFGS-B-scipy-Baudis.tgz'...
To post-process these data call:
>>> cocopp.main(cocopp.archives.bbob.get_all('bfgs')) # doctest:+SKIP
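The matching behaviour described above can be illustrated without cocopp installed. The following is only a minimal sketch, not the StrList implementation: the standalone functions `find` and `find_indices` and the short `names` list are stand-ins written for this example, and case-insensitive plain substring matching is an assumption inferred from the 'bfgs' example above.

```python
def find(names, *substrs):
    """Return the entries of `names` that contain every substring in `substrs`.

    Matching is case-insensitive here, which is an assumption of this sketch.
    """
    wanted = [s.lower() for s in substrs]
    return [name for name in names if all(s in name.lower() for s in wanted)]


def find_indices(names, *substrs):
    """Same as `find`, but return the positions instead of the names."""
    found = set(find(names, *substrs))
    return [i for i, name in enumerate(names) if name in found]


# A tiny stand-in for an archive: relative names separated with "/".
names = [
    '2009/BFGS_ros_noiseless.tgz',
    '2009/CMA-ESPLUSSEL_auger_noiseless.tgz',
    '2012/DE-BFGS_voglis_noiseless.tgz',
]

print(find(names, 'bfgs'))                  # both BFGS entries
print(find_indices(names, 'bfgs', '2012'))  # only the 2012 entry matches both
```

Unlike the class instance, this sketch accepts substrings only as separate positional arguments, not as a list.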
Method get downloads a single "matching" data set if necessary and returns the absolute data path which can be used with cocopp.main.
Method index is inherited from list and finds the index of the respective name entry in the archive (exact match only).
cocopp.archives.all contains all experimental data for all test suites.
>>> import cocopp
>>> bbob = cocopp.archives.bbob  # the bbob testbed archive
>>> len(bbob) > 150
True
>>> bbob[:3]  # doctest:+ELLIPSIS,+SKIP,
['2009/...
>>> bbob('2009/bi')[0]  # doctest:+ELLIPSIS,+SKIP,
'...
Get a list of full pathnames of already downloaded data, with None for entries that are not available locally:
>>> [bbob.get(i, remote=False) for i in range(len(bbob))]  # doctest:+ELLIPSIS
[...
Find something more specific:
>>> bbob('auger')[0]  # == bbob.find('auger')[0]  # doctest:+SKIP,
'2009/CMA-ESPLUSSEL_auger_noiseless.tgz'
corresponds to cocopp.main('auger!').
>>> bbob.index('2009/CMA-ESPLUSSEL_auger_noiseless.tgz')  # just list.index
5
>>> data_path = bbob.get(bbob(['au', '2009'])[0], remote=False)
>>> assert data_path is None or str(data_path) == data_path
These commands may download data; to avoid this, the option remote=False is given:
>>> ' '.join(bbob.get(i, remote=False) or '' for i in [2, 13, 33])  # can serve as argument to cocopp.main # doctest:+ELLIPSIS,+SKIP,
'...
>>> bbob.get_all([2, 13, 33], remote=False).as_string  # is the same # doctest:+ELLIPSIS,+SKIP,
' ...
>>> ' '.join(bbob.get(name, remote=False) for name in [bbob[2], bbob[13], bbob[33]])  # is the same # doctest:+ELLIPSIS,+SKIP,
'...
>>> ' '.join(bbob.get(name, remote=False) for name in [
...         '2009/BAYEDA_gallagher_noiseless.tgz',
...         '2009/GA_nicolau_noiseless.tgz',
...         '2010/1komma2mirser_brockhoff_noiseless.tar.gz'])  # is the same # doctest:+ELLIPSIS,+SKIP,
'...
DONE: join with COCODataArchive, to get there:
- DONE upload definition files to official archives
- DONE? use uploaded definition files (see official_archive_locations in _get_remote)
- DONE? replace usages of derived data classes by get
- DONE remove definition list in code of the root class
- DONE review and join classes without default for local path
Static Method | is |
return True if folder contains a COCO archive definition file |
Method | __init__ |
Argument is a local path to the archive. |
Method | check |
raise Exception when hashes disagree or file is missing. |
Method | consistency |
basic quick consistency check of downloaded data. |
Method | consistency |
check/compare against definition file on disk |
Method | contains |
return True if (the exact) name or path is in the archive |
Method | full |
return full local path of name or any path, idempotent |
Method | get |
return the full data pathname of substr in the archived data. |
Method | get |
Return a list (StrList ) of absolute pathnames, |
Method | get |
return a list of valid paths. |
Method | get |
get the first archived data matching all of substrs . |
Method | get |
get full entries of the last find |
Method | get |
deprecated, for backwards compatibility only, use get instead |
Method | read |
return definition triple list |
Method | update |
update definition file, either from remote location or from local data. |
Instance Variable | local |
Undocumented |
Instance Variable | remote |
Undocumented |
Property | downloaded |
return list of data set names of locally available data. |
Method | _download |
create full local path and download single dataset |
Method | _hash |
compute hash of name or path |
Method | _known |
return known hash or None |
Method | _name |
return supposed name of full_path or name without any checks |
Method | _name |
return name of full_path , idempotent. |
Method | _url_ |
return value of _url_ entry in definition_list file or None . |
Instance Variable | _all |
Undocumented |
Instance Variable | _all |
Undocumented |
Instance Variable | _checked |
Undocumented |
Instance Variable | _names |
Undocumented |
Instance Variable | _print |
Undocumented |
Instance Variable | _redownload |
Undocumented |
Inherited from StrList
:
Method | __call__ |
alias to find |
Method | find |
return entries that match all substrs . |
Method | find |
same as find but returns indices instead of names |
Method | print |
print the result of find(*substrs) with indices. |
Property | as |
return space separated string concatenation surrounded by spaces. |
Property | found |
StrList of elements found during the last call to find . |
cocopp.toolsdivers.StrList.__init__
Argument is a local path to the archive.
This class is no longer meant to be used directly; rather use cocopp.archiving.get.
local_path is an archive folder containing a definition file, possibly downloaded with get calling _get_remote from a given url. ~ may refer to the user home folder.
Set _all and self from _all without the _url_ entry.
This init does not deal with remote logic; it only reads in _url_ from the definition file into the remote_data_path attribute.
Details: Set _all_dict which is a (never used) dictionary generated from _all and self and consists of the keys except for '_url_'.
raise Exception when hashes disagree or file is missing.
raise RuntimeError if hash is unknown
raise ValueError if hashes disagree
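The error behaviour described above might be sketched as follows. This is an illustration under stated assumptions, not the cocopp code: SHA-256 as the hash function, the `known_hashes` dictionary, and the function names `file_hash` and `check_hash` are all hypothetical choices made for this example.

```python
import hashlib
import os


def file_hash(path, chunk_size=1 << 16):
    """Return the SHA-256 hex digest of a file, read in chunks.

    SHA-256 is an assumption of this sketch.
    """
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def check_hash(path, known_hashes):
    """Raise if `path` is missing, has no known hash, or its hash differs.

    `known_hashes` is a hypothetical dict mapping a path to its expected
    hex digest, standing in for the archive definition.
    """
    if not os.path.isfile(path):
        raise FileNotFoundError('missing file: %s' % path)
    expected = known_hashes.get(path)
    if expected is None:
        raise RuntimeError('no known hash for %s' % path)
    actual = file_hash(path)
    if actual != expected:
        raise ValueError('hash mismatch for %s: %s != %s'
                         % (path, actual, expected))
```

Reading the file in chunks keeps memory use bounded for large zipped data sets.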
basic quick consistency check of downloaded data.
return (number_of_checked_data, number_of_all_data)
return the full data pathname of substr in the archived data.
Retrieves the data from remote if necessary.
substr can be a substring that matches one and only one name in the data archive or an integer between 0 and len(self), see find or cocopp.archiving.OfficialArchives for how matching is determined.
Raises a ValueError if substr matches several archive entries or none.
If substr is None (default), the first match of the last call to find* or get* is used like self.found[0].
If remote is True (default), the respective data are downloaded from the remote location if necessary. Otherwise, None is returned for a match whose data are not available locally.
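The selection rules above (index or uniquely matching substring, ValueError otherwise, None when remote=False and nothing is on disk) can be sketched in a few lines. This is a simplified model rather than the method itself: the functions `resolve` and `get`, the `local_files` dictionary, and case-insensitive substring matching are assumptions of this example, and the download step is deliberately left out.

```python
def resolve(names, substr):
    """Return the single archive name selected by `substr`.

    `substr` is either an integer index or a substring that must match
    exactly one entry (case-insensitively, an assumption of this sketch).
    """
    if isinstance(substr, int):
        return names[substr]
    matches = [n for n in names if substr.lower() in n.lower()]
    if len(matches) != 1:
        raise ValueError('%r matches %d entries, expected exactly one'
                         % (substr, len(matches)))
    return matches[0]


def get(names, local_files, substr, remote=True):
    """Return the local path for the entry selected by `substr`, or None.

    `local_files` is a hypothetical dict mapping names to paths of data
    that are already on disk.
    """
    name = resolve(names, substr)
    if name in local_files:
        return local_files[name]
    if not remote:
        return None  # a match exists, but its data are not available locally
    raise NotImplementedError('the download step is omitted in this sketch')


names = [
    '2009/BFGS_ros_noiseless.tgz',
    '2009/CMA-ESPLUSSEL_auger_noiseless.tgz',
]
local_files = {'2009/BFGS_ros_noiseless.tgz': '/tmp/archive/2009/BFGS_ros_noiseless.tgz'}

print(get(names, local_files, 'bfgs', remote=False))   # path of the local copy
print(get(names, local_files, 'auger', remote=False))  # None, not downloaded
```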
Return a list (StrList) of absolute pathnames, by repeatedly calling get. Elements of the indices list can be an index or a (sub)string that matches one or several names in the archive. If indices is None, the results from the last call to find are used. Data are downloaded if necessary.
See find or cocopp.archiving.OfficialArchives for how matching is determined.
See also get, get_extended.
return a list of valid paths.
Elements in args may be a valid path name or a known name from the data archive, or a uniquely matching substring of such a name, or a matching substring with added "!" in which case the first match is taken only (calling self.get_first), or a matching substring with added "*" in which case all matches are taken (calling self.get_all), or a regular expression containing a * and not ending with ! or *, in which case, for example, "bbob/2017.*cma" matches "bbob/2017/DTS-CMA-ES-Pitra.tgz" among others (in a regular expression "." matches any single character and ".*" matches any number >= 0 of characters).
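The argument forms listed above can be told apart by their trailing character. The sketch below shows one way to do that dispatch. It is not the method's implementation: `matches` and `expand` are hypothetical helpers, case-insensitive matching is assumed, the "valid path name" case is omitted, and the short `names` list is illustrative only.

```python
import re


def matches(names, pattern):
    """Return names matching `pattern`, case-insensitively (an assumption).

    A pattern containing "*" is treated as a regular expression,
    anything else as a plain substring.
    """
    if '*' in pattern:
        rx = re.compile(pattern, re.IGNORECASE)
        return [n for n in names if rx.search(n)]
    return [n for n in names if pattern.lower() in n.lower()]


def expand(names, arg):
    """Expand one argument into a list of archive names."""
    if arg.endswith('!'):   # first match only
        return matches(names, arg[:-1])[:1]
    if arg.endswith('*'):   # all matches
        return matches(names, arg[:-1])
    found = matches(names, arg)
    if '*' in arg:          # regular expression: keep every match
        return found
    if len(found) != 1:     # plain substring must be unique
        raise ValueError('%r matches %d entries' % (arg, len(found)))
    return found


names = [  # illustrative entries only
    'bbob/2009/BFGS_ros_noiseless.tgz',
    'bbob/2012/DE-BFGS_voglis_noiseless.tgz',
    'bbob/2017/DTS-CMA-ES-Pitra.tgz',
]

print(expand(names, 'bfgs!'))           # first BFGS entry only
print(expand(names, 'bfgs*'))           # both BFGS entries
print(expand(names, 'bbob/2017.*cma'))  # regular expression match
```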
get the first archived data matching all of substrs.
substrs is a list of substrings.
get_first(substrs, remote) is a shortcut for:
self.find(*substrs)
if self.found:
    return self.get(self.found[0], remote=remote)
return None
update definition file, either from remote location or from local data.
As remote archives may grow or change, a common use case may be
>>> import cocopp.archiving as ac
>>> url = 'https://cma-es.github.io/lq-cma/data-archives/lq-gecco2019'
>>> arch = ac.get(url).update()  # doctest:+SKIP
For updating a local archive use:
create(self.local_data_path)
Details: for updating the local definition file from the local data rather use create. This will, however, remove a remote URL from its definition, and the remote and the local archive can then differ. create makes a backup of the existing definition file.
return list of data set names of locally available data.
This is only meaningful for a remote archive.
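Conceptually, such a property filters the archive names by whether the corresponding file exists under the local archive folder. The following is a rough sketch of that idea only: the function `downloaded` and its `local_root` parameter are hypothetical names for this example, and the real property may determine availability differently.

```python
import os


def downloaded(names, local_root):
    """Return the names whose data file exists below `local_root`.

    Names use "/" as separator and are mapped to the local path
    convention before checking the file system.
    """
    return [name for name in names
            if os.path.isfile(os.path.join(local_root, *name.split('/')))]
```

For a purely local archive every name would pass this filter, which is consistent with the property only being meaningful for a remote archive.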
return name of full_path, idempotent.
If full_path is not from the data archive a warning is issued and path separators are replaced with /.
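A path-to-name conversion along these lines could look like the sketch below. It is an approximation for illustration, not the method's code: the function `archive_name`, its `local_root` parameter, and the rule that a relative input is treated as already being a name are assumptions of this example.

```python
import os
import warnings


def archive_name(full_path, local_root):
    """Return the "/"-separated archive name for `full_path`.

    A relative input is assumed to be a name already, which makes the
    function idempotent. An absolute path outside `local_root` triggers
    a warning and only has its separators replaced.
    """
    if not os.path.isabs(full_path):
        return full_path.replace(os.sep, '/')
    root = os.path.abspath(os.path.expanduser(local_root))
    path = os.path.abspath(full_path)
    if os.path.commonpath([path, root]) == root:
        rel = os.path.relpath(path, root)
    else:
        warnings.warn('%s is not inside the archive folder %s' % (path, root))
        rel = path
    return rel.replace(os.sep, '/')
```

Applying the function to its own result returns the same name, since the result is a relative "/"-separated string.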
Check that each name occurs only once in the data archive:
>>> import cocopp
>>> bbob = cocopp.archives.bbob
>>> for name in bbob:
...     assert bbob.count(name) == 1, "%s counted %d times in data archive" % (name, bbob.count(name))
...     assert len(bbob.find(name)) == 1, "%s found %d times" % (name, len(bbob.find(name)))