Lazy-loading data modules in Python with one magic function
Greg Wilson was soliciting strategies for lazy-loading datasets in Python modules. There are, of course, many ways to do this, but I didn't see this one being discussed.
Since Python 3.7 (released in 2018) you can define a module-level __getattr__ function that Python calls whenever an attribute is not found on the module through normal lookup (see PEP 562). You can use this to call a data-loading function the first time each module-level attribute is accessed. This implementation uses only functions, dicts, and the one magic “dunder” function, which might be more approachable than programming with classes (depending on your audience).
A sketch of an implementation:
# Cache loaded datasets
_dataset_cache = {}

# Dictionary mapping dataset names to loader functions
_dataset_loaders = {
    # maps names to argument-less functions that load datasets
}

def __getattr__(name):
    """PEP 562 module-level __getattr__ function for lazy loading"""
    # Check if we should load a dataset for this attribute
    if name in _dataset_loaders:
        # If not already cached, load and cache it
        if name not in _dataset_cache:
            _dataset_cache[name] = _dataset_loaders[name]()
        return _dataset_cache[name]
    # If not a dataset, raise AttributeError
    raise AttributeError(f"Module has no attribute '{name}'")
For example, this module has attributes foo and bar, which are only calculated when they're first referenced:
# foo.py

# Cache loaded datasets
_dataset_cache = {}

def load_data(name):
    if name == "foo":
        print("loading foo")
        return 1
    elif name == "bar":
        print("loading bar")
        return 2
    else:
        # This should be unreachable, but just in case...
        raise AttributeError(f"Module has no attribute '{name}'")

# Dictionary mapping dataset names to loader functions
_dataset_loaders = {
    "foo": lambda: load_data("foo"),
    "bar": lambda: load_data("bar"),
}

def __getattr__(name):
    """PEP 562 module-level __getattr__ function for lazy loading"""
    # Check if we should load a dataset for this attribute
    if name in _dataset_loaders:
        # If not already cached, load and cache it
        if name not in _dataset_cache:
            _dataset_cache[name] = _dataset_loaders[name]()
        return _dataset_cache[name]
    # If not a dataset, raise AttributeError
    raise AttributeError(f"Module has no attribute '{name}'")
In use, this looks like:
>>> import foo
>>> foo.foo
loading foo
1
>>> foo.foo
1
>>> # NOTE: didn't load foo a second time
>>> foo.bar
loading bar
2
>>> foo.quuz
Traceback (most recent call last):
  File "<python-input-4>", line 1, in <module>
    foo.quuz
  File "/Users/nknight/tmp/foo.py", line 34, in __getattr__
    raise AttributeError(f"Module has no attribute '{name}'")
AttributeError: Module has no attribute 'quuz'
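The session above needs foo.py on disk. As a self-contained check of the same pattern, the sketch below builds an equivalent module in memory and also shows that `from ... import ...` goes through the same __getattr__ fallback. The module name `lazy_foo`, the helper `_load_foo`, and the `load_count` counter are hypothetical and exist only to make the loader calls observable.

```python
import sys
import types

# Stand-in for foo.py, built in memory so the snippet needs no file on disk.
SOURCE = '''
load_count = {"foo": 0}  # hypothetical counter, only to observe loader calls
_dataset_cache = {}

def _load_foo():
    load_count["foo"] += 1
    return 1

_dataset_loaders = {"foo": _load_foo}

def __getattr__(name):
    if name in _dataset_loaders:
        if name not in _dataset_cache:
            _dataset_cache[name] = _dataset_loaders[name]()
        return _dataset_cache[name]
    raise AttributeError(f"Module has no attribute {name!r}")
'''

lazy_foo = types.ModuleType("lazy_foo")
exec(SOURCE, lazy_foo.__dict__)
sys.modules["lazy_foo"] = lazy_foo  # register so the import statement finds it

from lazy_foo import foo  # falls back to __getattr__, which triggers the load

print(foo)                  # 1
print(lazy_foo.foo)         # 1, served from the cache
print(lazy_foo.load_count)  # {'foo': 1}: the loader ran only once
```

One consequence worth knowing: the name bound by `from lazy_foo import foo` is an ordinary variable holding the loaded value, so the load happens at import time for that name rather than at first use.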