<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>data &amp;mdash; Nat Knight</title>
    <link>http://natknight.xyz/tag:data</link>
    <description>Reflections, diversions, and opinions from a progressive ex-physicist programmer dad with a sore back.</description>
    <pubDate>Fri, 01 May 2026 00:00:01 -0700</pubDate>
    <item>
      <title>Lazy-loading data modules in Python with one magic function</title>
      <link>http://natknight.xyz/lazy-loading-data-modules-in-python-with-one-magic-function</link>
      <description>&lt;![CDATA[#python #data&#xA;&#xA;Greg Wilson was soliciting strategies for lazy-loading datasets in Python modules. There are, of course, many ways to do this, but I didn&#39;t see this one being discussed.&#xA;&#xA;Since Python 3.7 (released in 2017) you can define a module-level getattr function that gets used for name resolution (see PEP 562). You can use this to call a data-loading function the first time each module-level property is accessed. This implementation uses only functions, dicts, and the one magic &#34;dunder&#34; function, which might be more approachable than programming with classes (depending on your audience).&#xA;&#xA;!--more--&#xA;&#xA;A sketch of an implementation:&#xA;&#xA;Cache loaded datasets&#xA;datasetcache = {}&#xA;&#xA;Dictionary mapping dataset names to loader functions&#xA;datasetloaders = {&#xA;  # maps names to argument-less functions that load datasets&#xA;}&#xA;&#xA;def getattr(name):&#xA;    &#34;&#34;&#34;PEP 562 module-level getattr function for lazy loading&#34;&#34;&#34;&#xA;    # Check if we should load a dataset for this attribute&#xA;    if name in datasetloaders:&#xA;        # If not already cached, load and cache it&#xA;        if name not in datasetcache:&#xA;            datasetcachename] = datasetloaders[name&#xA;        return datasetcache[name]&#xA;&#xA;    # If not a dataset, raise AttributeError&#xA;    raise AttributeError(f&#34;Module has no attribute &#39;{name}&#39;&#34;)&#xA;&#xA;For example, this module has attributes foo and bar which are only calculated when they&#39;re first referenced:&#xA;&#xA;foo.py&#xA;&#xA;Cache loaded datasets&#xA;datasetcache = {}&#xA;&#xA;def loaddata(name):&#xA;    if name == &#34;foo&#34;:&#xA;        print(&#34;loading foo&#34;)&#xA;        return 1&#xA;    elif name == &#34;bar&#34;:&#xA;        print(&#34;loading bar&#34;)&#xA;        return 2&#xA;    else:&#xA;        # This should be unreachable, but just in case...&#xA;        raise AttributeError(f&#34;Module has no attribute &#39;{name}&#39;&#34;)&#xA;&#xA;Dictionary mapping dataset names to loader functions&#xA;datasetloaders = {&#xA;    &#34;foo&#34;: lambda: loaddata(&#34;foo&#34;),&#xA;    &#34;bar&#34;: lambda: loaddata(&#34;bar&#34;),&#xA;}&#xA;&#xA;def getattr(name):&#xA;    &#34;&#34;&#34;PEP 562 module-level getattr function for lazy loading&#34;&#34;&#34;&#xA;    # Check if we should load a dataset for this attribute&#xA;    if name in datasetloaders:&#xA;        # If not already cached, load and cache it&#xA;        if name not in datasetcache:&#xA;            datasetcachename] = datasetloaders[name&#xA;        return dataset_cache[name]&#xA;&#xA;    # If not a dataset, raise AttributeError&#xA;    raise AttributeError(f&#34;Module has no attribute &#39;{name}&#39;&#34;)&#xA;&#xA;In use, this looks like:&#xA;&#xA;      import foo&#xA;      foo.foo&#xA;loading foo&#xA;1&#xA;      foo.foo&#xA;1&#xA;      # NOTE: didn&#39;t load foo a second time&#xA;      foo.bar&#xA;loading bar&#xA;2&#xA;      foo.quuz&#xA;Traceback (most recent call last):&#xA;  File &#34;python-input-4&#34;, line 1, in module&#xA;    foo.quuz&#xA;  File &#34;/Users/nknight/tmp/foo.py&#34;, line 34, in getattr&#xA;    raise AttributeError(f&#34;Module has no attribute &#39;{name}&#39;&#34;)&#xA;AttributeError: Module has no attribute &#39;quuz&#39;&#xA;`]]&gt;</description>
      <content:encoded><![CDATA[<p><a href="http://natknight.xyz/tag:python" class="hashtag"><span>#</span><span class="p-category">python</span></a> <a href="http://natknight.xyz/tag:data" class="hashtag"><span>#</span><span class="p-category">data</span></a></p>

<p>Greg Wilson was <a href="https://third-bit.com/2025/04/21/lazy-loading-data-package/">soliciting</a> strategies for lazy-loading datasets in Python modules. There are, of course, many ways to do this, but I didn&#39;t see this one being discussed.</p>

<p>Since Python 3.7 (released in 2017) you can define a module-level <code>__getattr__</code> function that gets used for name resolution (see <a href="https://peps.python.org/pep-0562/">PEP 562</a>). You can use this to call a data-loading function the first time each module-level property is accessed. This implementation uses only functions, dicts, and the one magic “dunder” function, which might be more approachable than programming with classes (depending on your audience).</p>



<p>A sketch of an implementation:</p>

<pre><code class="language-python"># Cache loaded datasets
_dataset_cache = {}

# Dictionary mapping dataset names to loader functions
_dataset_loaders = {
  # maps names to argument-less functions that load datasets
}


def __getattr__(name):
    &#34;&#34;&#34;PEP 562 module-level __getattr__ function for lazy loading&#34;&#34;&#34;
    # Check if we should load a dataset for this attribute
    if name in _dataset_loaders:
        # If not already cached, load and cache it
        if name not in _dataset_cache:
            _dataset_cache[name] = _dataset_loaders[name]()
        return _dataset_cache[name]

    # If not a dataset, raise AttributeError
    raise AttributeError(f&#34;Module has no attribute &#39;{name}&#39;&#34;)
</code></pre>

<p>For example, this module has attributes <code>foo</code> and <code>bar</code> which are only calculated when they&#39;re first referenced:</p>

<pre><code class="language-python"># foo.py

# Cache loaded datasets
_dataset_cache = {}


def load_data(name):
    if name == &#34;foo&#34;:
        print(&#34;loading foo&#34;)
        return 1
    elif name == &#34;bar&#34;:
        print(&#34;loading bar&#34;)
        return 2
    else:
        # This should be unreachable, but just in case...
        raise AttributeError(f&#34;Module has no attribute &#39;{name}&#39;&#34;)


# Dictionary mapping dataset names to loader functions
_dataset_loaders = {
    &#34;foo&#34;: lambda: load_data(&#34;foo&#34;),
    &#34;bar&#34;: lambda: load_data(&#34;bar&#34;),
}


def __getattr__(name):
    &#34;&#34;&#34;PEP 562 module-level __getattr__ function for lazy loading&#34;&#34;&#34;
    # Check if we should load a dataset for this attribute
    if name in _dataset_loaders:
        # If not already cached, load and cache it
        if name not in _dataset_cache:
            _dataset_cache[name] = _dataset_loaders[name]()
        return _dataset_cache[name]

    # If not a dataset, raise AttributeError
    raise AttributeError(f&#34;Module has no attribute &#39;{name}&#39;&#34;)
</code></pre>

<p>In use, this looks like:</p>

<pre><code>&gt;&gt;&gt; import foo
&gt;&gt;&gt; foo.foo
loading foo
1
&gt;&gt;&gt; foo.foo
1
&gt;&gt;&gt; # NOTE: didn&#39;t load foo a second time
&gt;&gt;&gt; foo.bar
loading bar
2
&gt;&gt;&gt; foo.quuz
Traceback (most recent call last):
  File &#34;&lt;python-input-4&gt;&#34;, line 1, in &lt;module&gt;
    foo.quuz
  File &#34;/Users/nknight/tmp/foo.py&#34;, line 34, in __getattr__
    raise AttributeError(f&#34;Module has no attribute &#39;{name}&#39;&#34;)
AttributeError: Module has no attribute &#39;quuz&#39;
</code></pre>
]]></content:encoded>
      <guid>http://natknight.xyz/lazy-loading-data-modules-in-python-with-one-magic-function</guid>
      <pubDate>Tue, 22 Apr 2025 17:49:33 +0000</pubDate>
    </item>
  </channel>
</rss>