<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>testing - Nat Knight</title>
    <link>http://natknight.xyz/tag:testing</link>
    <description>Reflections, diversions, and opinions from a progressive ex-physicist programmer dad with a sore back.</description>
    <pubDate>Sat, 23 May 2026 14:43:54 -0700</pubDate>
    <item>
      <title>Consistent Random UUIDs in Python</title>
      <link>http://natknight.xyz/consistent-random-uuids-in-python</link>
      <description>&lt;![CDATA[#python #testing&#xA;&#xA;When I&#39;m doing data analysis or building applications with Python and I have to give entities a unique ID, I like to use random UUIDs instead of sequential numbers. Sequential numbers include information about the order and total number of data, but I want my IDs to be just a unique identifier, nothing more.&#xA;&#xA;[uuid-def]: https://en.wikipedia.org/wiki/Universally_unique_identifier&#xA;&#xA;!--more--&#xA;&#xA;Python&#39;s standard library includes the uuid module, for working with UUIDs. There&#39;s a convenient function for generating random ones:&#xA;&#xA;[uuid-module]: https://docs.python.org/3.7/library/uuid.html&#xA;&#xA;      import uuid&#xA;      uuid.uuid4()&#xA;UUID(&#39;189afb2c-1d58-4390-b35e-d5c0e3bb7472&#39;)&#xA;      uuid.uuid4()&#xA;UUID(&#39;0fde2d22-1918-4e39-8c6c-825c2655cbd5&#39;)&#xA;&#xA;Handy. 🙂&#xA;&#xA;I like my UUIDs random, but it can be useful to have them be consistent between runs. That way, you can re-run data processing scripts but keep the same UUIDs. This is a little trickier than I thought it would be, but it&#39;s certainly possible.&#xA;&#xA;My first instinct was to set the random seed in the random module, which is usually enough to make the random number generator behave the same way with each run:&#xA;&#xA;[random-module]: https://docs.python.org/3.7/library/random.html&#xA;&#xA;      import random&#xA;      random.seed(&#34;peanutbutter&#34;)&#xA;      [random.randint(0, 100) for _ in range(12)]&#xA;[79, 83, 26, 76, 3, 4, 2, 12, 22, 75, 34, 15]&#xA;      random.seed(&#34;peanutbutter&#34;)&#xA;      [random.randint(0, 100) for _ in range(12)]&#xA;[79, 83, 26, 76, 3, 4, 2, 12, 22, 75, 34, 15]&#xA;&#xA;Setting the seed makes random.randint give us consistent results. 
Maybe it will work for uuid.uuid4 as well.&#xA;&#xA;      import uuid&#xA;      import random&#xA;      random.seed(&#34;peanutbutter&#34;)&#xA;      uuid.uuid4()&#xA;UUID(&#39;37b88a25-9f0f-4308-9532-84fd9a924c06&#39;)&#xA;      random.seed(&#34;peanutbutter&#34;)&#xA;      uuid.uuid4()&#xA;UUID(&#39;04961188-33db-4ad9-86da-e9fcfc6a22e1&#39;)&#xA;&#xA;Huh. That didn&#39;t work. What gives? 🤔&#xA;&#xA;Let&#39;s look at the source code for uuid.uuid4:&#xA;&#xA;def uuid4():&#xA;    &#34;&#34;&#34;Generate a random UUID.&#34;&#34;&#34;&#xA;    return UUID(bytes=os.urandom(16), version=4)&#xA;&#xA;We can see that it&#39;s using os.urandom instead of the random module. That function goes straight to the operating system&#39;s random number generator, which isn&#39;t affected by random.seed. That&#39;s definitely a good thing! The random module is a pseudo random number generator, and using it in some kinds of application could cause security vulnerabilities, so for those applications, os.urandom is the right choice.&#xA;&#xA;[os-urandom]: https://docs.python.org/3.7/library/os.html#os.urandom&#xA;&#xA;However, it&#39;s not what I want for my data analysis application, so how can we make it use a pseudo-random generator instead?&#xA;&#xA;We can see that uuid.uuid4 is making uuid.UUID objects. If we can provide our own, pseudo-random bytes, we can generate pseudo-random UUIDs instead.&#xA;&#xA;[uuid-class]: https://docs.python.org/3.7/library/uuid.html#uuid.UUID&#xA;&#xA;uuid.UUID wants a bytes object, which we can make from a sequence of integers, like this:&#xA;&#xA;[bytes-obj]: https://docs.python.org/3/library/stdtypes.html#bytes&#xA;&#xA;      integers = [1, 2, 4, 8, 16]&#xA;      bytes(integers)&#xA;b&#39;\x01\x02\x04\x08\x10&#39;&#xA;&#xA;We can get a sequence of random, 8-bit integers (i.e. 
bytes) from random using the random.getrandbits function.&#xA;&#xA;[getrandbits]: https://docs.python.org/3.7/library/random.html#random.getrandbits&#xA;&#xA;Putting it all together, we get something like this:&#xA;&#xA;      import random&#xA;      import uuid&#xA;      def random_uuid():&#xA;...     return uuid.UUID(bytes=bytes(random.getrandbits(8) for _ in range(16)), version=4)&#xA;...&#xA;      random.seed(&#34;peanutbutter&#34;)&#xA;      random_uuid()&#xA;UUID(&#39;dad39ff6-a734-4906-8804-182dda97441f&#39;)&#xA;      random.seed(&#34;peanutbutter&#34;)&#xA;      random_uuid()&#xA;UUID(&#39;dad39ff6-a734-4906-8804-182dda97441f&#39;)&#xA;&#xA;Success! 🎉&#xA;&#xA;That&#39;s how to generate consistent (pseudo) random UUIDs with Python&#39;s standard library.&#xA;&#xA;One final note: the code above uses the shared random number generator in the random module, so if you need independent sequences of random UUIDs (e.g. for running isolated tests in parallel) it might be better to use separate random number generators for each sequence. I&#39;ve written up a sample implementation of how to do that, which is available here.&#xA;&#xA;[uuid-gen-snippet]: https://bitbucket.org/snippets/nathanielknight/aez955/consistent-isolated-random-uuid-ngenerator&#xA;&#xA;]]&gt;</description>
      <content:encoded><![CDATA[<p><a href="http://natknight.xyz/tag:python" class="hashtag"><span>#</span><span class="p-category">python</span></a> <a href="http://natknight.xyz/tag:testing" class="hashtag"><span>#</span><span class="p-category">testing</span></a></p>

<p>When I&#39;m doing data analysis or building applications with Python and I have to give entities a unique ID, I like to use <a href="https://en.wikipedia.org/wiki/Universally_unique_identifier">random UUIDs</a> instead of sequential numbers. Sequential numbers leak information about the order in which records were created and how many of them exist, but I want my IDs to be unique identifiers and nothing more.</p>



<p>Python&#39;s standard library includes the <a href="https://docs.python.org/3.7/library/uuid.html"><code>uuid</code> module</a> for working with UUIDs. It has a convenient function, <code>uuid.uuid4</code>, for generating random ones:</p>

<pre><code class="language-python">&gt;&gt;&gt; import uuid
&gt;&gt;&gt; uuid.uuid4()
UUID(&#39;189afb2c-1d58-4390-b35e-d5c0e3bb7472&#39;)
&gt;&gt;&gt; uuid.uuid4()
UUID(&#39;0fde2d22-1918-4e39-8c6c-825c2655cbd5&#39;)
</code></pre>

<p>Handy. 🙂</p>

<p>I like my UUIDs random, but it can be useful to have them be consistent between runs. That way, you can re-run data processing scripts but keep the same UUIDs. This is a little trickier than I thought it would be, but it&#39;s certainly possible.</p>

<p>My first instinct was to set the random seed in the <a href="https://docs.python.org/3.7/library/random.html"><code>random</code> module</a>, which is usually enough to make the random number generator behave the same way with each run:</p>

<pre><code class="language-python">&gt;&gt;&gt; import random
&gt;&gt;&gt; random.seed(&#34;peanutbutter&#34;)
&gt;&gt;&gt; [random.randint(0, 100) for _ in range(12)]
[79, 83, 26, 76, 3, 4, 2, 12, 22, 75, 34, 15]
&gt;&gt;&gt; random.seed(&#34;peanutbutter&#34;)
&gt;&gt;&gt; [random.randint(0, 100) for _ in range(12)]
[79, 83, 26, 76, 3, 4, 2, 12, 22, 75, 34, 15]
</code></pre>

<p>Setting the seed makes <code>random.randint</code> give us consistent results. Maybe it will work for <code>uuid.uuid4</code> as well.</p>

<pre><code class="language-python">&gt;&gt;&gt; import uuid
&gt;&gt;&gt; import random
&gt;&gt;&gt; random.seed(&#34;peanutbutter&#34;)
&gt;&gt;&gt; uuid.uuid4()
UUID(&#39;37b88a25-9f0f-4308-9532-84fd9a924c06&#39;)
&gt;&gt;&gt; random.seed(&#34;peanutbutter&#34;)
&gt;&gt;&gt; uuid.uuid4()
UUID(&#39;04961188-33db-4ad9-86da-e9fcfc6a22e1&#39;)
</code></pre>

<p>Huh. That didn&#39;t work. What gives? 🤔</p>

<p>Let&#39;s look at the source code for <code>uuid.uuid4</code>:</p>

<pre><code class="language-python">def uuid4():
    &#34;&#34;&#34;Generate a random UUID.&#34;&#34;&#34;
    return UUID(bytes=os.urandom(16), version=4)
</code></pre>

<p>We can see that it&#39;s using <a href="https://docs.python.org/3.7/library/os.html#os.urandom"><code>os.urandom</code></a> instead of the <code>random</code> module. That function goes straight to the operating system&#39;s random number generator, which isn&#39;t affected by <code>random.seed</code>. That&#39;s definitely a good thing! The <code>random</code> module is a pseudo-random number generator whose output is predictable, so using it to generate things like tokens or keys could cause security vulnerabilities. For those applications, <code>os.urandom</code> is the right choice.</p>
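
<p>A quick way to see the difference is to re-seed and draw 16 bytes from each source. This is only an illustrative sketch (the variable names are made up here, and it is not code from the <code>uuid</code> module): the seeded generator replays the same bytes, while <code>os.urandom</code> does not.</p>

<pre><code class="language-python">import os
import random

# The seeded pseudo-random generator replays the same values after re-seeding.
random.seed("peanutbutter")
first_seeded = bytes(random.randint(0, 255) for _ in range(16))
random.seed("peanutbutter")
second_seeded = bytes(random.randint(0, 255) for _ in range(16))
print(first_seeded == second_seeded)  # True

# os.urandom ignores random.seed, so two calls will almost certainly differ.
random.seed("peanutbutter")
first_os = os.urandom(16)
random.seed("peanutbutter")
second_os = os.urandom(16)
print(first_os == second_os)
</code></pre>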

<p>However, it&#39;s not what I want for my data analysis application, so how can we make it use a pseudo-random generator instead?</p>

<p>We can see that <code>uuid.uuid4</code> is making <a href="https://docs.python.org/3.7/library/uuid.html#uuid.UUID"><code>uuid.UUID</code></a> objects. If we can provide our own, pseudo-random bytes, we can generate pseudo-random UUIDs instead.</p>

<p><code>uuid.UUID</code> wants a 16-byte <a href="https://docs.python.org/3/library/stdtypes.html#bytes"><code>bytes</code> object</a>, which we can make from a sequence of integers between 0 and 255, like this:</p>

<pre><code class="language-python">&gt;&gt;&gt; integers = [1, 2, 4, 8, 16]
&gt;&gt;&gt; bytes(integers)
b&#39;\x01\x02\x04\x08\x10&#39;
</code></pre>
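
<p>As a small sketch of how that feeds into <code>uuid.UUID</code> (the names <code>raw</code> and <code>example</code> are just for illustration): sixteen integers make a valid <code>bytes=</code> argument, and any other length raises a <code>ValueError</code>.</p>

<pre><code class="language-python">import uuid

# Sixteen integers in the range 0-255 become a 16-byte bytes object...
raw = bytes(range(16))
print(len(raw))  # 16

# ...which uuid.UUID accepts through its bytes= argument.
example = uuid.UUID(bytes=raw)
print(example)  # 00010203-0405-0607-0809-0a0b0c0d0e0f

# The five-byte value from the previous example is rejected.
try:
    uuid.UUID(bytes=b"\x01\x02\x04\x08\x10")
except ValueError as error:
    print("rejected:", error)
</code></pre>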

<p>We can get a sequence of random, 8-bit integers (i.e. bytes) from <code>random</code> using the <a href="https://docs.python.org/3.7/library/random.html#random.getrandbits"><code>random.getrandbits</code> function</a>: each call to <code>random.getrandbits(8)</code> returns an integer between 0 and 255.</p>

<p>Putting it all together, we get something like this:</p>

<pre><code class="language-python">&gt;&gt;&gt; import random
&gt;&gt;&gt; import uuid
&gt;&gt;&gt; def random_uuid():
...     return uuid.UUID(bytes=bytes(random.getrandbits(8) for _ in range(16)), version=4)
...
&gt;&gt;&gt; random.seed(&#34;peanutbutter&#34;)
&gt;&gt;&gt; random_uuid()
UUID(&#39;dad39ff6-a734-4906-8804-182dda97441f&#39;)
&gt;&gt;&gt; random.seed(&#34;peanutbutter&#34;)
&gt;&gt;&gt; random_uuid()
UUID(&#39;dad39ff6-a734-4906-8804-182dda97441f&#39;)
</code></pre>

<p>Success! 🎉</p>

<p>That&#39;s how to generate consistent (pseudo) random UUIDs with Python&#39;s standard library.</p>
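
<p>The <code>version=4</code> argument is doing some work here. According to the <code>uuid.UUID</code> documentation, it sets the variant and version bits as RFC 4122 requires, overriding those bits in the bytes we pass in. The sketch below (variable names are illustrative) checks that, and shows that leaving the argument out keeps the raw bytes untouched:</p>

<pre><code class="language-python">import random
import uuid

random.seed("peanutbutter")
raw = bytes(random.getrandbits(8) for _ in range(16))

# With version=4, the version and variant bits are overwritten for us.
generated = uuid.UUID(bytes=raw, version=4)
print(generated.version)                   # 4
print(generated.variant == uuid.RFC_4122)  # True

# Without it, the bytes are used exactly as given, so the result is only a
# well-formed version-4 UUID if the random bits happen to line up.
untouched = uuid.UUID(bytes=raw)
print(untouched.bytes == raw)              # True
</code></pre>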

<p>One final note: the code above uses the shared random number generator in the <code>random</code> module, so if you need independent sequences of random UUIDs (e.g. for running isolated tests in parallel) it might be better to use separate random number generators for each sequence. I&#39;ve written up a sample implementation of how to do that, which is available <a href="https://bitbucket.org/snippets/nathanielknight/aez955/consistent-isolated-random-uuid-ngenerator">here</a>.</p>
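
<p>For reference, here is a minimal sketch of that idea. It is not the linked snippet, and <code>UUIDGenerator</code> and <code>next_uuid</code> are hypothetical names: each sequence gets its own <code>random.Random</code> instance, so nothing else that touches the shared generator can disturb it.</p>

<pre><code class="language-python">import random
import uuid


class UUIDGenerator:
    """Hypothetical helper: a seeded, self-contained source of version-4 UUIDs."""

    def __init__(self, seed):
        # random.Random keeps its own state, separate from the module-level
        # generator that random.seed() controls.
        self._rng = random.Random(seed)

    def next_uuid(self):
        raw = bytes(self._rng.getrandbits(8) for _ in range(16))
        return uuid.UUID(bytes=raw, version=4)


first = UUIDGenerator("peanutbutter")
second = UUIDGenerator("peanutbutter")

# Drawing from the shared generator in between does not affect either sequence.
a = first.next_uuid()
random.random()
b = second.next_uuid()
print(a == b)  # True
</code></pre>

<p>Two generators built from the same seed produce the same sequence, and generators built from different seeds can run side by side without interfering with each other.</p>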
]]></content:encoded>
      <guid>http://natknight.xyz/consistent-random-uuids-in-python</guid>
      <pubDate>Wed, 14 Nov 2018 08:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>