<?xml version="1.0" encoding="UTF-8"?><rss version="2.0" xmlns:content="http://purl.org/rss/1.0/modules/content/">
  <channel>
    <title>observability - Nat Knight</title>
    <link>http://natknight.xyz/tag:observability</link>
    <description>Reflections, diversions, and opinions from a progressive ex-physicist programmer dad with a sore back.</description>
    <pubDate>Sun, 24 May 2026 16:23:34 -0700</pubDate>
    <item>
      <title>Is OpenTelemetry Excessive?</title>
      <link>http://natknight.xyz/is-opentelemetry-excessive</link>
      <description>&lt;![CDATA[#opentelemetry #observability&#xA;&#xA;This article is a brief account of my experience setting up, operating, and using [OpenTelemetry] on a very small software development project. I reached the surprising conclusion that it&#39;s probably worthwhile much earlier, and at much smaller scales, than you might expect.&#xA;&#xA;The project in question was the back end for a proof-of-concept mobile app that I worked on as part of my day job. This wasn&#39;t even a Minimum Viable Product; it was more of an experiment to demonstrate what an MVP might look like. When I adopted OpenTelemetry I was worried that it might be adding needless complexity and overhead to a very basic app, but to my surprise and delight it paid for itself several times over.&#xA;&#xA;[OpenTelemetry]: https://opentelemetry.io/&#xA;&#xA;OpenTelemetry&#xA;&#xA;OpenTelemetry describes itself as&#xA;&#xA;  High-quality, ubiquitous, and portable telemetry to enable effective observability&#xA;&#xA;It&#39;s pitched as a tool for tackling enterprise-grade-highly-distributed-microservice-enabled complexity, the sort of thing that Charity, Liz, and Jessica talk about on the&#xA;O11ycast.&#xA;&#xA;Concretely, it&#39;s a set of standards for&#xA;&#xA;adding diagnostic events to an application (called &#34;instrumenting&#34;)&#xA;filtering, transforming, and delivering those events to a variety of back ends&#xA;&#xA;as well as&#xA;&#xA;open-source libraries implementing those standards for various programming languages and runtimes, databases, etc.&#xA;open-source and proprietary tools for collecting and analyzing the diagnostic events your application is producing&#xA;&#xA;Once you&#39;ve set it up, you can turn on [&#34;auto-instrumentation&#34;] for common software components, which ended up being very valuable.&#xA;&#xA;[&#34;auto-instrumentation&#34;]: https://www.npmjs.com/package/@opentelemetry/auto-instrumentations-node&#xA;&#xA;What I put into 
it&#xA;&#xA;Unfortunately, it&#39;s not all good news: setting up OpenTelemetry was more work than I was expecting. The Node.js libraries are complex (and seem to be in a state of flux?). There&#39;s a lot of configuration and setup. The library&#39;s interface is also more complicated (and quite a bit more powerful) than console.(log|info|error|debug), which is what I would usually reach for. This all took work and precious time to learn.&#xA;&#xA;I ended up sending logs to stdout as nicely formatted JSON. More sophisticated setups are available, but this 12-factor sort of approach served me well in development (Docker Compose, where I could inspect the logs with docker-compose logs) and in production (systemd services on EC2, where I used journalctl).&#xA;&#xA;What I got out of it&#xA;&#xA;Once I got the SDK configured properly and wrapped my head around how to use it, I was able to instrument my own code, which was valuable as expected. What I wasn&#39;t expecting was the comprehensive auto-instrumentation for things like Node.js&#39;s HTTP stack and the PostgreSQL client.&#xA;&#xA;This let me inspect the details of:&#xA;&#xA;every HTTP request that came in to my app&#xA;every HTTP request it sent to third-party services&#xA;the content, parameters, and timing of every database query&#xA;uncaught exceptions&#xA;&#xA;This helped me catch and fix:&#xA;&#xA;several minor-but-subtle bugs and misconfigurations in my own code&#xA;request parameter mismatches coming from the mobile app&#xA;a catastrophic bug in my auth middleware&#xA;problems in the SDKs for third-party services (I have no idea how I would&#xA;  have caught these without detailed HTTP tracing)&#xA;&#xA;These were bugs that slipped past a decent test suite and TypeScript annotations, and I diagnosed them without modifying my app. 
That&#39;s the promise of observability: you can&#39;t predict what you&#39;ll need to record, but if you&#39;re disciplined and systematic about instrumenting your code, you&#39;ll be able to figure things out once you discover what you need to know.&#xA;&#xA;This seemed like common sense for big, complicated distributed systems, but I might be starting to believe it for small, straightforward greenfield projects as well.&#xA;]]&gt;</description>
      <content:encoded><![CDATA[<p><a href="http://natknight.xyz/tag:opentelemetry" class="hashtag"><span>#</span><span class="p-category">opentelemetry</span></a> <a href="http://natknight.xyz/tag:observability" class="hashtag"><span>#</span><span class="p-category">observability</span></a></p>

<p>This article is a brief account of my experience setting up, operating, and using <a href="https://opentelemetry.io/">OpenTelemetry</a> on a very small software development project. I reached the surprising conclusion that it&#39;s probably worthwhile much earlier, and at much smaller scales, than you might expect.</p>



<p>The project in question was the back end for a proof-of-concept mobile app that I worked on as part of my day job. This wasn&#39;t even a Minimum Viable Product; it was more of an experiment to demonstrate what an MVP might look like. When I adopted OpenTelemetry I was worried that it might be adding needless complexity and overhead to a very basic app, but to my surprise and delight it paid for itself several times over.</p>

<h2 id="open-telemetry">OpenTelemetry</h2>

<p>OpenTelemetry describes itself as</p>

<blockquote><p>High-quality, ubiquitous, and portable telemetry to enable effective observability</p></blockquote>

<p>It&#39;s pitched as a tool for tackling enterprise-grade-highly-distributed-microservice-enabled complexity, the sort of thing that Charity, Liz, and Jessica talk about on the
<a href="https://www.heavybit.com/library/podcasts/o11ycast">O11ycast</a>.</p>

<p>Concretely, it&#39;s a set of standards for</p>
<ul><li>adding diagnostic events to an application (called “instrumenting”)</li>
<li>filtering, transforming, and delivering those events to a variety of back ends</li></ul>

<p>as well as</p>
<ul><li>open-source libraries implementing those standards for various programming languages and runtimes, databases, etc.</li>
<li>open-source and proprietary tools for collecting and analyzing the diagnostic events your application is producing</li></ul>

<p>Once you&#39;ve set it up, you can turn on <a href="https://www.npmjs.com/package/@opentelemetry/auto-instrumentations-node">“auto-instrumentation”</a> for common software components, which ended up being very valuable.</p>
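<p>For a sense of what turning it on involves, here is a minimal sketch of a Node.js setup file. It is a sketch under assumptions rather than the exact configuration used on this project: it assumes the <code>@opentelemetry/sdk-node</code>, <code>@opentelemetry/auto-instrumentations-node</code>, and <code>@opentelemetry/sdk-trace-base</code> packages are installed, the file name <code>tracing.ts</code> is made up for illustration, and the package APIs have shifted between releases, so check the versions you install.</p>

```typescript
// tracing.ts (hypothetical file name): load this before the rest of the app
// so the instrumentations can patch libraries as they are imported.
// Assumes @opentelemetry/sdk-node, @opentelemetry/auto-instrumentations-node,
// and @opentelemetry/sdk-trace-base are installed.
import { NodeSDK } from "@opentelemetry/sdk-node";
import { getNodeAutoInstrumentations } from "@opentelemetry/auto-instrumentations-node";
import { ConsoleSpanExporter } from "@opentelemetry/sdk-trace-base";

const sdk = new NodeSDK({
  // Print finished spans to the console; swap in another exporter
  // to send them to a real back end.
  traceExporter: new ConsoleSpanExporter(),
  // Enable the bundled instrumentations for common libraries.
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();

// Flush any pending spans before the process exits.
process.on("SIGTERM", () => {
  sdk.shutdown().finally(() => process.exit(0));
});
```

<p>Loading a file like this before the application code, for example with Node&#39;s <code>--require</code> flag on the compiled output, gives the instrumentations a chance to hook in before the app starts handling requests.</p>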

<h2 id="what-i-put-into-it">What I put into it</h2>

<p>Unfortunately, it&#39;s not all good news: setting up OpenTelemetry was more work than I was expecting. The Node.js libraries are complex (and seem to be in a state of flux?). There&#39;s a lot of configuration and setup. The library&#39;s interface is also more complicated (and quite a bit more powerful) than <code>console.(log|info|error|debug)</code>, which is what I would usually reach for. This all took work and precious time to learn.</p>

<p>I ended up sending logs to <code>stdout</code> as nicely formatted JSON. More sophisticated setups are available, but this <a href="https://12factor.net/">12-factor</a> sort of approach served me well in development (Docker Compose, where I could inspect the logs with <code>docker-compose logs</code>) and in production (systemd services on EC2, where I used <code>journalctl</code>).</p>
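<p>Writing structured logs to <code>stdout</code> needs very little code. The sketch below shows the general idea in TypeScript; the names <code>formatLogLine</code> and <code>log</code> and the field layout are illustrative assumptions, not the logger used on this project.</p>

```typescript
// Hypothetical helper names and field layout, for illustration only.
type LogLevel = "debug" | "info" | "warn" | "error";

interface LogAttributes {
  [key: string]: unknown;
}

// Build one log record as a single line of JSON, so line-oriented tools
// can display or filter it without extra parsing rules.
function formatLogLine(
  level: LogLevel,
  message: string,
  attributes: LogAttributes = {},
  now: Date = new Date(),
): string {
  return JSON.stringify({
    timestamp: now.toISOString(),
    level,
    message,
    attributes,
  });
}

// Write the record to stdout and leave collection to whatever runs the process.
function log(level: LogLevel, message: string, attributes: LogAttributes = {}): void {
  console.log(formatLogLine(level, message, attributes));
}

log("info", "request handled", { route: "/health", status: 200, durationMs: 3 });
```

<p>Because every line is a complete JSON object, the same output reads fine in <code>docker-compose logs</code> and <code>journalctl</code>, and it can be piped through a JSON-aware tool when you need to filter it.</p>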

<h2 id="what-i-got-out-of-it">What I got out of it</h2>

<p>Once I got the SDK configured properly and wrapped my head around how to use it, I was able to instrument my own code, which was valuable as expected. What I wasn&#39;t expecting was the comprehensive auto-instrumentation for things like Node.js&#39;s HTTP stack and the PostgreSQL client.</p>

<p>This let me inspect the details of:</p>
<ul><li>every HTTP request that came in to my app</li>
<li>every HTTP request it sent to third-party services</li>
<li>the content, parameters, and timing of every database query</li>
<li>uncaught exceptions</li></ul>

<p>This helped me catch and fix:</p>
<ul><li>several minor-but-subtle bugs and misconfigurations in my own code</li>
<li>request parameter mismatches coming from the mobile app</li>
<li>a catastrophic bug in my auth middleware</li>
<li>problems in the SDKs for third-party services (I have <em>no idea</em> how I would
have caught these without detailed HTTP tracing)</li></ul>

<p>These were bugs that slipped past a decent test suite and TypeScript annotations, and I diagnosed them without modifying my app. That&#39;s the promise of observability: you can&#39;t predict what you&#39;ll need to record, but if you&#39;re disciplined and systematic about instrumenting your code, you&#39;ll be able to figure things out once you discover what you need to know.</p>

<p>This seemed like common sense for big, complicated distributed systems, but I might be starting to believe it for small, straightforward greenfield projects as well.</p>
]]></content:encoded>
      <guid>http://natknight.xyz/is-opentelemetry-excessive</guid>
      <pubDate>Sun, 27 Nov 2022 08:00:00 +0000</pubDate>
    </item>
  </channel>
</rss>