<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Observability Real Talk]]></title><description><![CDATA[Stories from hard-core observability nerds at SigNoz - spreading the word on open-source, observability, OpenTelemetry and behind-the-scenes of building a dev tool infra product.

]]></description><link>https://newsletter.signoz.io</link><image><url>https://newsletter.signoz.io/img/substack.png</url><title>Observability Real Talk</title><link>https://newsletter.signoz.io</link></image><generator>Substack</generator><lastBuildDate>Thu, 28 May 2026 16:03:56 GMT</lastBuildDate><atom:link href="https://newsletter.signoz.io/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[SigNoz]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[signoz@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[signoz@substack.com]]></itunes:email><itunes:name><![CDATA[SigNoz]]></itunes:name></itunes:owner><itunes:author><![CDATA[SigNoz]]></itunes:author><googleplay:owner><![CDATA[signoz@substack.com]]></googleplay:owner><googleplay:email><![CDATA[signoz@substack.com]]></googleplay:email><googleplay:author><![CDATA[SigNoz]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Your Observability Tool just got a Second User]]></title><description><![CDATA[&#8220;It&#8217;s almost like I have a full-time person whose job is making sure my stack is always green&#8221;]]></description><link>https://newsletter.signoz.io/p/your-observability-tool-just-got</link><guid isPermaLink="false">https://newsletter.signoz.io/p/your-observability-tool-just-got</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Sun, 24 May 2026 13:44:50 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!B2PC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e02e2b2-333f-4db8-b3cf-0c0639e5a5ea_2937x1652.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! 
We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. We&#8217;ll make it worth your while.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p><em>This blog took 3 days and 12 hours to be curated, so make sure to show some love!</em></p></blockquote><p>A few weeks ago, <strong><a href="https://www.linkedin.com/in/leo-blondel-963435202/">Leo Blondel</a></strong> stopped opening his SigNoz dashboards.</p><p>Not because they were bad, they weren&#8217;t. He&#8217;d set them up himself, spent a weekend tuning them, and he says he liked them, but he just stopped <em>needing to look</em>. &#8220;I have dashboards because I set them up at the beginning,&#8221; he told us. &#8220;I haven&#8217;t opened them except for screenshots for my ISO certification.&#8221; This is not a story about dashboards being obsolete; plenty of teams will still want them. It&#8217;s a story about what happens when something else starts reading your telemetry patiently, in parallel, while you&#8217;re trying to get your fourteen-month-old back to sleep, as Leo puts it. </p><p>For a long time, observability tools have been built for one user: a human at a screen, scrolling logs, hovering over a flame graph, deciding what to do next. But there is now a second user in the building, and they are <em>strange</em>. 
They no longer read dashboards; they read APIs and JSON instead, open a dozen context windows at once and most importantly, they never get tired.</p><p>They are, in the way that matters for an SRE on call, infinitely patient.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!B2PC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e02e2b2-333f-4db8-b3cf-0c0639e5a5ea_2937x1652.jpeg" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!B2PC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e02e2b2-333f-4db8-b3cf-0c0639e5a5ea_2937x1652.jpeg 424w, https://substackcdn.com/image/fetch/$s_!B2PC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e02e2b2-333f-4db8-b3cf-0c0639e5a5ea_2937x1652.jpeg 848w, https://substackcdn.com/image/fetch/$s_!B2PC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e02e2b2-333f-4db8-b3cf-0c0639e5a5ea_2937x1652.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!B2PC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e02e2b2-333f-4db8-b3cf-0c0639e5a5ea_2937x1652.jpeg 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!B2PC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e02e2b2-333f-4db8-b3cf-0c0639e5a5ea_2937x1652.jpeg" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/8e02e2b2-333f-4db8-b3cf-0c0639e5a5ea_2937x1652.jpeg&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:276641,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/jpeg&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/199068970?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e02e2b2-333f-4db8-b3cf-0c0639e5a5ea_2937x1652.jpeg&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!B2PC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e02e2b2-333f-4db8-b3cf-0c0639e5a5ea_2937x1652.jpeg 424w, https://substackcdn.com/image/fetch/$s_!B2PC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e02e2b2-333f-4db8-b3cf-0c0639e5a5ea_2937x1652.jpeg 848w, https://substackcdn.com/image/fetch/$s_!B2PC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e02e2b2-333f-4db8-b3cf-0c0639e5a5ea_2937x1652.jpeg 1272w, https://substackcdn.com/image/fetch/$s_!B2PC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F8e02e2b2-333f-4db8-b3cf-0c0639e5a5ea_2937x1652.jpeg 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>What follows is a look at what one engineer built for that second user, why it worked, and the parts that still belong to the first.</p><h2>What Leo built</h2><p>As CTO of a small startup, Leo is one of just three engineers on the team. Their product is the kind of microservice-heavy data plumbing that demands real observability and absolutely cannot justify a dedicated on-call rotation. He started, like a lot of people do, by trying Datadog. The two-week trial ended with a quote north of two thousand dollars.</p><p>Their AI SRE feature (Bits AI SRE), which lets you click a button on an error to have an agent investigate, was quoted at roughly $40 per investigation.</p><p>&#8220;You wouldn&#8217;t do it, right?&#8221; he said. 
&#8220;So instead, what that inspired me to do was: wait, I can do this.&#8221;</p><p>He moved to SigNoz, which is open source and OpenTelemetry-native, with a hosted MCP server that he could plug into Claude. Setup took, by his account, an afternoon. Then, instead of paying per click for someone else&#8217;s investigation agent, he built his own and let it run.</p><p>The architecture is unromantic and worth describing in detail, because the details are where the difference lives:</p><ul><li><p><strong>Alerts originate in SigNoz.</strong> Rules are tuned by humans; when a rule fires, a webhook hits the agent.</p></li><li><p><strong>The agent is Claude Opus</strong>, running on a small VM, kept alive by PM2.</p></li><li><p><strong>It reads telemetry through the SigNoz MCP server,</strong> including logs, traces, and metrics: the same data a human would query, surfaced through natural-language tool calls instead of a query bar.</p></li><li><p><strong>It has read-only access to Kubernetes</strong> through an RBAC-scoped service account. It can describe a pod, look at events, and inspect an Argo CD application, but it cannot change anything.</p></li><li><p><strong>It has read access to GitLab.</strong> When something breaks, the first question it asks is whether anyone merged anything in the last hour.</p></li><li><p><strong>It talks to humans through Slack</strong> in socket mode, not just a one-way notification firehose but an actual conversation.</p></li><li><p><strong>It writes to a Postgres database</strong> as its working memory and diligently records every alert it has seen, every investigation it has done, and every conclusion it has reached.</p></li><li><p><strong>It writes incident reports to Notion</strong> when something is resolved, so the post-mortem isn&#8217;t a task someone has to do later. 
It already exists.</p></li><li><p><strong>It restarts itself every day at 4 a.m.</strong> so its context window doesn&#8217;t fill up, reading a summary of its own memory on the way back up.</p></li></ul><p>One of the first things in Leo&#8217;s playbook for the agent is, in his words, a warning to itself: <em>&#8220;Your context window is your lifeline. If it fills up, you die and must restart.&#8221;</em> The agent is instructed never to investigate directly but to spawn sub-agents with fresh context windows: smaller, cheaper models that do the dirty work and report a single paragraph back. The main agent stays clean, and the sub-agents remain disposable.</p><h2>What it actually does</h2><p>When a noisy alert fires, say, a node briefly reporting memory pressure that will be fine in ten minutes, the agent receives the webhook, queries SigNoz, decides it&#8217;s transient, and logs the event to its memory.</p><p>When the same alert fires for the eleventh time that week, the agent&#8217;s memory tells it this is no longer noise but a pattern. So it pings Leo in Slack and explains itself: <em>here is the alert, here is how often I&#8217;ve seen it, here is what I think is going on, here is the merge request from yesterday that I suspect is the cause.</em> Leo is free to check it out himself and reply whenever he is available. At the end of the day, the agent files what amounts to an end-of-shift report covering what it saw.</p><p>&#8220;It&#8217;s almost like I have a full-time person whose job is making sure my stack is always green,&#8221; Leo said.</p><p>It has been live for about three weeks, and in this time, by his count, it has produced zero false positives. It has had one false negative: one real problem it categorised as noise. He has been quietly extending it. 
Last week, he added Falco for security events, wrote a new rule into the agent&#8217;s playbook, and watched it pick up the alert and start reasoning about it.</p><h2>Context isn&#8217;t care</h2><p>The temptation, reading this, is to assume the agent is doing the human&#8217;s job; it isn&#8217;t. It&#8217;s doing a job that didn&#8217;t really exist before, or rather, that existed as the worst part of a different job, the part most folks hated.</p><p>There is a useful distinction here, one that SigNoz&#8217;s <strong><a href="https://signoz.io/blog/introducing-agent-native-observability/">own framing</a></strong> of agent-native observability gets right: <strong>agents bring context, humans bring care.</strong></p><p>An agent connected to your telemetry has more context than any human ever will. It has read every trace in the last hour, scanned every log line, and can correlate ten metrics across five services in the time it takes you to find the right tab. The classic observability complaint, <em>I don&#8217;t have time to look at all of this</em>, is, for the first time, no longer a problem.</p><p>But context isn&#8217;t the same as care. The agent does not know which 500 actually matters. It does not know that one customer is on a renewal call this afternoon, or that the checkout service degrading by 80ms is a Series A milestone, or that the noisy memory alert on the staging cluster genuinely doesn&#8217;t matter because that cluster is being torn down on Friday. You can tell it some of these things, but you cannot tell it all of them, and the list keeps changing.</p><p>This is why every team we&#8217;ve watched try to hand alert judgment entirely to an agent has ended up in the same place. The agent produces noisy pages, and the team learns to ignore them as a waste of time and effort. 
Within two weeks, the experiment is uninstalled.</p><p>Leo&#8217;s setup doesn&#8217;t fall into that trap, because he kept the rules, including the actual alert thresholds, under human control, written deterministically in SigNoz. The agent&#8217;s job is downstream of that: triage what fires, investigate what looks real, escalate what matters. Human up top, deterministic system in the middle, agent at the bottom where the grunt work lives. None of those layers is doing the job of the one above it.</p><p>Or, the way he put it: &#8220;I don&#8217;t want the AI to replace something. I want the AI to augment.&#8221;</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk! Subscribe for free to receive new posts.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h2>What this asks of an observability platform</h2><p>If you take the second-user idea seriously, that AI agents are going to be reading your telemetry the way humans read dashboards, a few things start being product decisions.</p><p><strong>The data model has to be unified.</strong> When a human is investigating, they can paper over schema mismatches between logs, traces, and metrics by squinting, but an agent can&#8217;t. 
If your logs live in one schema and your traces in another, and the correlation has to be reconstructed by hand each time, your agent will spend most of its tokens figuring out where it is instead of figuring out what&#8217;s wrong. This is one of the reasons SigNoz built on OpenTelemetry from day one and kept everything in a single store. It wasn&#8217;t an agent-era decision; it just turned out to be one.</p><p><strong>The interface has to speak to agents directly.</strong> Dashboards are a human-facing rendering of the underlying data, and agents don&#8217;t need that. They need the underlying data, exposed through something they can actually call, which today means MCP. SigNoz&#8217;s hosted MCP server is live for Cloud users, and the self-hosted version is on GitHub. Leo&#8217;s whole setup runs through it.</p><p><strong>The platform has to be legible to a model.</strong> Models have been trained on years of OpenTelemetry tutorials, SigNoz documentation, GitHub issues, and Stack Overflow answers. They already know how to talk to an open, well-documented stack. They have a much harder time with proprietary query languages and gated docs, where the model has to reason from forum posts and Reddit threads.</p><p><strong>Powered with skills.</strong> A raw MCP server is a bag of capabilities. The next layer up is teaching agents the conventions of your team: which metrics matter, which tracing patterns hold up at scale, and what your debugging tribal knowledge actually looks like (more and more context). 
Eventually, your agent works exactly as your runbook prescribes.</p><h2>Two users, one platform</h2><p>The shape of agent-native observability, at least for the foreseeable future, is two users sharing the same underlying data and asking different things of it.</p><p>The human user wants a clear view of what&#8217;s happening, the ability to ask hard questions, and confidence that the system will tell them when something they care about is wrong. The agent user wants raw data, fast tool calls, clean schemas, and enough structure to reason without getting lost. The platform that serves both well is the one that stops treating dashboards as the product and starts treating <em>the underlying telemetry</em> as the product, with multiple ways to consume it.</p><p>Leo&#8217;s setup is a small early example of what this looks like in practice. The rules live in SigNoz, configured by him. The agent picks up the work that scales poorly with human attention, like reading everything, correlating everything, and remembering everything, and gives back the work that scales well: deciding what matters, choosing what to monitor, knowing when something is actually wrong.</p><p>Both users belong on the platform now. We&#8217;re building for both.</p><div><hr></div><p><em>If you want to plug your own agent into your telemetry, the <a href="https://signoz.io/docs/ai/signoz-mcp-server/">SigNoz MCP server</a> is live for Cloud users and <a href="https://github.com/signoz/signoz-mcp-server">open source on GitHub</a> for everyone else. The longer picture, including the AI Assistant beta and the skills work coming next, lives at <a href="https://signoz.io/agent-native-observability/">Agent Native Observability in SigNoz</a>.</em></p>]]></content:encoded></item><item><title><![CDATA[Why should a Trace-ID be 128 bits? (A Surprisingly Long Answer)]]></title><description><![CDATA[The answer touches probability theory, distributed systems constraints and fifteen years of industry migration. 
Let's actually dig in.]]></description><link>https://newsletter.signoz.io/p/why-should-a-trace-id-be-128-bits</link><guid isPermaLink="false">https://newsletter.signoz.io/p/why-should-a-trace-id-be-128-bits</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Sun, 03 May 2026 13:48:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!E8fi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb7f9603-09bf-4104-ac88-327595a66e6c_2600x1600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Every time I happen to use the trace tab in SigNoz (an observability platform), I&#8217;m met with the same question, and I put it in the &#8220;I&#8217;ll deal with this later&#8221; folder in my brain.</p><p>Until today, when I decided to address the 128-bit <em>elephant</em> in the room.</p><p>So, like a normal human these days, I searched on Google, &#8220;Why is a trace ID 128 bits long?&#8221;</p><p>And the answer was, surprisingly, a long one.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!E8fi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb7f9603-09bf-4104-ac88-327595a66e6c_2600x1600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!E8fi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb7f9603-09bf-4104-ac88-327595a66e6c_2600x1600.png 424w, https://substackcdn.com/image/fetch/$s_!E8fi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb7f9603-09bf-4104-ac88-327595a66e6c_2600x1600.png 848w, 
https://substackcdn.com/image/fetch/$s_!E8fi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb7f9603-09bf-4104-ac88-327595a66e6c_2600x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!E8fi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb7f9603-09bf-4104-ac88-327595a66e6c_2600x1600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!E8fi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb7f9603-09bf-4104-ac88-327595a66e6c_2600x1600.png" width="1456" height="896" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fb7f9603-09bf-4104-ac88-327595a66e6c_2600x1600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:896,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4490954,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/196311736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb7f9603-09bf-4104-ac88-327595a66e6c_2600x1600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!E8fi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb7f9603-09bf-4104-ac88-327595a66e6c_2600x1600.png 424w, 
https://substackcdn.com/image/fetch/$s_!E8fi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb7f9603-09bf-4104-ac88-327595a66e6c_2600x1600.png 848w, https://substackcdn.com/image/fetch/$s_!E8fi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb7f9603-09bf-4104-ac88-327595a66e6c_2600x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!E8fi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffb7f9603-09bf-4104-ac88-327595a66e6c_2600x1600.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>The answer touches probability theory, distributed systems constraints and fifteen years of industry migration. Let&#8217;s actually dig in.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!pwPh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b832826-29f5-4a65-94a5-d7a509ad5c98_1980x882.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pwPh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b832826-29f5-4a65-94a5-d7a509ad5c98_1980x882.png 424w, https://substackcdn.com/image/fetch/$s_!pwPh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b832826-29f5-4a65-94a5-d7a509ad5c98_1980x882.png 848w, https://substackcdn.com/image/fetch/$s_!pwPh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b832826-29f5-4a65-94a5-d7a509ad5c98_1980x882.png 1272w, https://substackcdn.com/image/fetch/$s_!pwPh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b832826-29f5-4a65-94a5-d7a509ad5c98_1980x882.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pwPh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b832826-29f5-4a65-94a5-d7a509ad5c98_1980x882.png" width="1456" height="649" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b832826-29f5-4a65-94a5-d7a509ad5c98_1980x882.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:649,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:132897,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/196311736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b832826-29f5-4a65-94a5-d7a509ad5c98_1980x882.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!pwPh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b832826-29f5-4a65-94a5-d7a509ad5c98_1980x882.png 424w, https://substackcdn.com/image/fetch/$s_!pwPh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b832826-29f5-4a65-94a5-d7a509ad5c98_1980x882.png 848w, https://substackcdn.com/image/fetch/$s_!pwPh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b832826-29f5-4a65-94a5-d7a509ad5c98_1980x882.png 1272w, https://substackcdn.com/image/fetch/$s_!pwPh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b832826-29f5-4a65-94a5-d7a509ad5c98_1980x882.png 1456w" sizes="100vw"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>What is a trace ID for, anyway?</h2><p>When a request enters your system, say, a user clicks checkout, it might bounce through 20 different services, from your API gateway to auth to cart to inventory, etc. Each service does some work and may call other services. To reconstruct what happened when something goes wrong, you need a way to say &#8220;all these log entries and spans belong to the <em>same original request</em>.&#8221;</p><p>That&#8217;s the job of a trace ID. 
It&#8217;s generated once, at the entry point, and propagated through every downstream call via HTTP headers like <code>traceparent</code> (or other means of propagation).</p><p>So the trace ID has one job: to <em>uniquely</em> identify one request&#8217;s journey through the system.</p><p>Back in our school days, to uniquely identify the students of a class, we had a system in place: incremental roll numbers. Why can&#8217;t we adopt something similar here, perhaps a counter system?</p><p></p><h2>Why not just use a counter?</h2><p>In short, counters need coordination. If Service A and Service B both want to generate trace IDs, they&#8217;d both need to ask a central server for the next number, and that server would have to remember the last number it handed out.</p><p>To avoid this overhead, trace IDs are generated randomly, independently, with no coordination. Every service just picks a random number and trusts that it won&#8217;t collide with anyone else&#8217;s.</p><p>This is the entire reason the size matters: with no coordination, collision avoidance is purely a function of how large your random number is.</p><p>Now the dilemma is deciding how large this random number must be to <em>effectively</em> prevent collisions.</p><p>My first instinct was simply to make the number big enough that collisions are impossible. Let&#8217;s take 64 bits for now. 64 bits gives you 2&#8310;&#8308;, or roughly 1.8 &#215; 10&#185;&#8313;, possible values. That feels astronomical, and surely picking random numbers from a pool that large means collisions should be basically impossible.</p><h2>The Birthday Paradox</h2><p>This is where our intuition fails. Let me explain this with the <em>birthday paradox</em>.</p><p>Imagine a classroom of 23 students. What are the odds that two of them share a birthday? 
Instead of computing the probability that two people share a birthday, it&#8217;s easier to compute the probability that <em>no two people</em> share a birthday and then subtract it from 1.</p><pre><code><code>            P(no match)=365/365 &#215; 364/365 &#215; 363/365 &#215; &#8943; &#215; 343/365 &#8776; 0.493</code></code></pre><p>So the probability that <em>at least two</em> students share a birthday is:</p><pre><code><code>                                P(match)=1&#8722;0.493 &#8776; 0.507</code></code></pre><p>Just over 50% with 23 people. This is the <strong><a href="https://www.reddit.com/r/memes/comments/pdikoj/birthday_paradox/">birthday paradox</a></strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!84dZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e8267f7-a816-4433-9004-8a49bf23388b_500x562.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!84dZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e8267f7-a816-4433-9004-8a49bf23388b_500x562.png 424w, https://substackcdn.com/image/fetch/$s_!84dZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e8267f7-a816-4433-9004-8a49bf23388b_500x562.png 848w, https://substackcdn.com/image/fetch/$s_!84dZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e8267f7-a816-4433-9004-8a49bf23388b_500x562.png 1272w, 
https://substackcdn.com/image/fetch/$s_!84dZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e8267f7-a816-4433-9004-8a49bf23388b_500x562.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!84dZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e8267f7-a816-4433-9004-8a49bf23388b_500x562.png" width="500" height="562" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9e8267f7-a816-4433-9004-8a49bf23388b_500x562.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:562,&quot;width&quot;:500,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:392531,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/196311736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e8267f7-a816-4433-9004-8a49bf23388b_500x562.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!84dZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e8267f7-a816-4433-9004-8a49bf23388b_500x562.png 424w, https://substackcdn.com/image/fetch/$s_!84dZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e8267f7-a816-4433-9004-8a49bf23388b_500x562.png 848w, https://substackcdn.com/image/fetch/$s_!84dZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e8267f7-a816-4433-9004-8a49bf23388b_500x562.png 1272w, 
https://substackcdn.com/image/fetch/$s_!84dZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9e8267f7-a816-4433-9004-8a49bf23388b_500x562.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><p>If we extend this paradox to trace IDs, we realise the question is whether <em>any two IDs</em> in our entire trace history collide, not whether one particular ID gets duplicated. Let&#8217;s work through the math in the context of trace IDs.</p><h2>The Math Behind The Paradox</h2><div class="callout-block" data-callout="true"><p>This is a math-intensive section! 
If you want to understand every <em>bit</em> of this better, I suggest you lock in with a pen and paper. &#129299;</p></div><p>Imagine you&#8217;ve generated 4 IDs: A, B, C, and D. A collision occurs when any two of them are equal. So let&#8217;s first list out all the possible pairs:</p><ul><li><p>A &amp; B</p></li><li><p>A &amp; C</p></li><li><p>A &amp; D</p></li><li><p>B &amp; C</p></li><li><p>B &amp; D</p></li><li><p>C &amp; D</p></li></ul><p>That&#8217;s 6 pairs. Each pair is one opportunity for a collision.</p><p>The formula for counting pairs is:</p><pre><code><code>                           number of pairs = k(k&#8722;1) / 2</code></code></pre><p>where k is the number of IDs. For large k, the difference between k and k&#8722;1 barely matters, so we can simplify to:</p><pre><code><code>                            number of pairs &#8776; k^2 / 2</code></code></pre><p>The key term (k^2) is quadratic: if we generate 10&#215; more IDs, we get 100&#215; more pairs.</p><p></p><h3>Counting expected collisions</h3><p>Now we need to turn that pair count into something meaningful. 
Here&#8217;s the cleanest way to think about it:</p><p>Picture a dartboard with N spots on it. You throw k darts blindfolded, and each dart lands on a random spot. A collision is when two darts hit the same spot.</p><ul><li><p>The number of dart pairs is roughly k&#178;/2</p></li><li><p>The chance that any single pair hits the same spot is 1/N</p></li><li><p>So the <em>expected</em> number of collisions is pairs &#215; per-pair-chance = <strong>k&#178;/2N</strong></p></li></ul><p></p><h3>The Formula for Probability</h3><p>Treating each pair as a roughly independent chance and applying the standard approximation <code>1 - x &#8776; e^(-x)</code> for small x, we land on the famous birthday-paradox formula, where k is the number of IDs and N is the size of the ID space (2 raised to the number of bits):</p><pre><code><code>                          P(collision) &#8776; 1&#8722;e^(-k&#178;/2N)</code></code></pre><p>The important part of the formula is what&#8217;s <em>inside</em> the exponent: <strong>k&#178;/2N</strong>, that is, k&#178; divided by 2N.</p><p></p><h3>Reading the formula in Plain English</h3><p>The formula tells you a story in three acts, depending on how big k&#178;/2N is. 
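</p><p>To make that concrete before walking through the acts, here is a rough worked example that plugs a few values of k&#178;/2N into the formula. The three values are picked only for illustration:</p><pre><code><code>k&#178;/2N = 0.01   &#8594;   P(collision) &#8776; 1&#8722;e^(-0.01) &#8776; 0.01    (about 1%)
k&#178;/2N = 0.5    &#8594;   P(collision) &#8776; 1&#8722;e^(-0.5)  &#8776; 0.39    (about 39%)
k&#178;/2N = 5      &#8594;   P(collision) &#8776; 1&#8722;e^(-5)    &#8776; 0.993   (about 99%)</code></code></pre><p>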
Here&#8217;s the graph for e^x, which makes understanding the acts easier.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!cOQi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6119713e-5dc2-4601-a7fa-7b941da7e45c_801x400.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!cOQi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6119713e-5dc2-4601-a7fa-7b941da7e45c_801x400.webp 424w, https://substackcdn.com/image/fetch/$s_!cOQi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6119713e-5dc2-4601-a7fa-7b941da7e45c_801x400.webp 848w, https://substackcdn.com/image/fetch/$s_!cOQi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6119713e-5dc2-4601-a7fa-7b941da7e45c_801x400.webp 1272w, https://substackcdn.com/image/fetch/$s_!cOQi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6119713e-5dc2-4601-a7fa-7b941da7e45c_801x400.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!cOQi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6119713e-5dc2-4601-a7fa-7b941da7e45c_801x400.webp" width="801" height="400" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6119713e-5dc2-4601-a7fa-7b941da7e45c_801x400.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:400,&quot;width&quot;:801,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Exponential Graph - GeeksforGeeks&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Exponential Graph - GeeksforGeeks" title="Exponential Graph - GeeksforGeeks" srcset="https://substackcdn.com/image/fetch/$s_!cOQi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6119713e-5dc2-4601-a7fa-7b941da7e45c_801x400.webp 424w, https://substackcdn.com/image/fetch/$s_!cOQi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6119713e-5dc2-4601-a7fa-7b941da7e45c_801x400.webp 848w, https://substackcdn.com/image/fetch/$s_!cOQi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6119713e-5dc2-4601-a7fa-7b941da7e45c_801x400.webp 1272w, https://substackcdn.com/image/fetch/$s_!cOQi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6119713e-5dc2-4601-a7fa-7b941da7e45c_801x400.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><strong>Act 1: Safe.</strong></p><p>When k&#178;/2N is much less than 1, you&#8217;ve generated far fewer than &#8730;(2N) IDs. The exponent is near zero, e^(-near zero) &#8776; 1, and the collision probability is essentially zero. You&#8217;re fine.</p><p><strong>Act 2: Danger.</strong></p><p>When k&#178;/2N is around 1, collision probability jumps to about 63%. You&#8217;re now more likely than not to have a collision somewhere in your set.</p><p><strong>Act 3: Inevitable.</strong></p><p>When k&#178;/2N is much greater than 1, the exponent becomes huge, e^(-huge) &#8776; 0, and collision probability rounds to 100%.</p><p></p><p>Now we have everything we need to answer the original question. 
Let&#8217;s circle back to it.</p><p></p><h2>So why 128 bits?</h2><p>The collision risk depends on the ratio k&#178;/2N, and we can&#8217;t control k (the number of IDs generated).</p><p>What we <em>can</em> control is N (the size of the ID space), and N is determined by how many bits we use.</p><p>So the design question becomes: how big does N need to be so that even after years of operation, k&#178;/2N stays comfortably in Act 1?</p><p>Let&#8217;s plug in 64 bits and see what happens. N = 2&#8310;&#8308; &#8776; 1.8 &#215; 10&#185;&#8313;. Now let&#8217;s see how the collision probability evolves as k grows:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8wtZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb21b03a-32de-4d06-957d-da8ba8367733_1400x358.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8wtZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb21b03a-32de-4d06-957d-da8ba8367733_1400x358.png 424w, https://substackcdn.com/image/fetch/$s_!8wtZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb21b03a-32de-4d06-957d-da8ba8367733_1400x358.png 848w, https://substackcdn.com/image/fetch/$s_!8wtZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb21b03a-32de-4d06-957d-da8ba8367733_1400x358.png 1272w, https://substackcdn.com/image/fetch/$s_!8wtZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb21b03a-32de-4d06-957d-da8ba8367733_1400x358.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!8wtZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb21b03a-32de-4d06-957d-da8ba8367733_1400x358.png" width="1400" height="358" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/eb21b03a-32de-4d06-957d-da8ba8367733_1400x358.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:358,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:72732,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/196311736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb21b03a-32de-4d06-957d-da8ba8367733_1400x358.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8wtZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb21b03a-32de-4d06-957d-da8ba8367733_1400x358.png 424w, https://substackcdn.com/image/fetch/$s_!8wtZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb21b03a-32de-4d06-957d-da8ba8367733_1400x358.png 848w, https://substackcdn.com/image/fetch/$s_!8wtZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb21b03a-32de-4d06-957d-da8ba8367733_1400x358.png 1272w, https://substackcdn.com/image/fetch/$s_!8wtZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Feb21b03a-32de-4d06-957d-da8ba8367733_1400x358.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>A billion IDs gets us to a 2.7% collision risk, and a reasonably large company can easily generate that many traces. By 10 billion IDs, the probability is above 90%, so a collision is nearly guaranteed.</p><p>Now let&#8217;s redo the same exercise with 128 bits. 
N = 2&#185;&#178;&#8312; &#8776; 3.4 &#215; 10&#179;&#8312;:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!D5tA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e085a39-4c9e-46fd-9508-5d80ab288a9f_1424x356.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!D5tA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e085a39-4c9e-46fd-9508-5d80ab288a9f_1424x356.png 424w, https://substackcdn.com/image/fetch/$s_!D5tA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e085a39-4c9e-46fd-9508-5d80ab288a9f_1424x356.png 848w, https://substackcdn.com/image/fetch/$s_!D5tA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e085a39-4c9e-46fd-9508-5d80ab288a9f_1424x356.png 1272w, https://substackcdn.com/image/fetch/$s_!D5tA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e085a39-4c9e-46fd-9508-5d80ab288a9f_1424x356.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!D5tA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e085a39-4c9e-46fd-9508-5d80ab288a9f_1424x356.png" width="1424" height="356" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1e085a39-4c9e-46fd-9508-5d80ab288a9f_1424x356.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:356,&quot;width&quot;:1424,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:74203,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/196311736?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e085a39-4c9e-46fd-9508-5d80ab288a9f_1424x356.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!D5tA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e085a39-4c9e-46fd-9508-5d80ab288a9f_1424x356.png 424w, https://substackcdn.com/image/fetch/$s_!D5tA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e085a39-4c9e-46fd-9508-5d80ab288a9f_1424x356.png 848w, https://substackcdn.com/image/fetch/$s_!D5tA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e085a39-4c9e-46fd-9508-5d80ab288a9f_1424x356.png 1272w, https://substackcdn.com/image/fetch/$s_!D5tA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1e085a39-4c9e-46fd-9508-5d80ab288a9f_1424x356.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Even at a quadrillion IDs, k&#178;/2N is still 0.0000000015. 
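</p><p>As a rough sanity check of that figure, using the rounded value of N from above (so treat the last digit loosely):</p><pre><code><code>k      = 10^15 (a quadrillion IDs)
k&#178;     = 10^30
2N     &#8776; 2 &#215; 3.4 &#215; 10^38 = 6.8 &#215; 10^38
k&#178;/2N  &#8776; 10^30 / (6.8 &#215; 10^38) &#8776; 1.5 &#215; 10^-9
P(collision) &#8776; 1&#8722;e^(-1.5 &#215; 10^-9) &#8776; 1.5 &#215; 10^-9</code></code></pre><p>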
To reach a meaningful collision risk at 128 bits, you&#8217;d need to generate trace IDs in numbers that exceed every trace ever produced by every observability platform on Earth, combined.</p><p>We can&#8217;t stop k from growing, but we can choose an N so vast that the cliff in Act 3 sits beyond any horizon we&#8217;ll ever care about.</p><p>That&#8217;s why 128 bits.</p><p></p><h2>Why not 256, then?</h2><p>If 128 bits is good, isn&#8217;t 256 bits even better? Mathematically, yes. Practically, no.</p><p>Every trace ID has to be propagated on every HTTP request between services, stored alongside every span, indexed in every backend, and shipped through every log line. At scale, those bytes add up. 128 bits is 16 bytes; 256 bits is 32. Doubling the bytes that every span, header, and log line spends on its trace ID, for a safety margin we&#8217;ll never realistically need, isn&#8217;t a trade anyone wants to make.</p><p>128 bits is the sweet spot: collision safety effectively forever, and it happens to match the size of a UUID, so most databases, languages, and protocols already know how to handle a value of that size.</p>]]></content:encoded></item><item><title><![CDATA[Our Project Hail Mary: The Observability Setup Behind an Observability Tool]]></title><description><![CDATA[Today, our internal observability system watches 6 regions, ingesting 21 billion metric points, 14 TB of logs, and 10 TB of traces. 
The story of how we got here starts with everything falling apart.]]></description><link>https://newsletter.signoz.io/p/our-project-hail-mary-the-observability</link><guid isPermaLink="false">https://newsletter.signoz.io/p/our-project-hail-mary-the-observability</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Sun, 19 Apr 2026 13:14:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!LE4I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59b69c01-e391-40b1-84bd-b7dbe5603a8c_1400x788.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<div><hr></div><p><em>Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em> I wasn&#8217;t planning on making this a blog of its own and initially wanted to combine it with <strong><a href="https://newsletter.signoz.io/p/how-the-sharks-do-observability">this</a></strong>. Later, when I got on a call with <strong><a href="https://www.linkedin.com/in/vibhu-pandey/">Pandey</a></strong> (the brains behind Nightswatch), I realised this deserved a blog of its own.</em></p><p><em>This is an attempt to do justice to the insane work our platform-pod does daily to keep all our customers&#8217; cloud instances alive and stable. 
I hope you enjoy reading this as much as I enjoyed learning about this and crafting it for you.</em></p><div><hr></div><p></p><p>Today, our internal observability system watches 6 regions, ingesting 21 billion metric points, 14 TB of logs, and 10 TB of traces per day without breaking a sweat.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LE4I!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59b69c01-e391-40b1-84bd-b7dbe5603a8c_1400x788.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LE4I!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59b69c01-e391-40b1-84bd-b7dbe5603a8c_1400x788.png 424w, https://substackcdn.com/image/fetch/$s_!LE4I!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59b69c01-e391-40b1-84bd-b7dbe5603a8c_1400x788.png 848w, https://substackcdn.com/image/fetch/$s_!LE4I!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59b69c01-e391-40b1-84bd-b7dbe5603a8c_1400x788.png 1272w, https://substackcdn.com/image/fetch/$s_!LE4I!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59b69c01-e391-40b1-84bd-b7dbe5603a8c_1400x788.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LE4I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59b69c01-e391-40b1-84bd-b7dbe5603a8c_1400x788.png" width="1400" height="788" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/59b69c01-e391-40b1-84bd-b7dbe5603a8c_1400x788.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:788,&quot;width&quot;:1400,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:582721,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/194606236?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59b69c01-e391-40b1-84bd-b7dbe5603a8c_1400x788.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!LE4I!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59b69c01-e391-40b1-84bd-b7dbe5603a8c_1400x788.png 424w, https://substackcdn.com/image/fetch/$s_!LE4I!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59b69c01-e391-40b1-84bd-b7dbe5603a8c_1400x788.png 848w, https://substackcdn.com/image/fetch/$s_!LE4I!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59b69c01-e391-40b1-84bd-b7dbe5603a8c_1400x788.png 1272w, https://substackcdn.com/image/fetch/$s_!LE4I!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F59b69c01-e391-40b1-84bd-b7dbe5603a8c_1400x788.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>We are people who build an observability tool for a living. We spend our days helping customers monitor their systems, debug their incidents, and make sense of their telemetry. So you&#8217;d think we&#8217;d have our own house in order.</p><p>We do now, but the story of how we got here starts with everything falling apart.</p><p></p><h2>Our Early System</h2><p>About 4 years ago, on an eerie Wednesday morning, one of our US cluster&#8217;s ClickHouse (our database) nodes started running hot, some queries were timing out, and a handful of customers were seeing slow dashboards. 
We followed the standard operating procedure: open the internal monitoring tool, check the metrics, and identify the bottleneck.</p><p>Except the monitoring tool was down too.</p><p>SaaSMonitor, the system we&#8217;d built to watch our customer deployments, had been ingesting so much telemetry that its own collector had crashed.</p><p>This wasn&#8217;t a one-off event; it kept happening.</p><p>Our internal monitoring had grown organically over the years. It started with a hand-stitched setup called Testbed for our early, manually provisioned customers, and SaaSMonitor was bolted on when we launched self-service sign-ups. This meant that if you were debugging something in the US region, you had to check two separate URLs.</p><p>Also, telemetry data went straight from collection to storage without a buffer in between, so any spike in volume would choke the pipeline. Every pod had two containers, an application and a sidecar, but metrics were only exposed at the pod level, so when something broke, you couldn&#8217;t tell which container was the culprit. There was no opt-in mechanism: we collected everything from every pod by default, which meant we were drowning in data we didn&#8217;t need, making the volume problem even worse.</p><p>We tried some quick fixes, like putting Envoy in front of SaaSMonitor as a load balancer, hoping that spreading traffic across more collector instances would prevent the crashes. It didn&#8217;t work, because distributing an overwhelming load across more instances just gives you more instances that are overwhelmed. 
We needed something fundamentally different.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!e_Jd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16c5bdf6-4427-48c7-9261-3f13e47d768d_931x552.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!e_Jd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16c5bdf6-4427-48c7-9261-3f13e47d768d_931x552.png 424w, https://substackcdn.com/image/fetch/$s_!e_Jd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16c5bdf6-4427-48c7-9261-3f13e47d768d_931x552.png 848w, https://substackcdn.com/image/fetch/$s_!e_Jd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16c5bdf6-4427-48c7-9261-3f13e47d768d_931x552.png 1272w, https://substackcdn.com/image/fetch/$s_!e_Jd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16c5bdf6-4427-48c7-9261-3f13e47d768d_931x552.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!e_Jd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16c5bdf6-4427-48c7-9261-3f13e47d768d_931x552.png" width="931" height="552" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/16c5bdf6-4427-48c7-9261-3f13e47d768d_931x552.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:552,&quot;width&quot;:931,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:51832,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/194606236?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16c5bdf6-4427-48c7-9261-3f13e47d768d_931x552.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!e_Jd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16c5bdf6-4427-48c7-9261-3f13e47d768d_931x552.png 424w, https://substackcdn.com/image/fetch/$s_!e_Jd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16c5bdf6-4427-48c7-9261-3f13e47d768d_931x552.png 848w, https://substackcdn.com/image/fetch/$s_!e_Jd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16c5bdf6-4427-48c7-9261-3f13e47d768d_931x552.png 1272w, https://substackcdn.com/image/fetch/$s_!e_Jd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F16c5bdf6-4427-48c7-9261-3f13e47d768d_931x552.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>That&#8217;s when we started building Nightswatch, named after the Night&#8217;s Watch from Game of Thrones &#129399;&#127995;. It was our Project Hail Mary, a single, unified system to observe every cluster, every node, and every container across SigNoz Cloud using SigNoz itself. It took our platform pod over a year, 21 issues, and a complete rethinking of how we observe ourselves.</p><p>This is the story of what we built.</p><div class="callout-block" data-callout="true"><p>This is also a masterclass on how we used almost all seven deployment patterns of the OpenTelemetry Collector. 
If you aren&#8217;t familiar with it, <strong><a href="https://newsletter.signoz.io/p/patterns-for-deploying-otel-collector">give this a read</a></strong>!</p></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk! Subscribe to receive more engaging engineering content!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h2>What are we observing?</h2><p>Before we get into the story of how we built systems for observability, it helps to understand what exactly we&#8217;re trying to <em>watch</em>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DQVM!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9081480f-4e86-4606-b8be-1944f9fc96eb_783x762.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DQVM!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9081480f-4e86-4606-b8be-1944f9fc96eb_783x762.gif 424w, 
https://substackcdn.com/image/fetch/$s_!DQVM!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9081480f-4e86-4606-b8be-1944f9fc96eb_783x762.gif 848w, https://substackcdn.com/image/fetch/$s_!DQVM!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9081480f-4e86-4606-b8be-1944f9fc96eb_783x762.gif 1272w, https://substackcdn.com/image/fetch/$s_!DQVM!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9081480f-4e86-4606-b8be-1944f9fc96eb_783x762.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DQVM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9081480f-4e86-4606-b8be-1944f9fc96eb_783x762.gif" width="660" height="642.2988505747127" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9081480f-4e86-4606-b8be-1944f9fc96eb_783x762.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:783,&quot;resizeWidth&quot;:660,&quot;bytes&quot;:188714,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/194606236?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9081480f-4e86-4606-b8be-1944f9fc96eb_783x762.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!DQVM!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9081480f-4e86-4606-b8be-1944f9fc96eb_783x762.gif 424w, https://substackcdn.com/image/fetch/$s_!DQVM!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9081480f-4e86-4606-b8be-1944f9fc96eb_783x762.gif 848w, https://substackcdn.com/image/fetch/$s_!DQVM!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9081480f-4e86-4606-b8be-1944f9fc96eb_783x762.gif 1272w, https://substackcdn.com/image/fetch/$s_!DQVM!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9081480f-4e86-4606-b8be-1944f9fc96eb_783x762.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" 
stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>SigNoz Cloud is a multi-tenant platform, and we run three regional Kubernetes clusters across the US, EU, and India. When a customer signs up, they get their own isolated namespace within the cluster of their chosen region. Inside that namespace, they get their own SigNoz instance, their own ClickHouse for storing telemetry data, their own OTel collector for ingestion, and their own endpoint (something like <code>acme.us.signoz.cloud</code>).</p><p>Not everything is isolated, though; some infrastructure is shared across all tenants in a cluster. The Nginx controllers that route incoming traffic, the OpenTelemetry gateway that handles initial ingestion, and Redpanda (a Kafka-compatible streaming platform) that buffers data are pooled resources that every tenant&#8217;s data flows through. But the core components where customer data actually resides and gets queried are fully siloed per tenant.</p><p>Here&#8217;s a slightly abstracted architecture of our customer&#8217;s data plane:</p><p>All of this, including the shared pipeline, the per-tenant components, both flows, across three regions and hundreds of customers, is what Nightswatch needs to observe. Like its Game of Thrones namesake, it has a very interesting structure for guarding the realm.</p><p></p><h2>Overview of Nightswatch</h2><p>Nightswatch is our approach to running SigNoz to monitor SigNoz Cloud. 
Inspired by the Night&#8217;s Watch from <a href="https://gameofthrones.fandom.com/wiki/Night%27s_Watch">Game of Thrones</a>, the system is split into three roles, each named after a branch of the order.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!5rR0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f55bb6-fec7-4670-9458-7f8f6e987c6b_1313x1013.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!5rR0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f55bb6-fec7-4670-9458-7f8f6e987c6b_1313x1013.gif 424w, https://substackcdn.com/image/fetch/$s_!5rR0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f55bb6-fec7-4670-9458-7f8f6e987c6b_1313x1013.gif 848w, https://substackcdn.com/image/fetch/$s_!5rR0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f55bb6-fec7-4670-9458-7f8f6e987c6b_1313x1013.gif 1272w, https://substackcdn.com/image/fetch/$s_!5rR0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f55bb6-fec7-4670-9458-7f8f6e987c6b_1313x1013.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!5rR0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f55bb6-fec7-4670-9458-7f8f6e987c6b_1313x1013.gif" width="1313" height="1013" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e9f55bb6-fec7-4670-9458-7f8f6e987c6b_1313x1013.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1013,&quot;width&quot;:1313,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:700870,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/194606236?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f55bb6-fec7-4670-9458-7f8f6e987c6b_1313x1013.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!5rR0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f55bb6-fec7-4670-9458-7f8f6e987c6b_1313x1013.gif 424w, https://substackcdn.com/image/fetch/$s_!5rR0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f55bb6-fec7-4670-9458-7f8f6e987c6b_1313x1013.gif 848w, https://substackcdn.com/image/fetch/$s_!5rR0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f55bb6-fec7-4670-9458-7f8f6e987c6b_1313x1013.gif 1272w, https://substackcdn.com/image/fetch/$s_!5rR0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe9f55bb6-fec7-4670-9458-7f8f6e987c6b_1313x1013.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p><em>Builders</em> man the Wall, and they run on every node, gathering metrics, logs, and traces from the pods around them. <em>Rangers</em> venture beyond it by actively probing customer endpoints and watching for cluster-level threats like node failures and pod evictions. 
<em>Stewards</em> keep the supply lines running, and they buffer and forward telemetry from builders and rangers to the <em>Castle</em>, a dedicated SigNoz instance where everything comes together.</p><p>Under the hood, all three are OpenTelemetry Collectors, each configured differently for its role, but the magic lies in how they work together.</p><p></p><h2>Builders: Eyes on Every Node</h2><p><em>I shall live to collect my house gossip and die at my post.</em></p><p>Builders are the most local members of the Watch, and each one cares only about what&#8217;s happening on its own node. They run as OpenTelemetry Collector daemonsets, meaning one builder per node in the cluster. If a node has 15 tenant pods, the builder on that node scrapes all 15. If another node has 3, that builder scrapes 3.</p><p>For metrics and logs, the builder uses a pull model, scraping each container&#8217;s <code>/metrics</code> endpoint (similar to how Prometheus works) and polling container log files on disk at a configurable interval. For traces, it flips to a push model: the application is instrumented with an OpenTelemetry SDK and sends traces directly to the builder&#8217;s OTLP endpoint.</p><p>Not everything gets collected by default, and this is an intentional change we brought in. In the old system, we collected everything from every pod with no way to opt out, which generated more data than we needed and contributed to the pipeline crashes. Nightswatch fixes this with a two-tier approach: node-level metrics like CPU, memory, disk, and network are always collected, because you never want a node on fire with no basic metrics for debugging, but container-specific metrics and logs are opt-in, controlled by annotations on the pod.</p><p>The annotation system is the heart of how builders work. In SigNoz Cloud, each tenant pod has multiple containers, such as the application and a sidecar OTel collector, each exposing metrics on different ports. 
Pod-level scraping can&#8217;t distinguish between them, which was another blind spot in the old setup.</p><p>Nightswatch annotations solve this by targeting a specific container within the pod, using <code>&lt;container&gt;</code> in the annotation name which gets replaced with the actual container name.</p><p>For metrics:</p><pre><code><code>&lt;OUR_DOMAIN&gt;/&lt;container&gt;.mdscrape: true
&lt;OUR_DOMAIN&gt;/&lt;container&gt;.mdport: 8888
&lt;OUR_DOMAIN&gt;/&lt;container&gt;.mdpath: /metrics
&lt;OUR_DOMAIN&gt;/&lt;container&gt;.mdinterval: 10s
</code></code></pre><p>For logs:</p><pre><code><code>&lt;OUR_DOMAIN&gt;/&lt;container&gt;.ldscrape: true
&lt;OUR_DOMAIN&gt;/&lt;container&gt;.ldpipeline: json/nginx/...
&lt;OUR_DOMAIN&gt;/&lt;container&gt;.ldinterval: 200ms
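</code></code></pre><p>To make the placeholders concrete, here is a minimal, hypothetical sketch of how these annotations might look on a pod with two containers. The pod name and the container names (<code>signoz</code> and <code>otel-collector</code>) are assumptions made purely for illustration, <code>&lt;OUR_DOMAIN&gt;</code> stays the same placeholder as above, and we assume <code>json</code> is one of the available log pipelines. One detail worth remembering: Kubernetes stores annotation values as strings, so in an actual manifest booleans and numbers need to be quoted.</p><pre><code><code># Illustrative fragment only: the pod and container names are assumptions.
metadata:
  name: acme-signoz  # hypothetical tenant pod
  annotations:
    # Metrics: scrape only the sidecar collector container.
    &lt;OUR_DOMAIN&gt;/otel-collector.mdscrape: "true"
    &lt;OUR_DOMAIN&gt;/otel-collector.mdport: "8888"
    &lt;OUR_DOMAIN&gt;/otel-collector.mdpath: "/metrics"
    &lt;OUR_DOMAIN&gt;/otel-collector.mdinterval: "10s"
    # Logs: collect only the application container, parsed as JSON.
    &lt;OUR_DOMAIN&gt;/signoz.ldscrape: "true"
    &lt;OUR_DOMAIN&gt;/signoz.ldpipeline: "json"
    &lt;OUR_DOMAIN&gt;/signoz.ldinterval: "200ms"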
</code></code></pre><p>The prefixes tell you the signal type:</p><ul><li><p><code>md</code> &#8594; metrics discovery</p></li><li><p><code>ld</code> &#8594; logs discovery</p></li></ul><p>Each annotation controls a specific behaviour: whether to scrape, which port and path to hit, how often to scrape, and, for logs, which parsing pipeline to use.</p><div class="callout-block" data-callout="true"><p>Fun fact: This container-level scraping approach was something we built about 6-8 months before the OpenTelemetry community even started considering it.</p><p></p></div><p>Builders give you deep visibility into what&#8217;s happening <em>inside</em> every node, but they can&#8217;t tell you whether the customer&#8217;s endpoint is actually reachable from the outside, or whether the cluster itself is healthy. Here&#8217;s where rangers step in.</p><p></p><h2>Rangers: Eyes on the Cluster</h2><p><em>I shall live to probe and check the stability of my kingdom and die at my post.</em></p><p>Unlike builders, which run as daemonsets (one per node), rangers run as deployments, because we only need a replica running somewhere in the cluster. In our case, they run on a dedicated node pool reserved for Nightswatch workloads, so they never compete for resources with customer pods.</p><p>We run two types of rangers, each watching a different surface:</p><p><strong>The ingress ranger</strong> acts as a synthetic customer. It periodically hits each customer&#8217;s endpoint, something like <code>acme.us.signoz.cloud/healthz</code>, and checks whether it gets a healthy response. If it doesn&#8217;t, that could mean the customer&#8217;s SigNoz pod has crashed, or that the Nginx controller is misconfigured. Either way, the failure flows to the Castle as a data point and can trigger an alert immediately.</p><p>Like builders, the ingress ranger uses annotations to know what to probe:</p><pre><code><code>&lt;OUR_DOMAIN&gt;/prdscrape: "true"
&lt;OUR_DOMAIN&gt;/prdpath: "/healthz"
&lt;OUR_DOMAIN&gt;/prdinterval: "10s"
</code></code></pre><p>The <code>prd</code> prefix stands for probe discovery, <code>prdscrape</code> enables probing, <code>prdpath</code> specifies which path to hit, and <code>prdinterval</code> controls how frequently it runs.</p><p><strong>The k8s ranger</strong> watches the cluster&#8217;s infrastructure health by talking to the Kubernetes API server. It collects cluster-level metrics like node count, pod states, resource usage, and available capacity, giving us the big-picture view of whether a region is running healthy or approaching its limits.</p><p>It also captures Kubernetes events: whenever something notable happens, like a pod getting OOMKilled or a container crash-looping, the API server emits an event. The catch is that these events are ephemeral, meaning Kubernetes discards them after about an hour. The k8s ranger grabs them and ships them to the Castle as permanent records, which turns out to be invaluable for post-incident debugging.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wu-J!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347ebf51-21ab-4a8e-a765-2c1cbdea239f_351x390.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wu-J!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347ebf51-21ab-4a8e-a765-2c1cbdea239f_351x390.gif 424w, https://substackcdn.com/image/fetch/$s_!wu-J!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347ebf51-21ab-4a8e-a765-2c1cbdea239f_351x390.gif 848w, 
https://substackcdn.com/image/fetch/$s_!wu-J!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347ebf51-21ab-4a8e-a765-2c1cbdea239f_351x390.gif 1272w, https://substackcdn.com/image/fetch/$s_!wu-J!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347ebf51-21ab-4a8e-a765-2c1cbdea239f_351x390.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wu-J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347ebf51-21ab-4a8e-a765-2c1cbdea239f_351x390.gif" width="351" height="390" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/347ebf51-21ab-4a8e-a765-2c1cbdea239f_351x390.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:390,&quot;width&quot;:351,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:59593,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/194606236?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347ebf51-21ab-4a8e-a765-2c1cbdea239f_351x390.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wu-J!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347ebf51-21ab-4a8e-a765-2c1cbdea239f_351x390.gif 424w, https://substackcdn.com/image/fetch/$s_!wu-J!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347ebf51-21ab-4a8e-a765-2c1cbdea239f_351x390.gif 848w, 
https://substackcdn.com/image/fetch/$s_!wu-J!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347ebf51-21ab-4a8e-a765-2c1cbdea239f_351x390.gif 1272w, https://substackcdn.com/image/fetch/$s_!wu-J!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F347ebf51-21ab-4a8e-a765-2c1cbdea239f_351x390.gif 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Builders and rangers together give us full visibility of what&#8217;s happening inside each node, whether endpoints are reachable, and 
whether the cluster is healthy. But all that telemetry still needs to get somewhere safely. If the pipeline between collection and storage chokes, which is what kept killing the old system, none of this visibility matters.</p><p>This is why stewards are a central piece of Nightswatch.</p><p></p><h2>Stewards: The Supply Line</h2><p><em>I shall live to serve the builders and the rangers and die at my post.</em></p><p>Remember the core problem with the old system? Telemetry went straight from collection to storage with nothing in between, and any spike in volume would choke the pipeline. We had actually tried fixing this once before: in V1, we put Envoy in front of SaaSMonitor as a load balancer, hoping that distributing traffic across more collector instances would stop the crashes. It didn&#8217;t work, since distributing an overwhelming load across more instances just gives you more instances that are overwhelmed. A load balancer alone wasn&#8217;t the solution; we needed to engineer a complete buffer pipeline.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ijYB!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3d4931a-b41b-40e4-a9b3-390677b68d5b_601x361.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ijYB!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3d4931a-b41b-40e4-a9b3-390677b68d5b_601x361.gif 424w, https://substackcdn.com/image/fetch/$s_!ijYB!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3d4931a-b41b-40e4-a9b3-390677b68d5b_601x361.gif 848w, 
https://substackcdn.com/image/fetch/$s_!ijYB!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3d4931a-b41b-40e4-a9b3-390677b68d5b_601x361.gif 1272w, https://substackcdn.com/image/fetch/$s_!ijYB!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3d4931a-b41b-40e4-a9b3-390677b68d5b_601x361.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ijYB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3d4931a-b41b-40e4-a9b3-390677b68d5b_601x361.gif" width="601" height="361" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b3d4931a-b41b-40e4-a9b3-390677b68d5b_601x361.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:361,&quot;width&quot;:601,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:34545,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/gif&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/194606236?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3d4931a-b41b-40e4-a9b3-390677b68d5b_601x361.gif&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ijYB!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3d4931a-b41b-40e4-a9b3-390677b68d5b_601x361.gif 424w, https://substackcdn.com/image/fetch/$s_!ijYB!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3d4931a-b41b-40e4-a9b3-390677b68d5b_601x361.gif 848w, 
https://substackcdn.com/image/fetch/$s_!ijYB!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3d4931a-b41b-40e4-a9b3-390677b68d5b_601x361.gif 1272w, https://substackcdn.com/image/fetch/$s_!ijYB!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb3d4931a-b41b-40e4-a9b3-390677b68d5b_601x361.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>That&#8217;s what the stewards are: three components working together to move data reliably from collection to 
storage.</p><p><strong>Envoy</strong> sits at the front as the entry point. When builders and rangers from the US, EU, and IN clusters send telemetry, Envoy is the first thing that receives it, distributing incoming connections across however many OTel Gateway instances are running behind it. It&#8217;s gRPC-aware (gRPC is the protocol OTel uses to ship telemetry), so it can distribute connections more intelligently than a basic TCP load balancer. The same component that failed in V1 works perfectly in V2, because this time it has the rest of the pipeline behind it.</p><p><strong>OTel Gateway</strong> receives telemetry from Envoy, batches it, and forwards it downstream. It runs in gateway mode, meaning it doesn&#8217;t collect telemetry from local sources; it just receives, processes, and sends. The gateway is exposed via an internal load balancer without authentication, so builders and rangers in any region can send telemetry to the Nightswatch cluster without special credentials.</p><p><strong>Redpanda</strong> is where it all comes together. It&#8217;s a Kafka-compatible streaming platform that acts as a durable buffer between the gateway and the Castle. The gateway writes data into Redpanda, and the Castle&#8217;s SigNoz instance consumes from it at whatever pace it can handle. If the Castle slows down or goes offline temporarily, data doesn&#8217;t get lost; instead, it queues up, and the Castle catches up later.</p><p></p><h3>Castle</h3><p>All telemetry ultimately flows into the Castle, which is a dedicated SigNoz instance. It lives in a separate management cluster (the control plane), completely isolated from the data plane clusters where customer workloads run.</p><p>This separation is deliberate: it keeps the Castle unaffected if one of the regional clusters crashes. 
The Castle also monitors itself by running its own builders and rangers.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wHyL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28a28733-a131-4d69-a492-0931bfc0a3db_1241x1191.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wHyL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28a28733-a131-4d69-a492-0931bfc0a3db_1241x1191.png 424w, https://substackcdn.com/image/fetch/$s_!wHyL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28a28733-a131-4d69-a492-0931bfc0a3db_1241x1191.png 848w, https://substackcdn.com/image/fetch/$s_!wHyL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28a28733-a131-4d69-a492-0931bfc0a3db_1241x1191.png 1272w, https://substackcdn.com/image/fetch/$s_!wHyL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28a28733-a131-4d69-a492-0931bfc0a3db_1241x1191.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!wHyL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28a28733-a131-4d69-a492-0931bfc0a3db_1241x1191.png" width="1241" height="1191" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/28a28733-a131-4d69-a492-0931bfc0a3db_1241x1191.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1191,&quot;width&quot;:1241,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:357190,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/194606236?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28a28733-a131-4d69-a492-0931bfc0a3db_1241x1191.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!wHyL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28a28733-a131-4d69-a492-0931bfc0a3db_1241x1191.png 424w, https://substackcdn.com/image/fetch/$s_!wHyL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28a28733-a131-4d69-a492-0931bfc0a3db_1241x1191.png 848w, https://substackcdn.com/image/fetch/$s_!wHyL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28a28733-a131-4d69-a492-0931bfc0a3db_1241x1191.png 1272w, https://substackcdn.com/image/fetch/$s_!wHyL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F28a28733-a131-4d69-a492-0931bfc0a3db_1241x1191.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div>
<p></p><p>Nightswatch was our platform pod&#8217;s first-ever milestone: it took 1 year and 3 months and 21 issues to build something we&#8217;re insanely proud of. And the Wall still stands. If building observability at this scale sounds like your kind of challenge, we&#8217;re hiring. <strong><a href="https://jobs.ashbyhq.com/SigNoz">Come join the Watch</a></strong>! &#128521;</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk! 
Subscribe for more mind-blowing engineering tales!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[How the Sharks Do Observability]]></title><description><![CDATA[An account of how Netflix and Uber observe their massive systems every day.]]></description><link>https://newsletter.signoz.io/p/how-the-sharks-do-observability</link><guid isPermaLink="false">https://newsletter.signoz.io/p/how-the-sharks-do-observability</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Thu, 02 Apr 2026 13:45:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!POQA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is an honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source, and we reckon it&#8217;s important to share what we know. If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. 
We&#8217;ll make it worth your while.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p><em>This blog took 6 days and 7 hours to be curated, so make sure to show some love!</em></p></blockquote><p></p><p>As an observability enthusiast working at an observability startup and running an observability newsletter, I find this topic wildly fascinating. I know a bunch of lore on how companies thought about and invented (or, more precisely, reinvented) their observability systems to support their growing scale. But two of these stories have stuck with me. Each company broke its observability system at a critical moment of growth and rebuilt it in a completely different and particularly breathtaking way, and we have a lot to learn from both!</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!POQA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!POQA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png 424w, https://substackcdn.com/image/fetch/$s_!POQA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png 848w, 
https://substackcdn.com/image/fetch/$s_!POQA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!POQA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!POQA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png" width="1456" height="896" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/804e9de6-b693-4987-b806-255056ffd377_2600x1600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:896,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3239153,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/192927508?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!POQA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png 424w, 
https://substackcdn.com/image/fetch/$s_!POQA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png 848w, https://substackcdn.com/image/fetch/$s_!POQA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!POQA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div>
<p></p><h2>1. Netflix</h2><p>Netflix&#8217;s observability origin story starts in a place that will make most engineers wince. In May 2011, Netflix was using a home-grown solution called Epic to manage time-series data. Epic was a combination of Perl CGI scripts, RRDTool logging, and MySQL. Their telemetry was split between this home-grown tool and an IT-provisioned commercial product. Epic&#8217;s flexibility, which let engineers send in arbitrary time-series data and query it, made it popular, and it became the primary system of record.</p><p>They were tracking around 2 million distinct time series, and the monitoring system was regularly failing to keep up with the volume of data. Several things were about to make it dramatically worse: Netflix was shifting from rolling pushes to red/black deployments, starting to actually leverage auto-scaling rather than just using fixed-size groups, and expanding internationally into Latin America and Europe.</p><p>All these changes required them to scale by at least an order of magnitude, from 2 million to 20 million metrics or more. That was simply beyond what Epic was capable of; Perl CGI scripts and MySQL were never going to handle what Netflix was becoming.</p><p>So in early 2012, they started building&nbsp;<strong><a href="https://netflix.github.io/atlas-docs/">Atlas</a></strong>, and by late 2012, it was being phased into production, with full deployment completed in early 2013.</p><p>The design philosophy behind Atlas is a chapter filled with learnings. Atlas features in-memory data storage, allowing it to gather and report very large numbers of metrics very quickly. 
It captures operational intelligence. Whereas business intelligence analyses trends over time, operational intelligence provides a picture of what is currently happening within a system.</p><p>Since their focus was primarily on operational insight, the top priority was determining what&#8217;s going on right now. This led to the following rules of thumb:</p><p>1/ data becomes exponentially less important as it gets older</p><p>2/ restoring service is more important than preventing data loss</p><p>This is a fundamentally different philosophy from <em>store everything forever</em>. Netflix decided that recent data matters enormously and old data barely matters at all.</p><p>The internal Atlas deployment breaks data into multiple time windows. The last 6 hours of data is kept fully in memory, so they can show recent data as long as clients can successfully publish. Everything is sharded across machines in these in-memory clusters. For older data, they compute rollups via Hadoop processing, drastically reducing data volume for historical queries.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk! If you want more such observability lore, stay tuned!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p>One of the things they really wanted to fix from Epic was how dimensions worked. 
In the old system, everything was mangled into a metric name with different conventions per team, and users had to resort to complex regular expressions to slice and dice data. In Atlas, a metric&#8217;s identity is an arbitrary, unique set of key-value pairs. Some keys are set automatically by the client library (server name, AWS zone, ASG, cluster, application, region), and users have significant flexibility to specify whatever other keys make sense for their use case.</p><p>The growth numbers tell the story of why all this mattered. In 2011, they were monitoring 2 million metrics. By 2014, they were at 1.2 billion metrics, and the numbers continued to rise. They routinely see Atlas fetch and graph many billions of datapoints per second. Today, Atlas processes 17 billion metrics and 700 billion distributed traces per day on 1.5 petabytes of log data, and the system&#8217;s architecture has kept observability data processing to less than 5% of Netflix&#8217;s infrastructure costs!</p><p>But even Atlas hit its limits. A few years ago, Netflix&#8217;s SRE team was paged because their alerting system was falling behind, and critical application health alerts were reaching engineers 45 minutes late. One platform team had programmatically created tens of thousands of new alerts, which overwhelmed Atlas&#8217;s query capacity. They were looking at an order-of-magnitude increase in alert queries over the next 6 months, and scaling up Atlas&#8217;s storage layer to serve that volume would have been prohibitively expensive, since Atlas was already one of Netflix&#8217;s largest services in both size and cost.</p><p>Their answer was Atlas Streaming Eval, which moved alerting from a cron-based query model to a streaming model. Today, they run 20x as many alert queries as a few years ago, at a fraction of the cost. 
Multiple platform teams at Netflix programmatically generate and maintain alerts on behalf of their users without affecting others, and streaming evaluation enabled them to relax cardinality restrictions and to alert on queries that were previously rejected.</p><p>What&#8217;s special here is that instead of throwing more hardware at the problem, they changed the model entirely, and in my opinion, that&#8217;s what separates great observability teams from the rest.</p><p>Some interesting references!</p><ul><li><p><a href="https://netflixtechblog.com/introducing-atlas-netflixs-primary-telemetry-platform-bd31f4d8ed9a">Introducing Atlas: Netflix&#8217;s Primary Telemetry Platform</a> (Netflix Tech Blog)</p></li><li><p><a href="https://netflixtechblog.com/improved-alerting-with-atlas-streaming-eval-e691c60dc61e">Improved Alerting with Atlas Streaming Eval</a> (Netflix Tech Blog)</p></li><li><p><a href="https://netflixtechblog.com/lessons-from-building-observability-tools-at-netflix-7cfafed6ab17">Lessons from Building Observability Tools at Netflix</a> (Netflix Tech Blog)</p></li><li><p><a href="https://www.infoq.com/presentations/netflix-edgar-observability/">Solving Mysteries Faster with Observability</a> (InfoQ / QCon)</p></li><li><p><a href="https://netflix.github.io/atlas-docs/">Atlas Documentation</a> (Netflix OSS)</p></li></ul><p></p><h2>2. Uber</h2><p>Uber&#8217;s observability story starts in 2014 with a Graphite, Carbon, and WhisperDB stack that was held together very loosely. By late 2014, all services, infrastructure, and servers at Uber emitted metrics to a Graphite stack, which stored them in the Whisper file format in a sharded Carbon cluster. They used Grafana for dashboarding and Nagios for alerting, issuing Graphite threshold checks via source-controlled scripts.</p><p>The problems were fundamental. The stack was not horizontally scalable, meaning you couldn&#8217;t add capacity just by adding machines. 
There were no replicas, so a single node dying meant losing an eighth of all Uber&#8217;s data, and adding capacity required taking the system offline for a week or more. <strong><a href="https://www.linkedin.com/in/martinmao/">Martin Mao</a></strong>&#8217;s first on-call week was spent deleting data from the backend just to keep the observability stack alive.</p><p>First, they did a quick fix by swapping in Cassandra for time-series storage and Elasticsearch for the metrics index, all stitched together with Go. They stood up this new system in time for Halloween 2015, which was Uber&#8217;s second-largest peak load event. That year was the first time Uber&#8217;s observability system didn&#8217;t have an outage during the Halloween peak.</p><p>But Cassandra was the wrong tool for the job because they were using it as a time-series database even though it was built as a key-value store. As they entered their hyper-growth phase, the firefighting that had plagued the Graphite years resurfaced in a new form.</p><p>The team decided to build <strong><a href="https://www.uber.com/en-IN/blog/m3/">M3DB</a></strong> from scratch: a custom time-series database with an embedded inverted index. The architecture they landed on is worth understanding in detail.</p><p>Applications on hosts emit metrics to a local daemon called &#8220;<em>Collector</em>&#8221;, which aggregates them at 1-second intervals and then forwards them to the aggregation tier using a shard-aware topology retrieved from etcd. The aggregation tier further aggregates into 10-second and one-minute tiles, and the M3DB ingestor writes them to the storage tier. M3 Coordinator acts as a Prometheus sidecar, providing a global query and storage interface on top of M3DB clusters. It handles downsampling and ad hoc retention using rollup rules stored in etcd, which runs embedded in the binary of an M3DB seed node.</p><p>Let&#8217;s look at the results (quite phenomenal). 
In any given second, M3 processes 500 million metrics and persists another 20 million aggregated metrics. Extrapolating to a 24-hour cycle means roughly 45 trillion metrics per day, and the platform also houses over 6.6 billion time series!</p><p>The really interesting engineering is in the high-dimensional problem. High-dimensionality metrics (data tracked over time with many different aspects, like route, region, and status code) are critical to the business but costly at Uber&#8217;s scale. A single emission could lead to 100 million unique time series, and because code changes roll out to specific groups of cities over a few hours, they need city-level monitoring granularity. Different cities have different configurations; for example, rider pickups might be blocked on a street due to a parade, or local events can cause traffic changes.</p><p>Their alerting ecosystem is equally bespoke. It includes two in-datacenter alerting systems: uMonitor for time-series metrics-based alerting against M3, and Neris for host-level checks. Both feed into a common notification and deduplication pipeline called Origami. uMonitor uses static thresholds for steady-state metrics and anomaly thresholds via Argos, Uber&#8217;s anomaly detection platform, which generates dynamic thresholds from historical data.</p><p>They also added <strong><a href="https://www.jaegertracing.io/">Jaeger</a></strong>, their open-source distributed tracing system. Jaeger&#8217;s distributed tracing follows requests from one service to another, composing a narrative of what happened and what went wrong, making it much easier to pinpoint causation.</p><p>The operational improvement after M3 was dramatic. Setting up monitoring in new data centres became 4x faster, and the operational maintenance burden dropped by over 16x, while combined high/low-urgency notifications per week went from 25 with Cassandra to 1.5 with M3DB. 
&#128079;&#127995;</p><p>Over a million unique visitors hit their systems every day, and more than half of their engineering team uses these observability tools daily.</p><p></p><p>Some resources that were my references and really good reads!</p><ul><li><p><a href="https://www.uber.com/blog/m3/">M3: Uber&#8217;s Open Source, Large-Scale Metrics Platform for Prometheus</a> (Uber Engineering Blog)</p></li><li><p><a href="https://newsletter.pragmaticengineer.com/p/how-uber-built-its-observability-platform">How Uber Built its Observability Platform</a> (The Pragmatic Engineer)</p></li><li><p><a href="https://www.uber.com/blog/observability-at-scale/">Observability at Scale: Building Uber&#8217;s Alerting Ecosystem</a> (Uber Engineering Blog)</p></li><li><p><a href="https://www.uber.com/en-KW/blog/optimizing-m3/">Optimizing M3: How Uber Halved Metrics Ingestion Latency by Forking the Go Compiler</a> (Uber Engineering Blog)</p></li><li><p><a href="https://www.uber.com/blog/optimizing-observability/">Optimizing Observability with Jaeger, M3, and XYS</a> (Uber Engineering Blog)</p></li></ul><p>But here&#8217;s an interesting dilemma. What happens when the product you&#8217;re monitoring <em>is</em> the monitoring tool itself? When the observability system that&#8217;s supposed to tell you everything is broken... is the same system you need to diagnose the problem?</p><p>At SigNoz, we have solved this exact problem by building a system called <strong><a href="https://gameofthrones.fandom.com/wiki/Night%27s_Watch">Nightswatch</a></strong>, a Game of Thrones-themed architecture featuring builders, rangers, and stewards that runs SigNoz to monitor SigNoz.</p><p>That story drops in the next edition. 
Stay tuned.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p></p><blockquote><p><em>Feel free to check out our <strong><a href="https://signoz.io/resource-center/blog/">blogs</a> </strong>and <strong><a href="https://signoz.io/docs/introduction/">docs</a></strong> here. Our <strong><a href="https://github.com/SigNoz/signoz">GitHub</a> </strong>is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. You&#8217;re also welcome to join the conversation in our growing <strong><a href="https://arc.net/l/quote/rmuathlr">Slack</a></strong> community for the latest news!</em></p></blockquote>]]></content:encoded></item><item><title><![CDATA[AI Isn't Replacing SREs. It's Deskilling Them.]]></title><description><![CDATA[When AI handles 95% of your incident response, do you get worse at handling the 5% that actually matters?]]></description><link>https://newsletter.signoz.io/p/ai-isnt-replacing-sres-its-deskilling</link><guid isPermaLink="false">https://newsletter.signoz.io/p/ai-isnt-replacing-sres-its-deskilling</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Sat, 28 Feb 2026 13:45:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Hgvq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! 
<br>&amp;<br>This piece took 6 days, 5 hours to be cooked, hope we served. </em>&#127770;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p></p></blockquote><p></p><p></p><p>There are two popular prophecies floating around tech circles these days.</p><p>The first says <strong><a href="https://swizec.com/blog/the-future-of-software-engineering-is-sre/">SRE is the future of all software engineering</a></strong>, that as AI writes more and more code, the humans who remain will be the ones keeping systems alive. The second says AI will devour every tech job alive, SREs included. Neither is particularly useful if you&#8217;re an SRE trying to figure out what your Tuesday will look like in 2027.</p><p>Let&#8217;s ask a more grounded question by looking at what&#8217;s already happening: When AI handles 95% of your incident response, do you get worse at handling the 5% that actually matters?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hgvq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hgvq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp 424w, 
https://substackcdn.com/image/fetch/$s_!Hgvq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp 848w, https://substackcdn.com/image/fetch/$s_!Hgvq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp 1272w, https://substackcdn.com/image/fetch/$s_!Hgvq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Hgvq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp" width="708" height="473.13461538461536" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:973,&quot;width&quot;:1456,&quot;resizeWidth&quot;:708,&quot;bytes&quot;:947996,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/189391546?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!Hgvq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp 424w, https://substackcdn.com/image/fetch/$s_!Hgvq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp 848w, https://substackcdn.com/image/fetch/$s_!Hgvq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp 1272w, https://substackcdn.com/image/fetch/$s_!Hgvq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p></p><p>Most of us already use AI for our daily work (our brains a little fried!), and so do SREs. Today&#8217;s question is not whether AI replaces SREs, but whether AI is quietly making SREs less capable, and whether anyone will notice before the next novel outage hits. The foundational framework for this entire debate comes from a 1983 research paper that&#8217;s eerily prescient.</p><p></p><h3>The Ironies of Automation: Part-I</h3><div><hr></div><p>Note from author: Below is a brief precursor to History of Automation, which you might enjoy if you are into History and Cultures (like me &#128521;).</p><div><hr></div><p>In 1983, a cognitive psychologist named Lisanne Bainbridge published a paper called <em><strong><a href="https://www.semanticscholar.org/paper/Ironies-of-automation-Bainbridge/0713bb9d9b138e4e0a15406006de9b0cddf68e28">Ironies of Automation</a></strong></em>. It became one of the most cited papers in human factors research, and its core argument is almost uncomfortably relevant today.</p><p>Bainbridge studied what happened when factories and industrial systems automated the work of blue-collar operators. The findings were paradoxical: the more you automate a process, the more critical the human operator becomes during the rare moments automation fails, yet the less practice they get and the worse they become at exactly those interventions. 
Automation, which was inherently designed to remove humans from the loop, left them with the worst possible job, i.e., long stretches of passive monitoring punctuated by rare, high-stakes crises they were increasingly unprepared for.</p><p>Ring any bells yet? &#128578;</p><p>Basically, I&#8217;m drawing a parallel between the AI revolution and industrial automation. Industrial automation reshaped blue-collar work by taking over routine physical tasks, and the workers who remained had to handle exceptions they&#8217;d lost the muscle memory for. AI is doing the same thing to knowledge workers by taking over the routine cognitive tasks (pattern matching, triage, known-issue resolution) and leaving humans with the rare, complex, ambiguous problems.</p><p>The <em>exact</em> problems that require deep <em>expertise</em>, the <em>exact expertise</em> that atrophies when you stop practising.</p><p>Now we&#8217;re replaying this pattern with AI agents, and the stakes in software systems are only growing.</p><p></p><h2>Current State of AI in SRE</h2><p>Let&#8217;s take stock of where things stand today in the world of site reliability engineering.</p><h3><strong>What&#8217;s already automated or heavily AI-assisted?</strong></h3><p>Quite a lot is fairly automated today: alert noise reduction and intelligent grouping, runbook execution for known issues, log pattern detection and anomaly flagging, basic root-cause suggestions from historical incident data, and auto-remediation for well-understood failure modes, like restarting a crashed pod or scaling up a service that&#8217;s running hot.</p><p></p><h3><strong>What&#8217;s on the horizon?</strong></h3><p>Some immediate targets include multi-signal correlation across metrics, logs, and traces; autonomous root-cause analysis for partially understood failures; predictive incident detection before users are affected; AI-driven change risk assessment; and automated rollbacks.</p><p>PagerDuty frames this as a tiered 
model.</p><ul><li><p>Tier 1 incidents: Known issues with known fixes get fully automated.</p></li><li><p>Tier 2 incidents: Partially understood problems receive AI recommendations with human validation.</p></li><li><p>Tier 3 incidents: Novel, complex, cascading failures stay human-led, with AI providing supporting context.</p></li></ul><p>But here&#8217;s the catch.</p><p>If <em>human</em> SREs (okay, now we have to use adjectives like human &#129401;) only engage with Tier 3 incidents, i.e. the novel, never-before-seen outages, where do they build the <em>intuition</em> to handle them? Intuition is usually developed from years of hands-on incident response, pattern recognition built through repetition, and the kind of gut-level understanding of a system that only develops from painfully waking up at the odd hour to solve the bug that brought the system down.</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Subscribe if you wish to read some more hot takes. We are cooking some great ones!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h2>The Crisis of Deskilling</h2><p>This is where the picture starts getting blurry. 
Let&#8217;s look at some emerging research on AI-induced deskilling across multiple fields, which paints a consistent and concerning picture.</p><p>In medicine, a <a href="https://www.thelancet.com/journals/langas/article/PIIS2468-1253(25)00133-5/abstract">recent study</a> found that endoscopists who used AI assistance for polyp detection saw their unassisted detection rates drop from 28% to 22% after a period of AI use. They got worse at the thing they were supposed to be experts in, not because they forgot the theory, but because they stopped exercising the skill.</p><p>In aviation, <a href="https://flightsafety.org/wp-content/uploads/2019/12/IASS_2019_Behrend_Lafargue.pdf">research</a> has shown that long-haul pilots who rely heavily on autopilot systems experience measurable degradation in situational awareness and manual flying ability. The problem got serious enough that the FAA now mandates more manual flying time to counteract the effect.</p><p>Somewhere over the past year, AI stopped being a tool I occasionally reached for and became the first thing I reach for, <em>always</em>. My instinct now is to offload as much as possible and apply my own thinking only where it&#8217;s absolutely unavoidable. The problem is that these moments are becoming the only exercise my brain gets, and I can feel the <em>rust</em>.</p><p>A pattern emerges here. The more you let the system handle, the worse you get at handling things yourself, and here&#8217;s the truly dangerous part: <em>you don&#8217;t feel it happening</em>. It gets masked as hyper-productivity. Cognitive research suggests that because AI tools make tasks feel easier and enhance visible performance, users are often unable to accurately judge the true state of their own skills. 
You feel competent, dashboards look green, and then on a Wednesday, a novel incident hits that doesn&#8217;t match any pattern the AI has seen, and you realise the muscle has atrophied.</p><p>For SREs, this manifests in specific ways: we stop reading raw log streams because the AI summarises them, we stop forming hypotheses during incidents because the AI suggests root causes, and we stop building mental models of system architecture because the AI maps dependencies for us. Each of these individually looks like a productivity win. Collectively, they hollow out the very expertise that makes an SRE effective when things go sideways in ways nobody anticipated.</p><p>But there&#8217;s something even more concerning than deskilling, and researchers have started calling it <em>never-skilling</em>. Deskilling means you once had a capability but have since lost it. Never-skilling means you never developed it in the first place. For junior SREs entering the field today in an environment where AI handles most of the incident response workflow, the opportunities to build foundational intuition and muscle through hands-on practice are vanishing.</p><p>The training pipeline itself is broken and not <em>self-healing</em>.</p><p>SREs realise their skills are degrading and lean more on AI to compensate, which further degrades their skills, creating a vicious cycle that is hard to escape.</p><h3>What Can We Do About It?</h3><p>We are definitely not rejecting AI tooling; we are adopting it and integrating it more deeply than ever before, because that&#8217;s the only way forward.</p><p>A few approaches worth considering:</p><p><strong>Deliberate inefficiency.</strong> Just as the FAA mandates manual flying time even when the autopilot is perfectly capable, SRE teams can designate certain incidents, even the ones the AI could handle, as <em>human-practice opportunities</em>. 
Think of this as a long-term investment in keeping skills fresh, even though it may come at the cost of a faster resolution.</p><p><strong>Build for human-in-the-loop, not human-on-the-side.</strong> There&#8217;s a meaningful difference between a system where a human approves an AI&#8217;s recommendation and one where a human actively engages with the problem alongside AI. The former keeps humans in a supervisory role that Bainbridge (the lady who wrote <em>that</em> research paper about 40 years ago) showed leads to vigilance decay, and the latter keeps them cognitively engaged.</p><p>Let&#8217;s zoom out and take a look at the bigger picture.</p><h3>The Bigger Picture</h3><p>Everything we&#8217;ve discussed here (the ironies of automation, the deskilling risk, the never-skilling problem) applies well beyond SRE. Software engineering as a whole is navigating the same tension. As AI writes more code, reviews more PRs, and handles more debugging, the same questions apply.</p><p>We&#8217;re talking about SREs specifically because that&#8217;s the world we live in at <strong><a href="https://signoz.io/">SigNoz</a></strong>. We build an open-source observability platform, the kind of tool that gives SREs the metrics, traces, and logs they need to understand their systems deeply. For us, this deskilling question is not a rhetorical fad; it directly shapes how we&#8217;re building AI into our product.</p><p>Our approach is to start with an AI assistant that helps SREs leverage the power of LLMs while keeping humans firmly in control. Eventually, we&#8217;ll enable more autonomy, but within clear guardrails, and only as trust is earned.</p><p>One advantage we have in this space is that, as an observability platform, we sit on the data itself: the metrics, traces, and logs that SREs rely on. Most AI SRE products today integrate with observability tools through APIs, which means they&#8217;re working with a limited, second-hand view of your systems. 
Because we own the data layer, we can build much deeper, more context-aware AI capabilities that understand your system the way an experienced SRE would.</p><p>And to answer the burning question in your head, our goal isn&#8217;t AI that replaces SREs. It&#8217;s AI that supercharges SREs. Contrary to the prevailing lore, we believe humans will remain essential for the decisions that matter most, especially those that impact production infrastructure.</p><p>The future of SRE is human <em>with</em> AI, intentionally designed to keep humans sharp, engaged, and ready for the 5% that really counts.</p><p></p><div><hr></div><p>Here&#8217;s the <strong><a href="https://www.linkedin.com/posts/pranay01_something-ive-been-thinking-about-lately-activity-7428804029134225408-IXif?utm_source=social_share_send&amp;utm_medium=member_desktop_web&amp;rcm=ACoAAC3MkwYBDZBMgATtR9hOGisjheK_u1VDu6w">LinkedIn post</a></strong> our founder shared a few days ago, which inspired me to write this.</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk! Subscribe if you wish to read more hot takes. 
We are cooking some great ones!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Saving Money with Sampling Strategies Beyond Head and Tail-based Sampling]]></title><description><![CDATA[I decided to go down the rabbit hole to find the strategies that don&#8217;t get the spotlight and make this edition about the lesser-known types of sampling.]]></description><link>https://newsletter.signoz.io/p/saving-money-with-sampling-strategies</link><guid isPermaLink="false">https://newsletter.signoz.io/p/saving-money-with-sampling-strategies</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Tue, 17 Feb 2026 14:03:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!2VSp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. 
We&#8217;ll make it worth your while.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p></blockquote><p></p><p>When I first encountered sampling about a year ago, I knew only about head- and tail-based sampling, mainly because most mainstream documentation covers little beyond those two.</p><p>But recently, I realised I&#8217;d only been looking at the tip of the iceberg.</p><p>I stumbled upon <strong><a href="https://www.gouthamve.dev/sampling-at-scale-with-opentelemetry">an article</a></strong> that discussed sampling in greater depth. I decided to go down the rabbit hole to find the strategies that don&#8217;t get the spotlight and make this edition about the lesser-known types of sampling.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2VSp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2VSp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp 424w, https://substackcdn.com/image/fetch/$s_!2VSp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp 848w, 
https://substackcdn.com/image/fetch/$s_!2VSp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp 1272w, https://substackcdn.com/image/fetch/$s_!2VSp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2VSp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp" width="725" height="484.50080515297907" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:415,&quot;width&quot;:621,&quot;resizeWidth&quot;:725,&quot;bytes&quot;:72938,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/188254385?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2VSp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp 424w, 
https://substackcdn.com/image/fetch/$s_!2VSp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp 848w, https://substackcdn.com/image/fetch/$s_!2VSp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp 1272w, https://substackcdn.com/image/fetch/$s_!2VSp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Let&#8217;s look at them in greater detail.</p><p></p><h2>#1. Remote Sampling</h2><p>To put it simply, it&#8217;s head-based sampling, but centrally controlled. Each service fetches sampling rules from a central config server. You can specify default and per-endpoint rates in a JSON file, and applications poll for updates periodically. If you are still wondering what the big deal is, it is that we can increase or decrease the sampling rate during incidents by changing this file, and within a minute, the applications pick up the new sampling rates.</p><p>That is quite powerful. The approach is battle-tested (used at Uber!) and OpenTelemetry supports it, yet there is surprisingly little documentation for it in OpenTelemetry. Users often struggle to enable Jaeger-style remote sampling with OTel, and some resort to running a Jaeger agent solely to serve the sampling config. Remote sampling lets you keep a low baseline sample rate (say, 1-5%) most of the time and only ramp up to 50-100% when needed, such as during an incident or a debugging session. Because you don&#8217;t need a redeploy, teams are more likely to actually adjust rates to control costs or get details when it matters.</p><p></p><h2>#2. Consistent Reservoir Sampling</h2><p>It&#8217;s essentially head-based sampling that guarantees a fixed sample size. Instead of a simple random percentage, a reservoir sampler maintains a rolling buffer of traces, retaining up to N traces per time window by using a discrete set of sampling rates and consistency algorithms to ensure fair selection.</p><p>Probabilistic sampling yields a variable number of samples, i.e., if traffic doubles, so do your sampled traces and costs. Reservoir sampling always uses a fixed sample size. 
The sample stays statistically representative because the algorithm replaces items in the reservoir with uniform probability.</p><p>This strategy essentially puts a hard ceiling on trace ingestion. It&#8217;s ideal for ensuring you don&#8217;t exceed your budget, even during traffic spikes. The trade-off is that during very low-traffic periods, you might underutilise capacity, but most teams <em>usually</em> prefer predictable costs to a few extra traces.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Subscribe to read about more interesting ways you can reduce your observability bill!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h2>#3. Metrics-from-Traces</h2><p>You can sample traces aggressively, for example keeping <em>only</em> 5%, but still extract metrics from 100% of them before they&#8217;re dropped. In practice, this means placing a metrics-generation stage in your telemetry pipeline <strong>before</strong> the sampling stage. 
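</p><p>As a rough sketch of that ordering (an illustrative layout, not a drop-in config), the Collector&#8217;s <code>spanmetrics</code> and <code>forward</code> connectors can fan every span out to a metrics pipeline before a separate pipeline samples them. The pipeline names, the 5% rate, and the <code>backend.example.com</code> endpoint are placeholders assumed for illustration:</p><pre><code>receivers:
  otlp:
    protocols:
      grpc:

connectors:
  spanmetrics:   # derives request, error and duration metrics from spans
  forward:       # hands the untouched spans to the sampling pipeline

processors:
  probabilistic_sampler:
    sampling_percentage: 5   # assumed rate: keep roughly 5% of traces

exporters:
  otlp:
    endpoint: backend.example.com:4317   # hypothetical backend address

service:
  pipelines:
    traces/ingest:             # sees 100% of spans, no sampling here
      receivers: [otlp]
      exporters: [spanmetrics, forward]
    traces/sampled:            # sampling happens only in this pipeline
      receivers: [forward]
      processors: [probabilistic_sampler]
      exporters: [otlp]
    metrics:                   # metrics computed from every span
      receivers: [spanmetrics]
      exporters: [otlp]
</code></pre><p>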
OpenTelemetry makes this possible with components such as <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/connector/spanmetricsconnector/README.md">Span Metrics</a> and <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/connector/servicegraphconnector/README.md">Service Graph</a>.</p><p>If we naively sample traces, we also lose information needed for metrics such as request rates, error counts, and latencies. One solution is to tally up metrics <em>before</em> any sampling decisions.</p><p>In an OTel Collector, we might place a <code>spanmetrics</code> connector in the pipeline and a sampling processor after it. Span Metrics emits RED metrics (request rate, error count, and latency distributions) for every span that passes through, and Service Graph can derive service call graphs from the same stream, so you get complete coverage. Then the sampler (head or tail) drops, say, 95% of spans before storage. The result is that your monitoring dashboards and alerts, which rely on metrics, still reflect 100% of traffic, while your trace storage volume is only 5% of raw traffic.</p><p></p><h2>#4. Byte-Rate Limiting (Throttle by Data Volume)</h2><p>This refers to sampling based on the size of traces, not just the count. It is an often-overlooked but effective strategy: you set a cap, such as <em>ingesting at most 10 MB of trace data per second</em>, and the sampler then makes decisions to stay under that throughput. OpenTelemetry recently added a <code>bytes_limiting</code> policy in the tail-sampling processor for this. You can read more about it <strong><a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/tailsamplingprocessor/README.md">here</a></strong>.</p><p>It uses a token bucket algorithm, which is common for rate limiting, but the tokens represent bytes. 
The collector actually measures the size of each trace in bytes, using the protobuf serialised size to accurately account for how much data each trace would consume. You configure a sustained bytes-per-second rate and a burst capacity. For example:</p><pre><code><code>policies:
  - name: volume-limit
    type: bytes_limiting
    bytes_limiting:
      bytes_per_second: 10485760  # 10 MB per second
      burst_capacity: 20971520   # allow bursts up to 20 MB
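      # Worked arithmetic (a sketch, assuming the token-bucket behaviour this section describes):
      #   10485760 = 10 * 1024 * 1024, so the bucket refills at 10 MiB per second.
      #   20971520 = 20 * 1024 * 1024, so a full bucket holds 20 MiB of tokens.
      #   A full bucket therefore covers four back-to-back 5 MiB traces (4 * 5242880 = 20971520);
      #   a fifth arriving immediately would find the bucket empty until tokens refill.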

</code></code></pre><p>If a few gigantic traces arrive, the processor will quickly use up the token budget and start dropping subsequent traces until the rate falls back under 10 MB/s. Conversely, if traces are small, more of them can pass through before the aggregate size hits the limit.</p><p>This becomes extremely useful when trace sizes vary a lot. For instance, one request might normally produce a 50 KB trace, but a worst-case code path might generate a 5 MB trace. A standard per-trace sampler would treat both equally, even though that single large trace costs as much as 100 of the smaller ones.</p><p></p><h2>#5. Adaptive Sampling</h2><p>Adaptive sampling adjusts trace sampling rates in real time based on live traffic patterns or performance signals. The goal here is to keep overall data volume within budget while dynamically increasing sampling during anomalous events. For instance, you might normally sample only a small percentage of requests, but automatically raise the sample rate when latency or error rates spike beyond an SLO threshold. One strategy is throughput-based adaptation: setting an upper limit on traces per second and letting the system tune the probability to meet that cap. 
Another is key-based dynamic sampling, where the collector samples frequent events less and rare events more.</p><p>Here&#8217;s an interesting <a href="https://github.com/open-telemetry/opentelemetry-specification/issues/691">GitHub thread</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Asp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2Asp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png 424w, https://substackcdn.com/image/fetch/$s_!2Asp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png 848w, https://substackcdn.com/image/fetch/$s_!2Asp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png 1272w, https://substackcdn.com/image/fetch/$s_!2Asp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2Asp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png" width="1456" height="486" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:486,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:114885,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/188254385?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2Asp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png 424w, https://substackcdn.com/image/fetch/$s_!2Asp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png 848w, https://substackcdn.com/image/fetch/$s_!2Asp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png 1272w, https://substackcdn.com/image/fetch/$s_!2Asp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>Adaptive schemes keep observability costs predictable by avoiding oversampling during high-traffic periods, yet they can temporarily boost fidelity when something goes wrong.</p><blockquote><p><em>Care must be taken to ensure coordination across distributed services so that increasing sampling doesn&#8217;t overload the system or skew the data.</em></p></blockquote><p>In my opinion, the shift from conventional probabilistic sampling to the methods above reflects a change in how we view observability. 
Ultimately, the <em>right</em> sampling strategy aligns your visibility needs with your infrastructure budget, and as OpenTelemetry matures, it will likely become the new standard for any team operating at scale.</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk! Subscribe to learn more about different interesting ways to save your observability costs!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[How to Reduce Telemetry Volume by 40% Smartly (Java)]]></title><description><![CDATA[But with great power comes great responsibility.]]></description><link>https://newsletter.signoz.io/p/is-your-opentelemetry-auto-instrumented</link><guid isPermaLink="false">https://newsletter.signoz.io/p/is-your-opentelemetry-auto-instrumented</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Wed, 04 Feb 2026 14:02:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!oEXb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source 
and the engineering in between! We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. We&#8217;ll make it worth your while.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p><em>This post took 5 days, 11 hours to be curated, so make sure to show some love!</em></p></blockquote><p></p><p></p><p></p><p>OpenTelemetry has become the <em>de facto</em> choice for many organisations&#8217; observability needs today. And with it, auto-instrumentation has turned out to be a powerful means to implement the same.</p><p>But with great power comes great responsibility.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oEXb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oEXb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png 424w, https://substackcdn.com/image/fetch/$s_!oEXb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png 
848w, https://substackcdn.com/image/fetch/$s_!oEXb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png 1272w, https://substackcdn.com/image/fetch/$s_!oEXb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oEXb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png" width="1456" height="973" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:973,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1419403,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/186390928?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oEXb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png 424w, 
https://substackcdn.com/image/fetch/$s_!oEXb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png 848w, https://substackcdn.com/image/fetch/$s_!oEXb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png 1272w, https://substackcdn.com/image/fetch/$s_!oEXb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>While auto-instrumentation provides a strong baseline, its out-of-the-box (or magical?) nature is a double-edged sword that often produces a telemetry surplus. Because auto-instrumentation is designed to be comprehensive by default, it captures <em>everything</em> in case you need it. Without intentional refinement, this can dilute your signal-to-noise ratio: the <em>surplus telemetry</em> inflates storage costs while burying actionable insights under a heap of low-value signals.</p><p>While certain types of <em>telemetry surplus</em> are tied to specific libraries, such as HTTP or gRPC, most <em>telemetry waste</em> is a byproduct of the language runtime itself. To illustrate this, we will use Java throughout this blog. That said, the lessons presented here aren&#8217;t specific to Java; the patterns we&#8217;ll uncover are common across the broader landscape of modern frameworks.</p><p>This blog is an attempt to help you sieve out the diamonds (good telemetry) from the rocks (noisy telemetry)!</p><h2>Java Agent for Auto-instrumentation</h2><p>By simply attaching a Java agent at runtime, developers can capture traces, metrics, and logs without modifying a single line of application code. The Java agent runs in the same Java Virtual Machine (JVM) as the application, using bytecode manipulation libraries such as Byte Buddy to rewrite classes as they are loaded.</p><p>The Java agent automatically hooks into common frameworks such as Spring Boot, Tomcat, and JDBC drivers to inject span creation and context propagation logic. While effective, this process can, as mentioned before, generate <em>not-so-useful</em> telemetry data that later bogs down storage and causes issues. 
<br><br>Let&#8217;s discuss them in greater detail.</p><h2>The Defaults You Should Know About (and Might Want to Disable)</h2><p>I&#8217;ve curated a list of commonly seen (and publicly complained about) not-so-useful telemetry data, referred to as <em>telemetry surplus</em>. Let me introduce them one by one.</p><h3>#1. URL Path and target attributes</h3><p>&#8212; <em>not specific to Java</em></p><p>A commonly missed issue is that auto-instrumentation for HTTP clients and servers often captures the full <code>http.url</code> or <code>http.target</code> attribute. If an application uses RESTful paths with unique IDs like <code>/api/users/12345</code>, every unique ID creates a new attribute value.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UyyO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UyyO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png 424w, https://substackcdn.com/image/fetch/$s_!UyyO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png 848w, https://substackcdn.com/image/fetch/$s_!UyyO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png 1272w, 
https://substackcdn.com/image/fetch/$s_!UyyO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UyyO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png" width="1456" height="395" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:395,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:109434,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/186390928?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UyyO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png 424w, https://substackcdn.com/image/fetch/$s_!UyyO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png 848w, 
https://substackcdn.com/image/fetch/$s_!UyyO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png 1272w, https://substackcdn.com/image/fetch/$s_!UyyO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This can be considered a waste because it prevents effective aggregation. 
Aggregation works by grouping similar data into the same bucket based on shared attributes. If we use a templated route like <code>/api/users/:id</code>, the system puts every &#8216;Get User&#8217; request into a single bucket, thereby letting you accurately calculate the p99 latency for the entire &#8216;Get User&#8217; feature.</p><p>So make a mental note to always use the templated <code>http.route</code> rather than the raw path; raw paths can result in millions of useless data points, aka wasteful telemetry.</p><h3>#2. Controller spans</h3><p>In frameworks like Spring MVC, auto-instrumentation by default creates multiple spans for a single web request. Some of these spans are:</p><ul><li><p><strong>Server Span</strong> (<code>SpanKind.SERVER</code>): The parent span. It tracks the entire process, from when the request reaches your server to when the user receives a response.</p></li><li><p><strong>Controller Span</strong> (<code>SpanKind.INTERNAL</code>): A child span. It tracks only the time spent inside your <code>@Controller</code> method.</p></li><li><p><strong>View Span</strong> (<code>SpanKind.INTERNAL</code>): Another child span. It tracks how long it takes to render your data into a view, such as a JavaServer Page (JSP).</p></li></ul><p>The obvious catch is that in modern microservices, controllers are often very thin and immediately call a service or a database. If your database call is already being tracked, having a separate span that says the <em>controller took 2ms</em> adds very little value. 
That is, for most cases, you might not need spans that capture controller and/or view execution.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NtUT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NtUT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png 424w, https://substackcdn.com/image/fetch/$s_!NtUT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png 848w, https://substackcdn.com/image/fetch/$s_!NtUT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png 1272w, https://substackcdn.com/image/fetch/$s_!NtUT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NtUT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png" width="1456" height="780" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b493ec24-14eb-4cb1-8961-225784067e42_1687x904.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:780,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:502571,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/186390928?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NtUT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png 424w, https://substackcdn.com/image/fetch/$s_!NtUT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png 848w, https://substackcdn.com/image/fetch/$s_!NtUT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png 1272w, https://substackcdn.com/image/fetch/$s_!NtUT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The good news is that you can effectively suppress the generation of these spans by using <em>experimental flags</em>. 
Here are some flags that let you achieve the desired effect, as mentioned in <strong><a href="https://opentelemetry.io/docs/zero-code/java/agent/disable/#suppressing-controller-andor-view-spans">OpenTelemetry documentation</a></strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hUoc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hUoc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png 424w, https://substackcdn.com/image/fetch/$s_!hUoc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png 848w, https://substackcdn.com/image/fetch/$s_!hUoc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png 1272w, https://substackcdn.com/image/fetch/$s_!hUoc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hUoc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png" width="1370" height="372" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a012c843-25b1-4408-83a3-be7c6446398e_1370x372.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:372,&quot;width&quot;:1370,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:90040,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/186390928?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hUoc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png 424w, https://substackcdn.com/image/fetch/$s_!hUoc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png 848w, https://substackcdn.com/image/fetch/$s_!hUoc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png 1272w, https://substackcdn.com/image/fetch/$s_!hUoc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3>#3. Thread name in run-time telemetry</h3><p>A major source of high-cardinality data occurs in Java runtime metrics, like <code>jvm.network.io</code> or <code>jvm.memory.allocation</code>. Versions 2.10.0, 2.11.0, and 2.13.1 of the agent included the <code>thread.name</code> attribute by default in these metrics. 
In environments that use large thread pools or virtual threads, this creates an unbounded number of unique time series, potentially leading to a <strong><a href="https://www.reddit.com/r/sre/comments/1k4h2wi/cardinality_explosion_explained/?utm_source=share&amp;utm_medium=web3x&amp;utm_name=web3xcss&amp;utm_term=1&amp;utm_content=share_button">cardinality explosion</a></strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KxVp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KxVp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png 424w, https://substackcdn.com/image/fetch/$s_!KxVp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png 848w, https://substackcdn.com/image/fetch/$s_!KxVp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png 1272w, https://substackcdn.com/image/fetch/$s_!KxVp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KxVp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png" width="970" height="468" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:468,&quot;width&quot;:970,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:71582,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/186390928?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!KxVp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png 424w, https://substackcdn.com/image/fetch/$s_!KxVp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png 848w, https://substackcdn.com/image/fetch/$s_!KxVp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png 1272w, https://substackcdn.com/image/fetch/$s_!KxVp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This issue was later corrected; maintainers removed the attribute from default metrics starting with version 2.18.0 (via <a href="https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/13407">PR #14061</a>). If you are running an earlier version, make sure you have proper guardrails in place or upgrade to a later version.</p><h3>#4. Duplicate Library Instrumentation</h3><p>This is an interesting dilemma.</p><p>Let&#8217;s first dissect the problem at hand. The Java agent automatically attaches to every supported library it finds on the application&#8217;s classpath, so it can end up instrumenting multiple layers of the same outgoing request.</p><p>Let me break this down with an example.</p><p>In modern Java development, we rarely use a low-level library directly. Instead, we use high-level SDKs. 
For example:</p><ol><li><p>Application code calls the AWS SDK to upload a file to S3.</p></li><li><p>The AWS SDK (high-level) uses Apache HttpClient (mid-level) to execute the request.</p></li><li><p>Apache HttpClient uses Java networking (low-level) to send bytes over the wire.</p></li></ol><p>The Java agent sees all three layers and can create three separate spans for the same logical operation. The result is nested spans that describe the same work, effectively doubling or tripling the telemetry volume for every outbound call.</p><p>To prevent this, the OpenTelemetry Java agent offers a span suppression strategy, which detects when an instrumentation point is already wrapped by another instrumentation point higher up the call stack.</p><p>The behaviour is controlled by the following property: <code>otel.instrumentation.experimental.span-suppression-strategy</code></p><p>There are three primary strategies for deciding which spans to keep and which to discard. You can read more about them <strong><a href="https://opentelemetry.io/docs/zero-code/java/agent/disable/#instrumentation-span-suppression-behavior">here</a></strong>.</p><h3>#5. 
Resource attributes</h3><p>Auto-instrumentation detectors for Kubernetes and host metrics often capture dynamic, unique identifiers by default, such as <code>container.id</code>, <code>k8s.pod.uid</code>, or <code>process.pid</code>. When these are attached to metrics specifically, every container restart or process launch creates a new time series. This breaks aggregation and floods the metrics database with thousands of dead time series, which increases storage costs, significantly slows down queries over long-term trends, and adds to the telemetry surplus.</p><h3>#6. JDBC and Kafka Internal Signals</h3><p>Certain auto-instrumentation modules are inherently chatty, generating high-frequency spans for internal mechanics that carry little diagnostic value.</p><p>For example, the <code>jdbc-datasource</code> module (now often disabled by default) creates a span every time a connection is retrieved from a pool via <code>getConnection()</code>, resulting in thousands of entries that merely confirm the pool is functional.</p><p>Similarly, Kafka instrumentation can produce excessive spans for background heartbeats and metadata checks.</p><p>To mitigate this noise, you can disable these modules upstream by setting <code>-Dotel.instrumentation.jdbc-datasource.enabled=false</code> or <code>-Dotel.instrumentation.kafka.enabled=false</code>, or filter downstream in the Collector by dropping specific span names such as <code>poll</code> or <code>heartbeat</code>. Which option fits best depends on the overall architecture of your application.</p><h3>#7. Scheduler and Periodic Jobs</h3><p><em>This can be broadly applied to schedulers and jobs in other languages too.</em></p><p>Applications that use Spring Scheduling or Quartz for background tasks, like polling a database or checking a cache every second, generate a span for every single execution. If a job runs once per second but does nothing interesting 99% of the time, it still creates 86,400 spans per day, almost all of them successful but meaningless. 
This qualifies as telemetry waste in most cases.</p><p>You can disable the generation of these scheduler spans by using the system properties listed below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ppXe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ppXe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png 424w, https://substackcdn.com/image/fetch/$s_!ppXe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png 848w, https://substackcdn.com/image/fetch/$s_!ppXe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png 1272w, https://substackcdn.com/image/fetch/$s_!ppXe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ppXe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png" width="1328" height="388" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:388,&quot;width&quot;:1328,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:54974,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/186390928?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ppXe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png 424w, https://substackcdn.com/image/fetch/$s_!ppXe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png 848w, https://substackcdn.com/image/fetch/$s_!ppXe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png 1272w, https://substackcdn.com/image/fetch/$s_!ppXe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h3><strong>#8. 
SDK Misalignment</strong></h3><p>Another massive source of enterprise surplus occurs when a framework like <strong><a href="https://trino.io/">Trino</a> </strong>initialises its own internal OpenTelemetry SDK instance instead of joining the global instance provided by the Java agent.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JRVs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JRVs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png 424w, https://substackcdn.com/image/fetch/$s_!JRVs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png 848w, https://substackcdn.com/image/fetch/$s_!JRVs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png 1272w, https://substackcdn.com/image/fetch/$s_!JRVs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JRVs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png" width="984" height="499" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:499,&quot;width&quot;:984,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:84395,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/186390928?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JRVs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png 424w, https://substackcdn.com/image/fetch/$s_!JRVs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png 848w, https://substackcdn.com/image/fetch/$s_!JRVs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png 1272w, https://substackcdn.com/image/fetch/$s_!JRVs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>This results in two parallel telemetry pipelines running in one JVM, which can double memory overhead and network traffic. Because the instances are separate, the valuable business spans from the framework often miss the agent&#8217;s auto-detected resource attributes, such as the Kubernetes namespace. That makes the data invisible to standard production queries, so it becomes telemetry surplus.</p><h2>Mitigation Strategies &#128658;</h2><p>Now that we have seen several ways your application could generate surplus telemetry, this section provides a broad overview of how you can mitigate the resulting waste. 
As they say, prevention is better than cure: generating less telemetry surplus is the best way to eliminate it. In most cases, though, some surplus is almost inevitable, so it&#8217;s important to learn how to mitigate it.</p><p>Mitigating telemetry waste requires a smart combination of upstream prevention and downstream pruning. Upstream, the most effective defence is selective enablement: disable the default <em>capture everything</em> behaviour and re-enable only the critical modules, while specifically suppressing known chatty modules or experimental controller spans, as mentioned in the sections above. Downstream, the Collector serves as a powerful filter: a processor can delete redundant resource keys, and tail sampling can keep 100% of error traces while sampling only a tiny fraction of successful, low-signal traffic. Together, these reduce data volume without sacrificing diagnostic efficacy.</p>
]]></content:encoded></item><item><title><![CDATA[6 Things I Learned About OpenTelemetry Contribution (That the Docs Won't Tell You)]]></title><description><![CDATA[If you&#8217;ve been waiting for a sign to start or restart contributing to OTel, this is it! &#128150; &#10024;]]></description><link>https://newsletter.signoz.io/p/6-things-i-learned-about-opentelemetry</link><guid isPermaLink="false">https://newsletter.signoz.io/p/6-things-i-learned-about-opentelemetry</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Tue, 20 Jan 2026 13:31:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!G3LG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is an honest attempt to talk about all things observability, OpenTelemetry, open-source and the engineering in between! We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source, and we reckon it&#8217;s important to share what we know. If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. 
We&#8217;ll make it worth your while.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p><em>This blog took 6 days and 7 hours to be curated, so make sure to show some love!</em></p></blockquote><p></p><p>Contributing to open-source can be overwhelming at first, and it&#8217;s okay to feel a little lost when trying to navigate your way through it. OpenTelemetry is one such open-source project under the CNCF [the second-largest, to be precise].</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G3LG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G3LG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png 424w, https://substackcdn.com/image/fetch/$s_!G3LG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png 848w, https://substackcdn.com/image/fetch/$s_!G3LG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png 1272w, 
https://substackcdn.com/image/fetch/$s_!G3LG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G3LG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png" width="1443" height="961" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:961,&quot;width&quot;:1443,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1986031,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!G3LG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png 424w, https://substackcdn.com/image/fetch/$s_!G3LG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png 848w, 
https://substackcdn.com/image/fetch/$s_!G3LG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png 1272w, https://substackcdn.com/image/fetch/$s_!G3LG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>With over 200 contributors, it&#8217;s doing really well and is growing fast. 
I&#8217;ve been part of the community [and advocating its adoption!] for a while, and I see many people who wish to contribute to the project asking for tips, guidance, and direction in the Slack channels. There is no lack of resources, but they are scattered across a dozen different repos and docs. This blog is an attempt to bring all the resources you need to get started into one place, in a capsule.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0Ita!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0Ita!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png 424w, https://substackcdn.com/image/fetch/$s_!0Ita!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png 848w, https://substackcdn.com/image/fetch/$s_!0Ita!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png 1272w, https://substackcdn.com/image/fetch/$s_!0Ita!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0Ita!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png" 
width="847" height="179" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a5b90162-128c-4cfb-b824-d49d817046ef_847x179.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:179,&quot;width&quot;:847,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42887,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0Ita!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png 424w, https://substackcdn.com/image/fetch/$s_!0Ita!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png 848w, https://substackcdn.com/image/fetch/$s_!0Ita!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png 1272w, https://substackcdn.com/image/fetch/$s_!0Ita!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!Yi8U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Yi8U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png 424w, https://substackcdn.com/image/fetch/$s_!Yi8U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png 848w, https://substackcdn.com/image/fetch/$s_!Yi8U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png 1272w, https://substackcdn.com/image/fetch/$s_!Yi8U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Yi8U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png" width="909" height="144" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:144,&quot;width&quot;:909,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:30410,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Yi8U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png 424w, https://substackcdn.com/image/fetch/$s_!Yi8U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png 848w, https://substackcdn.com/image/fetch/$s_!Yi8U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png 1272w, https://substackcdn.com/image/fetch/$s_!Yi8U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!JPnX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JPnX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png 424w, https://substackcdn.com/image/fetch/$s_!JPnX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png 848w, https://substackcdn.com/image/fetch/$s_!JPnX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png 1272w, https://substackcdn.com/image/fetch/$s_!JPnX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JPnX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png" width="847" height="179" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:179,&quot;width&quot;:847,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42887,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JPnX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png 424w, https://substackcdn.com/image/fetch/$s_!JPnX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png 848w, https://substackcdn.com/image/fetch/$s_!JPnX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png 1272w, https://substackcdn.com/image/fetch/$s_!JPnX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">Some snippets of folks introducing themselves</figcaption></figure></div><p>I&#8217;ve been following <strong><a 
href="https://www.linkedin.com/in/diana-todea-b2a79968/?lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3BcmDxyM3CSomEXgWtZHCWRg%3D%3D">Diana Todea&#8217;s</a></strong> journey very closely for a while, and she recently won an OpenTelemetry Community Award at KubeCon NA 2025. So my obvious next step was to hop on a call with her and gather as many insights as I could! Between our conversation and her own <strong><a href="https://medium.com/@dianatodea/the-unofficial-guide-to-contributing-to-opentelemetry-where-to-look-and-who-to-talk-to-9de04ae75fe0">recent writings</a></strong>, I&#8217;ve distilled the best insights on how you can move from a lurker to a contributor.</p><p>I am also addressing a problem here: while many folks want to contribute, few actually make it to their first PR, and even fewer keep contributing and stay active. I&#8217;m writing this to address both hurdles, helping you get started and find a reason to stay.</p><p>So, if you&#8217;ve been waiting for a sign to start or restart, this is it. &#128150; &#10024;</p><p></p><h2>#1. What&#8217;s the first step I should take?</h2><p>Kudos to you for taking the first leap. You can start by joining the <strong><a href="https://cloud-native.slack.com/ssb/redirect">CNCF Slack</a></strong> workspace [of which OTel is a part] and saying hi in the #hallway channel. 
Here are some examples.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fPc-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fPc-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png 424w, https://substackcdn.com/image/fetch/$s_!fPc-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png 848w, https://substackcdn.com/image/fetch/$s_!fPc-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png 1272w, https://substackcdn.com/image/fetch/$s_!fPc-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fPc-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png" width="925" height="121" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0d688412-41d4-4402-a279-d6b360c81b39_925x121.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:121,&quot;width&quot;:925,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19732,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fPc-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png 424w, https://substackcdn.com/image/fetch/$s_!fPc-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png 848w, https://substackcdn.com/image/fetch/$s_!fPc-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png 1272w, https://substackcdn.com/image/fetch/$s_!fPc-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!ueod!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ueod!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png 424w, https://substackcdn.com/image/fetch/$s_!ueod!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png 848w, https://substackcdn.com/image/fetch/$s_!ueod!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png 1272w, https://substackcdn.com/image/fetch/$s_!ueod!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ueod!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png" width="927" height="118" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c608fc69-ab56-448c-9ad1-1def44918e11_927x118.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:118,&quot;width&quot;:927,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24066,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ueod!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png 424w, https://substackcdn.com/image/fetch/$s_!ueod!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png 848w, https://substackcdn.com/image/fetch/$s_!ueod!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png 1272w, https://substackcdn.com/image/fetch/$s_!ueod!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Snippets of folks saying hi in #hallway</figcaption></figure></div><p>You can also follow suit and introduce yourself. 
Next, since contribution is the primary goal, here&#8217;s the <strong><a href="https://opentelemetry.io/docs/contributing/">official documentation</a></strong> outlining key aspects you should know. After that, try finding a good first issue from this <strong><a href="https://github.com/open-telemetry/opentelemetry.io/issues?q=is:issue+is%3Aopen&amp;%2343;sort%3Aupdated-desc&amp;%2343;label%3A%22good+first+issue%22">list</a></strong>.</p><p></p><blockquote><p><em>You can also check out <strong><a href="https://clotributor.dev/?source=post_page-----9de04ae75fe0---------------------------------------">CLOtributor</a></strong>, which helps you find good first issues across a number of Cloud Native projects. Here are some channels you can join initially [as per Diana&#8217;s blog]:</em></p><p><em>#otel-sig-end-user, #otel-devex, #opentelemetry-new-contributors, #otel-contributor-experience, #otel-docs-localization</em></p></blockquote><p>But now you could run into your first dilemma. Let&#8217;s see how to get over it.</p><p></p><h2>#2. I can&#8217;t find a good first issue, wtd<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>?</h2><p>Finding a good first issue is indeed a task of its own. Most of them may already have been picked up by someone, and there may be active discussions around them. 
Because these issues are beginner-friendly, they are highly competitive and often claimed within hours of posting.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WlTA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WlTA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png 424w, https://substackcdn.com/image/fetch/$s_!WlTA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png 848w, https://substackcdn.com/image/fetch/$s_!WlTA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png 1272w, https://substackcdn.com/image/fetch/$s_!WlTA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WlTA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png" width="1456" height="601" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:601,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:261827,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WlTA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png 424w, https://substackcdn.com/image/fetch/$s_!WlTA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png 848w, https://substackcdn.com/image/fetch/$s_!WlTA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png 1272w, https://substackcdn.com/image/fetch/$s_!WlTA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">See, most good first issues may already be assigned and actively worked on</figcaption></figure></div><p>If this is the case, you can shift your strategy. While these issues are great for a quick win, they rarely help you build the relationships or architectural understanding necessary for long-term contribution. 
This is why the Special Interest Group [SIG] model of the OpenTelemetry community is so important.</p><p>You can always start small: be an active part of the community by joining SIG calls and the discussions in the corresponding channels, and make yourself useful with ad hoc tasks.</p><p></p><h2>#3. I made a PR, not getting any reviews, wtd?</h2><p>Give it some time.</p><p>Most maintainers have a day job in addition to maintaining the project, so small delays can occur. You can always post a message in the corresponding Slack channel with enough context so that anyone can pick up the review. 
Here&#8217;s an example.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Dkoi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Dkoi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png 424w, https://substackcdn.com/image/fetch/$s_!Dkoi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png 848w, https://substackcdn.com/image/fetch/$s_!Dkoi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png 1272w, https://substackcdn.com/image/fetch/$s_!Dkoi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Dkoi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png" width="1456" height="347" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:347,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:141289,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Dkoi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png 424w, https://substackcdn.com/image/fetch/$s_!Dkoi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png 848w, https://substackcdn.com/image/fetch/$s_!Dkoi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png 1272w, https://substackcdn.com/image/fetch/$s_!Dkoi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Ideal way to ask for reviews</figcaption></figure></div><p></p><h2>#4. 
I want to contribute, but non-technically, wtd?</h2><p>The good news is you are in high demand!</p><p>There&#8217;s a lot of non-coding work that could use more hands on deck. Documentation and blogs are two obvious places to start: you can improve existing documentation or add new pages. You can also join the End User Working Group [EUWG]. They are constantly looking for people to share implementation stories, conduct user interviews, or improve feedback loops between vendors and users. The Merge Forward is another initiative that focuses on diversity and inclusion, and needs allies to help run mentorship programs and community events.</p><p>If you&#8217;re the kind of person who likes helping others, you can contribute by simply being active in forums or Slack to answer questions from newer users. Helping troubleshoot issues or explaining concepts in the Slack channels or GitHub discussions is a valuable form of contribution, too. So, by being a friendly helper in the community, you&#8217;re contributing to the project&#8217;s success, and you might build a reputation for yourself along the way.</p><p>If you&#8217;re interested in the process side of things, OpenTelemetry, being a pretty huge project, has many SIG meetings, public notes, and release planning. You could volunteer to help with note-taking in a SIG meeting, or assist in organising community events like the OpenTelemetry Community Day at KubeCon. The <strong><a href="https://www.notion.so/6-2e9fcc6bcd1980dfb2a8cb1902f58745?pvs=21">Contributor Experience SIG</a></strong> focuses on improving the project for contributors; they might have initiatives you can join, even if you&#8217;re not contributing code.</p><p>Another piece of good news is that you can always switch tracks or do both code and non-code contributions. 
In our call, Diana emphasised that a contributor&#8217;s journey can be very fluid; you might start with documentation because that&#8217;s what you&#8217;re comfortable with, and later move into code as you learn more, or vice versa. The path you choose initially doesn&#8217;t lock you in; all contributions count, and a project as broad as OpenTelemetry needs a diverse set of skills, so yours can be your best launchpad [if you utilise them well!].</p><p></p><h2>#5. How to contribute actively and remain consistent?</h2><p>Here&#8217;s something harder than getting your first PR merged: staying consistent and active in the community. Many, many people drop off after a couple of contributions. This is where consistency and discipline come into the picture, much like hitting the gym &#128517;.</p><p>Consistency in open source comes from aligning your contributions with what genuinely excites you, whether that&#8217;s a SIG you&#8217;re passionate about or a specific skill you want to grow. Set a realistic routine, such as contributing weekly or monthly, and stay connected by attending SIG meetings, tracking GitHub updates, or staying active in Slack.</p><p>You can stay in the loop by attending the bi-weekly SIG meetings for your area, even if just as a listener at first, or by joining community calls.</p><p>As Diana puts it, when something triggers you and helps you learn, it becomes easier to show up consistently and enjoy the journey. And, like everything else in life, motivation has to come from within. &#129496;&#8205;&#9792;&#65039;</p><p></p><h2>#6. Ok, but what do I get out of this?</h2><p>Trick question.</p><p>Contributing to OpenTelemetry, or any open source project for that matter, is indeed an investment of your time and effort. 
The good news is the ROI<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> can be huge, both personally and professionally. </p><p>Today, OpenTelemetry sits at the forefront of observability. By contributing, you&#8217;ll gain a much deeper understanding of how instrumentation, tracing, metrics, and related technologies work under the hood. As you debug issues or implement features, you&#8217;ll inevitably learn a ton about distributed systems, telemetry data, and best practices in cloud-native architectures from the industry&#8217;s best people! You&#8217;ll also be interacting with engineers from many companies [since OpenTelemetry has contributors from Lightstep, Google, SigNoz, and dozens of organisations]. These connections can lead to job opportunities or collaborations in the future. Many contributors find that being active in open source eventually opens multiple doors.</p><p>Many people also contribute out of a passion for the technology and the ethos of open source; if you&#8217;ve benefited from free software, there&#8217;s a gratifying element of paying it forward. That motivation can be very fulfilling in itself.</p><p></p><h2>Some areas that could use more help</h2><p>OpenTelemetry is a broad project with many moving parts, and naturally, some parts of it have more active contributors than others. If you&#8217;re looking to make a real impact and perhaps have an easier time finding issues to tackle, it helps to know which areas are currently under-resourced. Based on community insights and what maintainers have pointed out, here are a few areas in need of more contributors:</p><ul><li><p><strong><a href="https://opentelemetry.io/docs/contributing/localization/">Documentation Localisation</a></strong>: As Diana mentioned, translating docs is a major need. 
Some language communities, like Japanese and Chinese, have been very active in translating OpenTelemetry docs, but others have barely started. If you are fluent in any language besides English, you can make a big difference by contributing to localisation efforts.</p></li><li><p><strong>Language SDKs with smaller teams:</strong> OpenTelemetry maintains SDKs for many languages. Some of these, especially the most popular languages, have large contributor teams, but others could use help. For example, newer or less common language implementations might have only a couple of maintainers. If you happen to know a language like PHP, Ruby, Erlang, or Rust, those SDKs might appreciate extra contributors to help fix bugs and implement new features to catch up with the latest spec.</p></li><li><p><strong>eBPF Instrumentation [OBI]:</strong> One of the newer frontiers in OpenTelemetry is the eBPF auto-instrumentation a.k.a. OBI. This allows automatic telemetry data capture at the kernel level without modifying application code. If you&#8217;re interested in low-level programming or Linux kernel tech, the OBI project would love some help!</p></li></ul><p>Being part of the community and taking on responsibilities is as simple as sending an intro message to any SIG or channel you&#8217;re particularly interested in and asking if you can help out with anything. It can be as easy as the screenshot below! 
Thanks to the amazing community made even more welcoming by the great people in it!</p><p>So go, and make a change!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RMzh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RMzh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png 424w, https://substackcdn.com/image/fetch/$s_!RMzh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png 848w, https://substackcdn.com/image/fetch/$s_!RMzh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png 1272w, https://substackcdn.com/image/fetch/$s_!RMzh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RMzh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png" width="451" height="391" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ac39a019-c300-45d0-a11d-10021151ffda_451x391.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:391,&quot;width&quot;:451,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:80933,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RMzh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png 424w, https://substackcdn.com/image/fetch/$s_!RMzh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png 848w, https://substackcdn.com/image/fetch/$s_!RMzh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png 1272w, https://substackcdn.com/image/fetch/$s_!RMzh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">How I started!</figcaption></figure></div><blockquote><p><em>Feel free to check out our <strong><a href="https://signoz.io/resource-center/blog/">blogs</a> </strong>and <strong><a href="https://signoz.io/docs/introduction/">docs</a></strong> here. Our <strong><a href="https://github.com/SigNoz/signoz">GitHub</a> </strong>is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. 
You&#8217;re also welcome to join the conversation in our growing <strong><a href="https://arc.net/l/quote/rmuathlr">Slack</a></strong> community for the latest news!</em></p></blockquote><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk! If you want more great insights on observability and beyond, hit subscribe!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>wtd: just an acronym for &#8216;what to do?&#8217; much like this emoji &#129335;&#8205;&#9792;&#65039;</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>ROI: Return On Investment</p></div></div>]]></content:encoded></item><item><title><![CDATA[BTS of OpenTelemetry Auto-instrumentation]]></title><description><![CDATA[OpenTelemetry&#8217;s auto-instrumentation toolkit boils down to a couple of clever techniques that make all of this possible. 
Let's discuss them!]]></description><link>https://newsletter.signoz.io/p/bts-of-opentelemetry-auto-instrumentation</link><guid isPermaLink="false">https://newsletter.signoz.io/p/bts-of-opentelemetry-auto-instrumentation</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Sat, 10 Jan 2026 15:38:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IKwd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. We&#8217;ll make it worth your while. <br></em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p></blockquote><p></p><p>I&#8217;ve been an OpenTelemetry advocate for over a year and have written many, many blogs on adopting OpenTelemetry in your systems to achieve deep observability. 
Yet, I&#8217;ve always wondered how and what actually happens behind the scenes, in the context of auto-instrumentation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IKwd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IKwd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png 424w, https://substackcdn.com/image/fetch/$s_!IKwd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png 848w, https://substackcdn.com/image/fetch/$s_!IKwd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png 1272w, https://substackcdn.com/image/fetch/$s_!IKwd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IKwd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1322370,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/184125533?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IKwd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png 424w, https://substackcdn.com/image/fetch/$s_!IKwd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png 848w, https://substackcdn.com/image/fetch/$s_!IKwd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png 1272w, https://substackcdn.com/image/fetch/$s_!IKwd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>So, this is me breaking down what happens under the hood of OpenTelemetry for you.</p><h2>Refresher on Auto-instrumentation</h2><p>For those who are new to the space, auto-instrumentation refers to collecting telemetry [traces, metrics, logs] from your application without requiring you to make changes to the application code. You can read more about it from the<strong><a href="https://opentelemetry.io/docs/concepts/instrumentation/zero-code/"> official docs </a></strong>here.</p><p>A helpful way to understand how this works is to separate the OpenTelemetry API from the OpenTelemetry SDK.</p><ul><li><p>The OTel API is the interface for creating telemetry &#8212; &#8220;start a span&#8221;, &#8220;add an event&#8221;, &#8220;record a metric&#8221;, &#8220;propagate context&#8221;, etc. 
Both manual instrumentation [your code] and auto-instrumentation [instrumentation libraries/agents] ultimately use these same API calls. In auto-instrumentation, those calls are simply made for you.</p></li><li><p>The OTel SDK is the implementation behind the API: it decides what actually happens to that telemetry [sampling, batching, processing] and where it goes [exporting].</p></li></ul><p>So auto-instrumentation is typically achieved in two parts:</p><ul><li><p>Instrumentation hooks [libraries/agents] that wrap existing functions and call the OTel API at the right points.</p></li><li><p>SDK configuration that ensures those API calls actually record telemetry and can be exported.</p></li></ul><p>In auto-instrumentation, OpenTelemetry wraps existing function implementations and extracts useful data, such as function parameters, execution duration, and results. It&#8217;s important to note that the way this wrapping and hooking is done varies widely across programming languages. Broadly, there&#8217;s a clear difference between how it works in dynamic languages [like JavaScript, Python, and Ruby] and in statically-typed or compiled languages [like Java, Go, and .NET].</p><p>Let&#8217;s dive into those differences (or similarities!) next.</p><h2>Dynamic vs. Static Languages</h2><p>It becomes easier to understand what happens behind the scenes when we classify languages broadly into dynamic and static. Dynamic languages allow instrumentation to patch or wrap functions at runtime easily, whereas static languages don&#8217;t natively allow such runtime patching, so they require different techniques to insert instrumentation code. That is, most dynamic languages like Python, JavaScript, and Ruby, which are more flexible at run-time, depend on methods like monkey-patching to implement auto-instrumentation. 
Static languages need other techniques: those that run on a virtual machine, like Java, rely on bytecode instrumentation at load time, while those that compile straight to native code, like Go or C, typically rely on techniques like build-time injection.</p><h2>Some Cool Techniques</h2><p>OpenTelemetry&#8217;s auto-instrumentation toolkit boils down to a handful of clever techniques that make all of this possible. Let&#8217;s discuss three of the most common methods used under the hood.</p><p></p><h3>Monkey Patching</h3><p>The lore behind the term <em>monkey-patching</em> fascinated me. Apparently, the word&#8217;s etymology comes from <em>guerrilla-patching</em>, which refers to the sneaky act of changing code at runtime to fix a bug or add a feature without altering the original source code. Because <em>guerrilla</em> and <em>gorilla</em> are near-homophones, the term was intentionally used as a pun, <em>gorilla-patch</em>. Eventually, developers who wrote their patches more carefully began calling them <em>monkey-patches</em> to make the process sound less intimidating than a <em>gorilla</em>.</p><p>Okay, now let&#8217;s get back to the engineering. In dynamic languages such as Python and JavaScript [Node.js], functions and modules are first-class objects that live in mutable namespaces. This allows OpenTelemetry to employ monkey patching, a technique where existing functions are replaced with instrumented wrappers at runtime.</p><p>The concept is straightforward: at runtime, we replace existing functions with instrumented versions that inject telemetry before and after calling the original function.</p><p>This piece of code roughly illustrates what happens in Node.js.</p><pre><code><code>const originalFunction = exports.functionName;

function instrumentedFunction(...args) {
  const startTime = process.hrtime.bigint();
  // invoke the OG function here
  const result = originalFunction.apply(this, args);
  const duration = process.hrtime.bigint() - startTime;
  console.log(`functionName(${args[0]}) took ${duration} nanoseconds`);
  return result;
}

exports.functionName = instrumentedFunction;
</code></code></pre><p>OTel JavaScript uses a package called <code>require-in-the-middle</code> to intercept module loading and apply such patches before your code runs.</p><p>Let&#8217;s see how this could work in Python. Say we are trying to collect data from an HTTP client, like <code>requests</code>. The <code>requests</code> library exposes a separate function for each HTTP method [<code>requests.get</code> / <code>requests.post</code> / <code>requests.put</code>, and so on]. But each of these functions eventually calls a common <code>request</code> function, whose parameters are the method, the URL, and any extra keyword arguments, and which returns a response object.</p><p>Let&#8217;s see what this looks like pseudo-code-wise:</p><pre><code><code>from datetime import datetime

def request(method, url, **kwargs):
&#9;...  # Original implementation (elided)

def wrapped_request(method, url, **kwargs):
&#9;before = datetime.now()
&#9;# Call the original implementation
&#9;response = request(method, url, **kwargs)
&#9;# Collect the necessary information
&#9;duration = datetime.now() - before
&#9;collect_data(method, url, response.status_code, duration)
&#9;# Return the value from the original call
&#9;return response
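</code></code></pre><p>One thing the pseudo-code above glosses over is what happens when the wrapped call raises. Here&#8217;s a minimal, runnable sketch of the same wrapper idea, assuming a made-up <code>fetch</code> function in place of a real HTTP client and an in-memory <code>collected</code> list in place of a real exporter [both are placeholders I made up, not OpenTelemetry APIs]. It leans on <code>functools.wraps</code> so the wrapper keeps the original function&#8217;s name, and on <code>try</code>/<code>finally</code> so the duration is recorded even when the call fails.</p><pre><code><code>import functools
import time

collected = []  # hypothetical in-memory sink standing in for a real exporter


def collect_data(name, duration_s, error):
    # Record one call; a real agent would create a span or a metric here.
    collected.append({"name": name, "duration_s": duration_s, "error": error})


def instrument(func):
    # Return a wrapper that times func and reports even when it raises.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        error = None
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            error = type(exc).__name__
            raise
        finally:
            collect_data(func.__name__, time.perf_counter() - start, error)
    return wrapper


def fetch(url):
    # Hypothetical stand-in for an HTTP call; no network is involved.
    if not url.startswith("http"):
        raise ValueError("unsupported URL")
    return {"url": url, "status_code": 200}


timed_fetch = instrument(fetch)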

</code></code></pre><p>To close the loop, the original function implementation needs to be replaced with the new <code>wrapped_request</code>. For dynamic languages, this is done by simply holding a reference to the original implementation and rebinding the function&#8217;s name to the wrapper. A pseudocode implementation [which isn&#8217;t very far from real-life code] looks like this:</p><pre><code><code>original_request_impl = requests.request

def wrapped_request(method, url, **kwargs):
&#9;# Same wrapper as in the previous snippet, except that it calls
&#9;# original_request_impl(method, url, **kwargs) instead of request(...)
&#9;...

requests.request = wrapped_request
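</code></code></pre><p>To see the swap end to end without touching the real <code>requests</code> package, here&#8217;s a small sketch that patches a stand-in module object [<code>fake_http</code>, a hypothetical name] and then restores it. It also shows one gotcha worth knowing: a reference grabbed before patching keeps pointing at the original function, which is one reason instrumentation tends to be applied before application code loads. Treat it as an illustration under those assumptions, not as how any particular OTel package is written.</p><pre><code><code>import types

# Hypothetical stand-in for a third-party module such as requests.
fake_http = types.SimpleNamespace()


def _request(method, url, **kwargs):
    # Pretend HTTP call; returns a dict instead of a real response object.
    return {"method": method, "url": url, "status_code": 200}


fake_http.request = _request

# Simulates `from fake_http import request` executed before patching.
early_reference = fake_http.request

calls = []  # hypothetical sink for the collected data


def patch():
    original = fake_http.request

    def wrapped_request(method, url, **kwargs):
        response = original(method, url, **kwargs)
        calls.append((method, url, response["status_code"]))
        return response

    wrapped_request.__wrapped__ = original  # remember what to restore
    fake_http.request = wrapped_request


def unpatch():
    original = getattr(fake_http.request, "__wrapped__", None)
    if original is not None:
        fake_http.request = original


patch()
fake_http.request("GET", "http://example.test/a")  # goes through the wrapper
early_reference("GET", "http://example.test/b")    # bypasses the wrapper
unpatch()
fake_http.request("GET", "http://example.test/c")  # original again
# calls now holds exactly one entry, the one for /a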

</code></code></pre><p>From the caller&#8217;s point of view, calling <code>requests.get</code> and friends behaves exactly as before, while the auto-instrumentation quietly collects the data it needs.</p><h3>Byte-code Instrumentation</h3><p>This is the underlying technique for languages that run on a virtual machine. Instead of modifying functions at the language level, this approach modifies the compiled code [bytecode] as it&#8217;s being loaded into the runtime. Essentially, the instrumentation injects extra bytecode instructions that call OpenTelemetry APIs around the target method&#8217;s original instructions.</p><p>In the <s>Jurassic</s> Java world, this is done via a special agent. When you run a Java app with the <code>-javaagent</code> flag pointing to the OpenTelemetry Java Agent JAR, the JVM invokes the agent&#8217;s <code>premain()</code> method before your application&#8217;s <code>main()</code> method.</p><pre><code><code>// Sketch using Byte Buddy [net.bytebuddy]: AgentBuilder, ElementMatchers, MethodDelegation.
// MethodInterceptor is a placeholder interceptor class; imports and the enclosing class are omitted.
public static void premain(String args, Instrumentation inst) {
    new AgentBuilder.Default()
        .type(ElementMatchers.nameStartsWith("com.example.TargetApp"))
        .transform((builder, typeDescription, classLoader, module, protectionDomain) -&gt;
            builder.method(ElementMatchers.named("targetMethod"))
                   .intercept(MethodDelegation.to(MethodInterceptor.class))
        ).installOn(inst);
}
</code></code></pre><p>In that <code>premain()</code>, OTel registers a class transformer [as seen in the snippet] with the JVM. As each class loads, the transformer can inspect it and, if it matches one of the known libraries or functions we want to instrument [e.g., a Servlet filter, a JDBC call, etc.], the agent will modify the class&#8217;s bytecode on the fly to insert the telemetry hooks. The end result is that by the time your application&#8217;s code runs those functions, they already have tracing logic woven in.</p><p>Bytecode instrumentation is extremely powerful because it works at the Java virtual machine [JVM] level, making it language-agnostic within the JVM ecosystem. It can instrument Java, Kotlin, Scala, and other JVM languages without any modification to the application.</p><p>The trade-off is a bit more complexity and setup: you need to run the app with the agent [or, in .NET, enable the CLR profiler], and there is some startup overhead to transform classes. Once running, the performance impact of the injected code is usually minimal. Overall, this technique lets OpenTelemetry achieve deep, broad instrumentation of popular frameworks in Java and .NET with near-zero friction for the developer.</p><h3>Abstract Syntax Tree Modification</h3><p>Python is a dynamic language, and Java is a static language that runs on a VM. Go is a static language that does not use a VM at all, which makes it the outlier here. In Go, compile-time auto-instrumentation works by modifying the Abstract Syntax Trees [ASTs].</p><p>I was first introduced to ASTs in the compiler design course of my undergrad degree. An AST is a tree data structure widely used in compilers to represent program code. It is usually the result of the syntax analysis phase of a compiler. 
This is exactly where the auto-instrumentation comes into the picture as well.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rPDn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rPDn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png 424w, https://substackcdn.com/image/fetch/$s_!rPDn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png 848w, https://substackcdn.com/image/fetch/$s_!rPDn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png 1272w, https://substackcdn.com/image/fetch/$s_!rPDn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rPDn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png" width="451" height="315" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:315,&quot;width&quot;:451,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36186,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/184125533?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rPDn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png 424w, https://substackcdn.com/image/fetch/$s_!rPDn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png 848w, https://substackcdn.com/image/fetch/$s_!rPDn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png 1272w, https://substackcdn.com/image/fetch/$s_!rPDn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Example of an abstract syntax tree (a very, very small one)</figcaption></figure></div><p></p><p>The auto-instrumentation process of Go involves parsing the source code into an AST, adding instrumentation code to the tree, and generating the modified source code before compilation. This approach ensures that the instrumentation is incorporated in the final binary, providing zero runtime overhead for the instrumentation mechanism itself. But it does come with trade-offs, including the need for access to source code, which makes it difficult to instrument third-party libraries and plugins, and the need for complex changes to build pipelines.</p><h2>Final Words</h2><p>Delving into how OpenTelemetry auto-instrumentation works behind the scenes reveals a lot of clever engineering. 
The mechanisms we covered above allow OTel to hook into your application&#8217;s execution, gather context and timing information, and funnel it into the OTel SDK, all without you changing your application code. &#128522;</p><p>As an OpenTelemetry user, you don&#8217;t usually need to worry about these details, but understanding them can be helpful when you are instrumenting your applications.</p><p>In the end, what feels like telemetry appearing out of thin air, aka auto-instrumentation, is actually the result of these well-orchestrated techniques. Knowing this, you can better appreciate the work done by the OTel community and troubleshoot issues with a deeper intuition.</p><p>Happy instrumenting!</p><p></p><blockquote><p><em>On another note, SigNoz along with Inkeep is <strong><a href="https://luma.com/f2t9hnia">hosting a webinar</a></strong> on Debugging AI Agents: Observability Best Practices with Inkeep &amp; SigNoz. Check it out if that interests you!</em></p><p><em>Feel free to check out our <strong><a href="https://signoz.io/resource-center/blog/">blogs</a> </strong>and <strong><a href="https://signoz.io/docs/introduction/">docs</a></strong> here. Our <strong><a href="https://github.com/SigNoz/signoz">GitHub</a> </strong>is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. You&#8217;re also welcome to join the conversation in our growing <strong><a href="https://arc.net/l/quote/rmuathlr">Slack</a></strong> community for the latest news!</em></p></blockquote><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk! 
Subscribe to stay tuned for more observability related content!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Reducing OpenTelemetry Bundle Size in Browser Frontend]]></title><description><![CDATA[But here&#8217;s the thing, neglecting observability for reducing bundle size isn&#8217;t a good trade-off.]]></description><link>https://newsletter.signoz.io/p/reducing-opentelemetry-bundle-size</link><guid isPermaLink="false">https://newsletter.signoz.io/p/reducing-opentelemetry-bundle-size</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Sat, 20 Dec 2025 13:45:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xq4c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. 
We&#8217;ll make it worth your while.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p><em>In honour of the Stranger Things finale, I&#8217;ve hidden a few easter eggs throughout this edition for all the fellow fans out there. See if you can spot them all! As this is our final edition of the year, I want to wish all of my readers a very happy holiday season, a joyous Christmas, and a wonderful new year.</em></p><p><em>Cheers.</em></p></blockquote><p></p><p>When I was building applications, I used to always rely on the DevTools console of my web browser to examine logs in the frontend. But, with UI log messages only being accessible within your browser rather than forwarded to a file somewhere, which is the common pattern with backend services, losing visibility of this resource when triaging user issues was a real dilemma. Since adding any kind of monitoring/ observability solution would blow up the bundle size, I&#8217;d try to avoid it as much as possible.</p><p>But here&#8217;s the thing, neglecting observability for reducing bundle size isn&#8217;t a good trade-off. 
There are several other ways for you to run up that hill, and meanwhile, if you are caught in a scenario where your requests are not being sent, and the site is crashing and everything is turning upside down, you&#8217;ll have to inevitably start looking inside.</p><p>Inside your traces, spans and contexts.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xq4c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xq4c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png 424w, https://substackcdn.com/image/fetch/$s_!xq4c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png 848w, https://substackcdn.com/image/fetch/$s_!xq4c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png 1272w, https://substackcdn.com/image/fetch/$s_!xq4c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xq4c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png" width="1456" height="1459" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1459,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4324615,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/182164862?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xq4c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png 424w, https://substackcdn.com/image/fetch/$s_!xq4c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png 848w, https://substackcdn.com/image/fetch/$s_!xq4c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png 1272w, https://substackcdn.com/image/fetch/$s_!xq4c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>In this blog, we explore strategies to trim the bundle impact of OTel, focusing on tree-shaking [removing unused code] and lazy-loading [deferring loading until needed] and how to apply these in different frameworks.</p><p></p><h2>Impact of OpenTelemetry on Bundle Size and Performance</h2><p>Out of the box, adding OpenTelemetry&#8217;s web libraries can introduce quite a significant amount of JavaScript. For example, the official browser auto-instrumentation bundle was about <strong>300 KB uncompressed [~60 KB gzipped]</strong> after recent optimisations, which is in the same ballpark as many third-party RUM [Real User Monitoring] agents. While 60 KB may seem <em>okay-ish</em>, loading and executing this script during initial page load can <strong>delay rendering</strong>. 
A large script can increase <strong>blocking time</strong>, potentially pushing out LCP [Largest Contentful Paint &#8212; the render of the largest element] beyond the optimal 2.5s threshold.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s061!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s061!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png 424w, https://substackcdn.com/image/fetch/$s_!s061!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png 848w, https://substackcdn.com/image/fetch/$s_!s061!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png 1272w, https://substackcdn.com/image/fetch/$s_!s061!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s061!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png" width="883" height="181" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:181,&quot;width&quot;:883,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24663,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/182164862?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!s061!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png 424w, https://substackcdn.com/image/fetch/$s_!s061!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png 848w, https://substackcdn.com/image/fetch/$s_!s061!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png 1272w, https://substackcdn.com/image/fetch/$s_!s061!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Not opting for OTel due to heavy bundle-size</figcaption></figure></div><p>Core Web Vitals are very sensitive to any render-blocking resources. 
We generally avoid deferring critical content, but telemetry scripts are <em>not</em> user-facing content. In fact, web performance guidelines note you should <em>not</em> lazy-load an LCP image [since that delays visible content]; however, lazy-loading a telemetry script is a good practice precisely because it&#8217;s non-essential to the user&#8217;s immediate experience. The challenge is finding a balance: we want to collect telemetry [traces of page loads, API calls, user interactions, metrics like Web Vitals, etc.] for observability, <em>but</em> we must prevent the OTel code from slowing down the page. We will look at two proven techniques, tree-shaking and lazy-loading, to reduce bundle bloat.</p><p></p><h2>Tree-Shaking &#127795; OpenTelemetry Code</h2><p>Tree-shaking is a build optimisation that removes dead code, including modules or functions that your application doesn&#8217;t actually use. OpenTelemetry&#8217;s JavaScript SDK is modular, which means if you import only certain parts [say, the tracing API and one exporter], you <em>should</em> be able to exclude others [like metrics, logging, or unused instrumentations]. Ensuring that tree-shaking works with OTel involves a few considerations:</p><h3><strong>Use Modern ESM Imports</strong></h3><p>All OTel packages support <a href="https://www.w3schools.com/nodejs/nodejs_modules_esm.asp">ES Modules</a>. Import only the symbols you need, rather than entire libraries. For example, if you only need the web tracer and the OTLP exporter, you might do:</p><pre><code><code>import { WebTracerProvider } from '@opentelemetry/sdk-trace-web';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
</code></code></pre><p>This pulls in only tracing-related code and the OTLP trace exporter, leaving out metrics and logging code.</p><h3><strong>Avoid Catch-all Imports or Meta-Packages</strong></h3><p>OpenTelemetry offers auto-instrumentation packages that conveniently bundle many instrumentations. For example, <code>@opentelemetry/auto-instrumentations-web</code> will include document load, fetch/XHR, user interaction, and more. If you use it, your bundle will include <em>all those instrumentations</em>. To keep things slim, <em>import only the instrumentations you actually want</em>, one by one, instead of a blanket import. This way, unused ones can be dropped.</p><p>In code, that means doing something like:</p><pre><code><code>import { DocumentLoadInstrumentation } from '@opentelemetry/instrumentation-document-load';
import { FetchInstrumentation } from '@opentelemetry/instrumentation-fetch';
import { registerInstrumentations } from '@opentelemetry/instrumentation';

// Register only the instrumentations imported above
registerInstrumentations({ instrumentations: [new DocumentLoadInstrumentation(), new FetchInstrumentation()] });
</code></code></pre><p>If you don&#8217;t need, say, user interaction tracking or certain network instrumentation, not importing them will ensure they don&#8217;t appear in the bundle.</p><h3><strong>Mark OTel Packages as Side-Effect-Free</strong></h3><p>Tree-shaking works best when libraries declare that they have no side effects on import. Many OTel packages now include <code>sideEffects: false</code> in their package.json, which helps Webpack/Rollup know it can safely drop unused exports.</p><p>This was more of an issue in earlier versions. A user noted that manually adding <code>sideEffects: false</code> to OTel packages reduced bundle size by ~40 KB, and the OTel maintainers addressed this in later releases. You can view the <a href="https://github.com/open-telemetry/opentelemetry-js/issues/2855">GitHub discussion</a> here. Using OpenTelemetry JS v1.2+ or v2.x is recommended, as newer versions have improved in this area. In fact, the OTel JS SDK 2.0 [released in 2025] explicitly removed certain patterns [like extensive classes or namespaces] to improve tree-shakability and minification.</p><p>Upgrading to the latest version can yield a smaller bundle thanks to these optimisations!</p><h3><strong>Consistent Versioning to Avoid Duplicates</strong></h3><p>One subtle cause of bundle bloat that often goes unnoticed is version mismatches. If you depend on multiple OTel packages that internally pull in different versions of the core API, you might accidentally bundle two copies. Ensure all your OTel packages are on the same version so the bundler can deduplicate them. 
For instance, if everything is on version 1.5.0 except one package on 0.26.0, you may get two sets of code.</p><p>Aligning package versions will help prevent that scenario.</p><p>In summary, <em>tree-shake aggressively.</em> That means prune everything optional &#8212; disable features that aren&#8217;t useful anymore, drop instrumentations you don&#8217;t need, and let your bundler eliminate the dead code. By doing so, you minimise the impact on bundle size to a great extent.</p><p></p><h2>Lazy-Loading the OpenTelemetry SDK</h2><p>This is the next concept you can explore. Lazy-load the OTel code, so it isn&#8217;t even downloaded or executed until after the critical page content is loaded. This strategy has perhaps the biggest positive impact on LCP and initial load performance. The idea is to defer the initialisation of OpenTelemetry modules to a non-critical moment [for example, after the page&#8217;s main content is on screen or when the user interacts], rather than blocking the main thread early.</p><h3><strong>Dynamic </strong><code>import()</code><strong> in Single-Page Apps</strong></h3><p>In a React or other Single Page Application [SPA], you can use the dynamic <code>import()</code> function to load your telemetry setup code asynchronously.</p><p>For example, you might create a module <code>otel-init.js</code> that configures the OTel SDK, and <em>then</em> instead of importing it at the top of your app, you load it on demand. For instance:</p><pre><code><code>// In your main App component
useEffect(() =&gt; {
  import('./otel-init').then(module =&gt; {
    module.initTelemetry(); // call the initialization function exported here
  });
}, []);
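// Variation (a sketch, not the author's setup): wait for an idle moment before
// loading telemetry. `whenIdle` is a hypothetical helper and the 2000 ms timeout
// is an assumed value. It uses requestIdleCallback where the browser provides it
// and falls back to a zero-delay setTimeout elsewhere.
function whenIdle(callback, timeoutMs = 2000) {
  if (typeof requestIdleCallback === 'function') {
    return requestIdleCallback(callback, { timeout: timeoutMs });
  }
  return setTimeout(callback, 0);
}
// Inside the effect above, this would wrap the dynamic import:
// whenIdle(function () { import('./otel-init').then(function (m) { m.initTelemetry(); }); });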
</code></code></pre><p>This ensures that the OTel code [everything inside <code>otel-init</code> and its imports] is pulled in only <em>after</em> the first render. The UI can render, LCP can happen, and only then does the telemetry code load in the background. From the user&#8217;s perspective, the page appears quickly; from the app&#8217;s perspective, OTel starts slightly later.</p><h3><strong>Code-Splitting with Bundler Config</strong></h3><p>If you&#8217;re using Webpack, you can explicitly split OTel into its own chunk. For example, in an Angular app using Webpack, you can configure a separate cache group for <code>@opentelemetry</code> modules.</p><p>This means your build will produce something like <code>main.js</code> and <code>opentelemetry.js</code>. However, to truly lazy-load that chunk, you should ensure it&#8217;s not required immediately. In practice, that again means using dynamic import or a similar mechanism to load that chunk at a later time. The Webpack config might look like:</p><pre><code><code>// webpack.config excerpt
optimization: {
  splitChunks: {
    chunks: 'all',
    cacheGroups: {
      opentelemetry: {
        test: /[\\/]node_modules[\\/](@opentelemetry)[\\/]/,
        name: 'opentelemetry',
        priority: 10,
        reuseExistingChunk: true,
      },
    },
  },
}
</code></code></pre><p>There&#8217;s a small trade-off here. Delaying the loading of OTel modules would also inevitably result in the loss of some early telemetry data. For example, if you want to capture any errors or events during the first few seconds, a delayed start misses them. If those are crucial, you might decide to load a minimal part of OTel early [or use a buffered logging approach] and load the rest later. It&#8217;s a balancing act.</p><p>Both of the above are proven techniques for bringing down bundle size. Apart from these, there are some more optimisations for how we send telemetry data from the browser and framework-specific techniques, which I&#8217;ll cover in another edition. Till then, adieu!</p><p></p><blockquote><p><em>Feel free to check out our <strong><a href="https://signoz.io/resource-center/blog/">blogs</a> </strong>and <strong><a href="https://signoz.io/docs/introduction/">docs</a></strong> here. Our <strong><a href="https://github.com/SigNoz/signoz">GitHub</a> </strong>is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. You&#8217;re also welcome to join the conversation in our growing <strong><a href="https://arc.net/l/quote/rmuathlr">Slack</a></strong> community for the latest news!</em></p></blockquote><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk! 
</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[How We Achieved 30% Faster Log Queries by Overcoming ClickHouse's Native JSON Limits]]></title><description><![CDATA[What started as an investigation into filtering inconsistent dot-key notation in JSON logs ended up optimising our query performance by 30%.]]></description><link>https://newsletter.signoz.io/p/overcoming-clickhouses-json-constraints</link><guid isPermaLink="false">https://newsletter.signoz.io/p/overcoming-clickhouses-json-constraints</guid><dc:creator><![CDATA[Piyush]]></dc:creator><pubDate>Sat, 13 Dec 2025 13:02:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uDTw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. 
We&#8217;ll make it worth your while.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p><em>This piece is written by <strong><a href="https://www.linkedin.com/in/piyushsingariya/">Piyush</a></strong>, Software Engineer at SigNoz, who was also one of the key contributors to this engineering overhaul.</em></p><p><em>Cheers.</em></p></blockquote><p></p><p></p><p>Customer log data is always messy.</p><p>Being (and building!) an <a href="https://signoz.io/">observability platform</a>, we get to see <em>all the beautiful, creative ways</em> it can be messy, every single day. And yet, our customers expect, quite fairly, I might add, perfect query results and peak performance.</p><blockquote><p>SigNoz is an open-source observability platform that can be your one-stop solution for logs, metrics and traces. Using ClickHouse as a single datastore and built to support OpenTelemetry natively, SigNoz can help you troubleshoot issues faster with powerful querying capabilities on your observability data.</p></blockquote><p>We recently overhauled how we store JSON logs in ClickHouse [our datastore] to improve query performance and enable filtering of nested dot-notation keys, which was previously not possible. 
What started as an investigation into filtering inconsistent dot-key notation in JSON logs ended up optimising our query performance by 30%.</p><p>In the process, we developed a two-tier JSON storage model that helped us overcome the limitations of ClickHouse&#8217;s native JSON data type while paving the way for superior query and aggregation performance for any key in customers&#8217; logs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uDTw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uDTw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png 424w, https://substackcdn.com/image/fetch/$s_!uDTw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png 848w, https://substackcdn.com/image/fetch/$s_!uDTw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!uDTw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uDTw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png" 
width="1456" height="1464" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1464,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:528331,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/181501599?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uDTw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png 424w, https://substackcdn.com/image/fetch/$s_!uDTw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png 848w, https://substackcdn.com/image/fetch/$s_!uDTw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!uDTw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2><strong>How We Used to Store and Query Log Data Earlier</strong></h2><p>Before this overhaul, we stored the raw log body as a simple <code>string</code> data type. 
While this was easy to ingest, it created some bottlenecks when developers tried to interact with the data.</p><h3>Slow Run-Time Parsing and the Impossible GROUP BY</h3><p>Storing the log body as a string meant the database had no way to instantly look up values inside the JSON whenever a filter is applied to log data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_WDw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_WDw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png 424w, https://substackcdn.com/image/fetch/$s_!_WDw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png 848w, https://substackcdn.com/image/fetch/$s_!_WDw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png 1272w, https://substackcdn.com/image/fetch/$s_!_WDw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_WDw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png" width="1456" height="670" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:342137,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/181501599?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_WDw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png 424w, https://substackcdn.com/image/fetch/$s_!_WDw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png 848w, https://substackcdn.com/image/fetch/$s_!_WDw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png 1272w, https://substackcdn.com/image/fetch/$s_!_WDw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Applying log filters in SigNoz</figcaption></figure></div><p>Whenever a user entered a filter query like the one above, the entire log body [stored as a string] was parsed into JSON at runtime, row by row. This led to slow query performance, especially when the selected time range was long or there was a lot of data to scan.</p><p>Currently, we address slower query performance with the help of <a href="https://signoz.io/docs/logs-pipelines/introduction/">log pipelines</a>. With log pipelines, users can transform their logs to suit their querying and aggregation needs before they are stored in the database.</p><p>Though log pipelines are helpful, they are not straightforward. 
To achieve the necessary performance, users had to manually implement a <em>log pipeline</em> to extract key-value pairs from the JSON string and store the extracted fields as separate attributes.</p><p>This is not a seamless out-of-the-box experience for users sending JSON logs.</p><h3>The Ambiguity of Dot Notation</h3><p>The final breaking point that spurred our full investigation was the ambiguity created by dot notation. Our query builder could not differentiate between logically different JSON structures when developers used dots for querying:</p><p><strong>Scenario 1: Key with Dot in Name:</strong></p><pre><code><code>{
  "user": {
    "session.id": "abc-1234"
  }
}
</code></code></pre><p><strong>Scenario 2: Nested JSON Structure:</strong></p><pre><code><code>{
  "user": {
    "session": {
      "id": "xyz-5678"
    }
  }
}
</code></code></pre><p>Although both of these logs record the <em>same piece of information</em>, a dot-separated query path such as <code>user.session.id</code> could refer to either structure, and the query builder cannot tell which one the user means. The query needed to find the data in the first example will not work on the data from the second example, and vice versa.</p><p>And we needed something that works on both.</p><p>This means that when a user performs a search, they might get incomplete results, not realising that some data is being missed simply because of a formatting difference. We can&#8217;t expect our users to write separate, complex queries to find both formats or even perform a union to get the necessary data.</p><p>This was a really big pain point for us, and the ultimate trigger.</p><h2><strong>Normalising JSON logs in the Collector</strong></h2><p>Our first and most direct approach was to solve the problem before the data reached the database. The idea was to intercept incoming logs in our OpenTelemetry collector and transform the JSON structure <em>in-flight</em>.</p><p>The proposed solution was to have the collector inspect the keys of every incoming JSON object.</p><p>If a key contained a dot, e.g., <code>{"a.b": "c"}</code>, our code would split the key on the dot, create a nested JSON structure, e.g., <code>{"a": {"b": "c"}}</code>, and replace the original flattened key.</p><p>But this involved modifying the actual data the user was sending, and the performance issue was still unresolved. 
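</p><p>To make the first drawback concrete, here is a minimal Python sketch of that kind of key expansion. It is an illustration only, not the collector code we prototyped: the helper name <code>nest_dotted_keys</code> is hypothetical, and the sketch ignores lists and does not merge colliding keys.</p><pre><code><code>def nest_dotted_keys(obj):
    """Expand dotted keys into nested dicts (hypothetical helper).

    Sketch only: lists are left untouched and colliding keys are not merged.
    """
    if not isinstance(obj, dict):
        return obj
    nested = {}
    for key, value in obj.items():
        parts = key.split(".")
        cursor = nested
        # Walk or create one level of nesting per dotted segment.
        for part in parts[:-1]:
            cursor = cursor.setdefault(part, {})
        cursor[parts[-1]] = nest_dotted_keys(value)
    return nested


scenario_1 = {"user": {"session.id": "abc-1234"}}
scenario_2 = {"user": {"session": {"id": "abc-1234"}}}

# Both input shapes collapse into the same nested form.
print(nest_dotted_keys(scenario_1) == nest_dotted_keys(scenario_2))
</code></code></pre><p>After the transformation, both scenarios compare equal, so the shape the user originally sent can no longer be recovered from the stored log.</p><p>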
Given these drawbacks, we concluded that modifying the data shape within the OpenTelemetry collector was not a viable path forward.</p><p>Around the same time, ClickHouse announced a stable version of the JSON data type.</p><h2>Using ClickHouse&#8217;s native JSON data type</h2><p>With the introduction of a native <a href="https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse">JSON data type</a> in ClickHouse, we identified an opportunity to migrate from our string data type column to JSON. Adopting this also meant offloading all associated operations to ClickHouse, allowing us to leverage ClickHouse&#8217;s highly optimised, built-in functions for JSON traversal and data extraction.</p><p>But there was a limitation. ClickHouse&#8217;s native JSON type is built to handle dynamic paths of JSON keys. To do that efficiently, it needs the set of paths to be predictable. But log data from our customers is hardly predictable. It may contain any number of unique paths.</p><p>Before looking at how we overcame this limitation, let&#8217;s understand more about ClickHouse&#8217;s JSON data type.</p><h2>Inside the Workings of the ClickHouse JSON Type</h2><p>ClickHouse&#8217;s JSON data type allows you to store semi-structured JSON documents in a column while preserving efficient, columnar storage for individual JSON fields. 
Internally, JSON columns flatten nested JSON keys into subcolumns for query efficiency, as demonstrated below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N0C0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N0C0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png 424w, https://substackcdn.com/image/fetch/$s_!N0C0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png 848w, https://substackcdn.com/image/fetch/$s_!N0C0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png 1272w, https://substackcdn.com/image/fetch/$s_!N0C0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N0C0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png" width="986" height="474" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:474,&quot;width&quot;:986,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:96131,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/181501599?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N0C0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png 424w, https://substackcdn.com/image/fetch/$s_!N0C0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png 848w, https://substackcdn.com/image/fetch/$s_!N0C0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png 1272w, https://substackcdn.com/image/fetch/$s_!N0C0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Columnar storage in ClickHouse for JSON</figcaption></figure></div><p>To define a JSON column, you can provide optional settings like <code>max_dynamic_paths</code> in the column definition, which controls how ClickHouse handles <strong>dynamic paths</strong> [incoming JSON fields whose schema or structure is unknown].</p><p>Understanding this is crucial to the solution we finally designed.</p><h3>Understanding <code>max_dynamic_paths</code></h3><p>The <code>max_dynamic_paths</code> setting limits the number of distinct JSON <em>paths</em> that ClickHouse will store as separate subcolumns. This limit is defined at the table&#8217;s column level, but it is <strong>enforced per data part</strong>, that is, independently for each chunk of stored data [or &#8220;part&#8221;]. 
By default, the limit is 1024.</p><p>But a fixed limit like this does not work for our customer logs.</p><p>Sometimes the incoming data can have really high cardinality [sigh], which could lead to an <a href="https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse#challenge-3-prevention-of-avalanche-of-column-data-files-on-disk">explosive avalanche</a> of paths exhausting the upper limit. Once the threshold is reached, the values of any further new key are stored in a shared data structure instead of a dedicated subcolumn.</p><p>The next question that naturally arises is, what is a <em>reasonable</em> value for this setting to get optimal performance? Let&#8217;s dive deeper.</p><p></p><h3><strong>What is a reasonable maximum for JSON&#8217;s </strong><code>max_dynamic_paths</code><strong>?</strong></h3><p>The <code>max_dynamic_paths</code> setting controls how many unique JSON paths can be promoted to dedicated subcolumns per data part. The <em>reasonable</em> maximum is far from straightforward; it depends heavily on your data&#8217;s shape and the storage backend.</p><p>In most high-cardinality systems, like observability or event analytics platforms, customer-generated data contains extremely diverse JSON keys. A single dataset might include fields like <code>order.id</code>, <code>order.user_id</code>, or <em>even arbitrary UUIDs</em> (yes, seriously!) nested deep in the JSON structure. In such cases, even if you raise <code>max_dynamic_paths</code> into the thousands, the budget gets consumed quickly because every unique key or UUID becomes a new path. No number ever feels like <em>enough</em> when users continuously send data with new identifiers baked into the keys.</p><p>But what if we set <code>max_dynamic_paths = 0</code> and create columns for dynamic paths on demand?</p><p></p><h2>Building a Two-Tier JSON Storage Model</h2><p>By setting <code>max_dynamic_paths = 0</code>, we stopped the creation of sub-columns for any JSON path. 
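</p><p>To see what a limit of zero does, here is a toy Python model of the overflow behaviour described above. It is a deliberate simplification and not how ClickHouse is implemented: <code>assign_paths</code> is a hypothetical helper that treats a single list of paths as one data part and places paths in the order they arrive.</p><pre><code><code>def assign_paths(paths, max_dynamic_paths):
    """Toy model of one data part (hypothetical helper, not ClickHouse code).

    New paths get a dedicated subcolumn until the limit is reached;
    every later new path goes to the shared data structure.
    """
    subcolumns, shared = [], []
    for path in paths:
        if path in subcolumns or path in shared:
            continue  # already placed
        if len(subcolumns) == max_dynamic_paths:
            shared.append(path)
        else:
            subcolumns.append(path)
    return subcolumns, shared


paths = ["order.id", "order.user_id", "status", "order.id"]

print(assign_paths(paths, 2))  # two subcolumns, "status" overflows
print(assign_paths(paths, 0))  # nothing is promoted
</code></code></pre><p>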
This meant that all ingested JSON data is stored directly in the <em>shared data structure</em>, not as sub-columns.</p><p>This became our baseline. Now let&#8217;s talk about performance. Even in this form, querying is faster than it was with the old approach [storing logs as a string]. With the introduction of multiple <a href="https://clickhouse.com/blog/json-data-type-gets-even-better#new-serializations-for-shared-data-in-v258">JSON serialisation formats</a> in ClickHouse, we faced yet another critical architectural decision: which format would deliver the best performance for our heavy workloads, especially for all the frequent <code>GROUP BY</code> queries? <br>Let&#8217;s examine that in greater detail.</p><h3>#1. Storing Data in Advanced Serialisation Format</h3><p>ClickHouse provides several serialisation formats for storing JSON data, including the Map type, bucketed maps, and the <em>advanced JSON format</em>. You can read more about these <a href="https://clickhouse.com/blog/json-data-type-gets-even-better#new-serializations-for-shared-data-in-v258">formats here</a>.</p><p>We benchmarked both the <code>map</code> and the <code>advanced</code> shared data structures and found some big wins for the <code>advanced</code> one.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tpuJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tpuJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png 424w, 
https://substackcdn.com/image/fetch/$s_!tpuJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png 848w, https://substackcdn.com/image/fetch/$s_!tpuJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png 1272w, https://substackcdn.com/image/fetch/$s_!tpuJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tpuJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png" width="710" height="341" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:341,&quot;width&quot;:710,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:114699,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/181501599?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tpuJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png 424w, 
https://substackcdn.com/image/fetch/$s_!tpuJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png 848w, https://substackcdn.com/image/fetch/$s_!tpuJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png 1272w, https://substackcdn.com/image/fetch/$s_!tpuJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The main gains from opting for this format were better aggregation and filtering performance, so operations like <code>GROUP BY</code> or <code>WHERE</code> clauses on specific JSON fields could be executed efficiently.</p><p>You can read more about advanced shared data structures <a href="https://clickhouse.com/blog/json-data-type-gets-even-better#advanced-shared-data">here</a>.</p><p>This became <strong>Tier 1</strong> of our architecture.</p><h3>#2. Promoting Frequently Queried Paths</h3><p>While Tier 1 provides an efficient baseline for querying any JSON attribute, it is not optimised for fields that are accessed frequently. The overhead of checking metadata and decoding values becomes significant at scale for certain <em>hot</em> fields.</p><p>To address the performance challenges associated with querying large JSON objects in log data, we implemented <strong>Tier 2</strong>, designed to minimise query latency by separating frequently accessed fields from the larger, less-queried JSON blob.</p><p>The core of this optimisation is the use of two distinct JSON columns for storing log attributes:</p><ol><li><p><strong>Primary JSON Blob:</strong> A standard JSON column that serves as the default repository for all incoming log attributes, as discussed in #1. This column accommodates the long tail of infrequently accessed fields.</p></li><li><p><strong>Secondary JSON Column:</strong> A second, specialised JSON column [promoted] is dedicated to storing key-value pairs that are frequently used in query filters, aggregations, and dashboards. This column keeps <strong>ClickHouse&#8217;s dynamic paths enabled (default limit of 1024)</strong>, whereas we set the limit to zero for the primary blob. 
For example, if a path/key called <code>body.status_code</code> is frequently queried, it is stored in our secondary, or promoted, column.</p></li></ol><p>This provides the expected performance with ClickHouse JSON columns, without compromising consistency in structure. But how does the system determine which fields are commonly queried? Let&#8217;s dissect that.</p><h3>#3. Selecting and Ingesting Promoted Fields</h3><p>Let&#8217;s think of it as a two-part process.</p><p>1/ If a user wants better performance on a certain key or path, that path is added to a separate table named <code>promoted_paths</code>; let&#8217;s call these <em>hot fields</em> for now. Every 10 seconds, the ingestion service refreshes a cached list of these <em>hot</em> fields. If a new field gains prominence in queries, it is added to this cached list.</p><p>2/ During data ingestion, the ingestion service inspects each incoming log. If the log&#8217;s JSON payload contains keys that match the promoted fields in the cached list, those key-value pairs are extracted and moved into the secondary/promoted column. 
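</p><p>A minimal Python sketch of this split is shown below. It is an illustration under stated assumptions rather than our ingestion code: <code>split_promoted</code> is a hypothetical helper, the cached list is modelled as a plain set, and only top-level keys are matched.</p><pre><code><code>def split_promoted(body, promoted_paths):
    """Split one log body into (primary, promoted) dicts.

    Hypothetical helper: matches top-level keys only.
    """
    primary = dict(body)  # shallow copy so the caller's dict is untouched
    promoted = {}
    for path in promoted_paths:
        if path in primary:
            # pop() moves the value, so it is not stored twice.
            promoted[path] = primary.pop(path)
    return primary, promoted


cached_hot_fields = {"status_code"}  # stand-in for the cached promoted list
log_body = {"status_code": 500, "message": "upstream timeout"}

primary, promoted = split_promoted(log_body, cached_hot_fields)
print(primary)
print(promoted)
</code></code></pre><p>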
To prevent data duplication and reduce storage overhead, these keys are simultaneously removed from the primary JSON blob.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Evlq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Evlq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png 424w, https://substackcdn.com/image/fetch/$s_!Evlq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png 848w, https://substackcdn.com/image/fetch/$s_!Evlq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png 1272w, https://substackcdn.com/image/fetch/$s_!Evlq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Evlq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png" width="1456" height="653" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:653,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:361105,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/181501599?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Evlq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png 424w, https://substackcdn.com/image/fetch/$s_!Evlq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png 848w, https://substackcdn.com/image/fetch/$s_!Evlq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png 1272w, https://substackcdn.com/image/fetch/$s_!Evlq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Here&#8217;s the entire process of splitting promoted columns and primary columns during ingestion</figcaption></figure></div><p></p><h2>Comparing Results ~ 30% Faster, 100% Lighter</h2><p>We compared the performance of the two-tier JSON model with the older string column on filtering and <code>GROUP BY</code> queries.</p><p>On testing the query performance with a 9 TB dataset, we found that the JSON data type is 30% faster in execution time and scans around 99% less data, at the cost of slightly higher memory usage.</p><p>Here are the stats for the comparison we did with different combinations of filters on both storage models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!U1DF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!U1DF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png 424w, https://substackcdn.com/image/fetch/$s_!U1DF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png 848w, https://substackcdn.com/image/fetch/$s_!U1DF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png 1272w, https://substackcdn.com/image/fetch/$s_!U1DF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!U1DF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png" width="1456" height="485" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:485,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:256271,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/181501599?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!U1DF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png 424w, https://substackcdn.com/image/fetch/$s_!U1DF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png 848w, https://substackcdn.com/image/fetch/$s_!U1DF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png 1272w, https://substackcdn.com/image/fetch/$s_!U1DF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>Conclusion</h2><p>This optimisation lets customers query seamlessly without worrying about the shape and form of their data. With the two-tier model, the challenges that plagued our old system have been systematically eliminated. <em>Inconsistent JSON structures</em> are now gracefully handled, with hot fields promoted and the rest stored efficiently. 
The <em>slow string searches</em> that once took minutes are now sub-second queries on structured data.</p><p> If you want to try our new logging experience, you can reach out to<strong> <a href="mailto:cloud-support@signoz.io">cloud-support@signoz.io</a>.</strong></p><p>If you loved this engineering deep-dive, here are some similar ones:</p><ul><li><p><strong><a href="https://newsletter.signoz.io/p/100-github-releases-yet-its-day-one">100 Github Releases. Yet it&#8217;s day one.</a></strong></p></li><li><p><strong><a href="https://newsletter.signoz.io/p/how-we-made-our-queries-995-faster">How we made our Queries 99.5% faster</a></strong></p></li><li><p><strong><a href="https://newsletter.signoz.io/p/enabling-a-million-spans-in-trace-details-page">Engineering a Trace Details Page That Handles a Million Spans</a></strong></p></li></ul><p></p><blockquote><p><em>Feel free to check out our <strong><a href="https://signoz.io/resource-center/blog/">blogs</a> </strong>and <strong><a href="https://signoz.io/docs/introduction/">docs</a></strong> here. Our <strong><a href="https://github.com/SigNoz/signoz">GitHub</a> </strong>is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. You&#8217;re also welcome to join the conversation in our growing <strong><a href="https://arc.net/l/quote/rmuathlr">Slack</a></strong> community for the latest news!</em></p></blockquote><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk. 
Stay tuned for more deep technical content!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><br></p>]]></content:encoded></item><item><title><![CDATA[Patterns for Deploying OTel Collector at Scale]]></title><description><![CDATA[As applications grow, the question quickly shifts from what OTel can do to how we can deploy it effectively at scale. In this post, we&#8217;ll explore some deployment patterns for the OTel Collector!]]></description><link>https://newsletter.signoz.io/p/patterns-for-deploying-otel-collector</link><guid isPermaLink="false">https://newsletter.signoz.io/p/patterns-for-deploying-otel-collector</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Sun, 30 Nov 2025 11:30:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!8nS4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128140; <em>Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is an honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! We at SigNoz are a bunch of observability fanatics obsessed with OpenTelemetry and open-source, and we reckon it&#8217;s important to share what we know. If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. 
We&#8217;ll make it worth your while.</em></p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p></p><p><em>On another note, feel free to check out our blogs and docs here. Our GitHub is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. You&#8217;re also welcome to join the conversation in our growing Slack community for the latest news!</em></p><p><em>Cheers.</em></p><div><hr></div><p>So, you&#8217;ve embraced OpenTelemetry, and it&#8217;s been great.</p><p><em>Pat, Pat.</em></p><p>That single, vendor-neutral pipeline for your traces, metrics, and logs felt like the future. But now, the <em>future is getting bigger</em>. That simple OTel Collector configuration that worked perfectly for a few services is starting to show its limits as you scale. The data volume is climbing, reliability is becoming a concern, and you&#8217;re wondering if that single collector instance is now a bottleneck waiting to happen.</p><p><em>You&#8217;re not alone</em>. As applications grow, the question quickly shifts from <em>what</em> OTel can do to <em>how</em> we can deploy it effectively at scale. In this post, we&#8217;ll explore some deployment patterns for the OpenTelemetry Collector, moving from a simple agent to a robust, multi-layered architecture. 
Let&#8217;s look at the three main deployment patterns for OTel Collectors and break down how each trades off complexity, scalability, and isolation. Which one is right for you depends on your architecture and goals.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8nS4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8nS4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png 424w, https://substackcdn.com/image/fetch/$s_!8nS4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png 848w, https://substackcdn.com/image/fetch/$s_!8nS4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png 1272w, https://substackcdn.com/image/fetch/$s_!8nS4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8nS4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png" width="1240" height="829" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:829,&quot;width&quot;:1240,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:73917,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/180160231?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8nS4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png 424w, https://substackcdn.com/image/fetch/$s_!8nS4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png 848w, https://substackcdn.com/image/fetch/$s_!8nS4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png 1272w, https://substackcdn.com/image/fetch/$s_!8nS4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>#1. 
Load-Balanced/ Gateway Pattern</h2><p>Instead of relying on a single, large OTel Collector, which you can also think of as a single point of failure &#128516;, this pattern uses <em>a fleet of identical, stateless collectors sitting behind a load balancer.</em> The idea is to distribute the incoming telemetry data across this fleet, so if any single collector instance fails, the others can seamlessly take over its workload.</p><h3>Architecture with the Load Balancer</h3><p>The data flows through a few distinct layers, as shown in the figure below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8gAf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8gAf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png 424w, https://substackcdn.com/image/fetch/$s_!8gAf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png 848w, https://substackcdn.com/image/fetch/$s_!8gAf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png 1272w, https://substackcdn.com/image/fetch/$s_!8gAf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!8gAf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png" width="1456" height="788" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:788,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:209836,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/180160231?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8gAf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png 424w, https://substackcdn.com/image/fetch/$s_!8gAf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png 848w, https://substackcdn.com/image/fetch/$s_!8gAf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png 1272w, https://substackcdn.com/image/fetch/$s_!8gAf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png 1456w" 
sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Architecture with Load Balancer</figcaption></figure></div><p><strong>Layer 1: Agents</strong></p><p>You still have OTel Collectors running as agents. These can run on individual hosts, as sidecars to your applications, or on every Kubernetes node via a DaemonSet. The agent&#8217;s job is simply to collect data locally, batch it, and forward it to a single endpoint: the load balancer.</p><p><strong>Layer 2: The Load Balancer</strong></p><p>This is the central entry point for all telemetry data from your agents. 
It can be a cloud load balancer [like an AWS ELB/NLB or a GCP Load Balancer], or a self-hosted one like Nginx or HAProxy.</p><p>Its only job is to receive the data and distribute it across the fleet of gateway collectors using a strategy such as round-robin or a standard hashing algorithm.</p><p><strong>Layer 3: The Gateway Collector Fleet</strong></p><p>This is a group of two or more identical OTel Collector instances. They are the workhorses. Each collector in the fleet receives a fraction of the total data from the load balancer. They perform the heavy processing: advanced filtering, batching, retries, and exporting the data to one or more backends [e.g., SigNoz or Jaeger].</p><p></p><h3>Trade-offs &amp; Considerations</h3><p><strong>High Availability [HA]:</strong> If Collector 2 fails, the load balancer detects this and automatically redirects its traffic to Collector 1 and Collector 3. The pipeline remains up.</p><p><strong>Horizontal Scalability:</strong> If your data volume doubles, you don&#8217;t need to make your collectors twice as powerful [vertical scaling]. You can simply add more collectors to the fleet [horizontal scaling].</p><p><strong>Zero-Downtime Maintenance:</strong> You can perform rolling updates. Take one collector out of the load balancer&#8217;s pool, update it, and add it back. Repeat for the others without ever interrupting data flow.</p><p><strong>Complexity:</strong> This architecture introduces a new component, the load balancer, which must also be configured, managed, and monitored.</p><p><strong>Stateful Processors:</strong> This pattern is ideal for stateless processing. 
If you use OTel components that rely on seeing all data for a given entity [e.g., tail-based sampling, which needs every span of a trace, or span-derived metrics such as spanmetrics, which need all spans of a service], simply spraying data randomly can lead to incorrect results.</p><p>In such cases, you may need to configure your load balancer for &#8220;stickiness&#8221; or use a more advanced collector routing mechanism [such as the Collector&#8217;s load-balancing exporter, which can route by trace ID] to ensure related data is routed to the same instance.</p><p></p><h2>#2. Multi-cluster/ Central Control-Plane Pattern</h2><p>Repeating a simple deployment strategy across many Kubernetes clusters causes growing problems: it becomes hard to maintain consistent configurations and to control your data with global rules.</p><p>Managing each cluster separately also creates security risks by storing credentials across multiple systems. At the same time, costs increase as each cluster sends data over expensive networks. The multi-cluster pattern fixes this by creating a central pipeline, making your data management secure, cost-effective, and easier to control.</p><h3><strong>The Multi-Stage Architecture</strong></h3><p>This pattern typically involves at least two layers of collectors, creating a <em>collect and forward</em> chain.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_uqj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_uqj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png 424w, 
https://substackcdn.com/image/fetch/$s_!_uqj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png 848w, https://substackcdn.com/image/fetch/$s_!_uqj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png 1272w, https://substackcdn.com/image/fetch/$s_!_uqj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_uqj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png" width="1456" height="938" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:938,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:250642,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/180160231?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!_uqj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png 424w, https://substackcdn.com/image/fetch/$s_!_uqj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png 848w, https://substackcdn.com/image/fetch/$s_!_uqj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png 1272w, https://substackcdn.com/image/fetch/$s_!_uqj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">The multi-stage architecture</figcaption></figure></div><p></p><p><strong>Layer 1: In-Cluster Collection [The Agent Layer]</strong></p><p>Inside <em>each</em> of your Kubernetes clusters, you run a local OTel deployment. This usually consists of a DaemonSet of collectors acting as <strong>agents</strong> [one per node] that collect local data. These agents then forward their data to a small in-cluster gateway [a Deployment within the same cluster].</p><p>The primary role of this layer is to collect all data within its own cluster, add cluster-specific metadata [e.g., <code>cluster.name: prod-us-east-1</code>], and forward it to the next stage.</p><p><strong>Layer 2: Regional Aggregation [The Gateway Layer]</strong></p><p>This layer is a central, highly available fleet of OTel Collectors that serves an entire region or logical environment [e.g., all US-East production clusters]. This regional gateway receives data from the in-cluster gateways of all the clusters it manages.</p><p>This is also where you can centralise your logic. The regional gateway handles:</p><ul><li><p>Authenticating with the final observability backends.</p></li><li><p>Enforcing global sampling rules.</p></li><li><p>Enriching data with region-level metadata.</p></li><li><p>Routing data to different backends based on type or team.</p><p></p></li></ul><h3>Trade-offs &amp; Considerations</h3><p><strong>Enhanced Security:</strong> Only the regional gateways need the secrets to connect to your final backends. 
The collectors inside your many clusters do not, which significantly reduces your attack surface.</p><p><strong>Centralised Management:</strong> You can manage your primary configuration [export destinations, sampling, etc.] in one place [the regional gateway] rather than in dozens. This makes updates and policy changes simple and consistent.</p><p><strong>Sizing:</strong> Each layer of the pipeline must be sized and scaled appropriately to handle the data volume from the layer below it.</p><p><strong>Network Paths:</strong> Ensure reliable, secure network connectivity between your clusters and the regional gateway.</p><p></p><h2>#3. Per-Signal Pattern</h2><p>This pattern involves creating separate, parallel pipelines for each telemetry signal type. Instead of a single, unified OTel Collector fleet that processes all signals together, you deploy specialised fleets: one for traces, one for metrics, and one for logs.</p><h3>Architecture with Agents &amp; Routing</h3><p>The OTel agents are configured to collect all signals as usual. At the first possible stage [either in the agent itself or in a simple first-layer gateway], the data is split. 
Since a Collector pipeline is already defined per signal type, this split can be as simple as giving each pipeline its own exporter; the routing connector [the successor to the routing processor] is useful when you also need attribute-based routing.</p><ul><li><p>All traces are routed to the <em>Trace Gateway</em> fleet.</p></li><li><p>All metrics are routed to the <em>Metrics Gateway</em> fleet.</p></li><li><p>All logs are routed to the <em>Logs Gateway</em> fleet.</p></li></ul><p>Each gateway fleet is configured and optimised only for its specific signal type, with its own set of processors, and exports to the corresponding observability backend, such as SigNoz.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!o6SG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!o6SG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png 424w, https://substackcdn.com/image/fetch/$s_!o6SG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png 848w, https://substackcdn.com/image/fetch/$s_!o6SG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png 1272w, https://substackcdn.com/image/fetch/$s_!o6SG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!o6SG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png" width="1456" height="825" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ebbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:825,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1100975,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/180160231?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!o6SG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png 424w, https://substackcdn.com/image/fetch/$s_!o6SG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png 848w, https://substackcdn.com/image/fetch/$s_!o6SG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png 1272w, https://substackcdn.com/image/fetch/$s_!o6SG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Per-Signal Architecture</figcaption></figure></div><h3>Trade-Offs &amp; Consideration</h3><p><strong>Independent Scalability:</strong> You can scale your logging fleet to handle huge volumes without over-provisioning your tracing or metrics pipelines.</p><p><strong>Resource Optimisation:</strong> You can use CPU-optimised instances for your log collectors and memory-optimised instances for trace collectors, depending on load and necessity.</p><p><strong>Higher Operational Overhead:</strong> You are now managing three or more separate collector fleets, each with its own 
configuration, deployment pipeline, and monitoring. This can <em>get tiring</em>!</p><p><strong>Signal Correlation:</strong> It becomes more difficult to correlate signals at the collector level [e.g., using the spanmetrics processor to generate metrics from traces], as the data is already on separate paths.</p><p></p><h2>How To Choose the Right Deployment?</h2><p>The short answer is that there is <em>no hard-and-fast rule</em> for what is <em>right</em>. But we have put together a small guide to help you understand the options you can explore.</p><p>If you have many clusters or regions that need unified telemetry, use the <em>multi-cluster</em> [control-plane] pattern. Designate one cluster as the central collector host, and configure each cluster&#8217;s agent/sidecar to export to it. This way, you get consistent processing [e.g., cross-cluster tail sampling] at the cost of extra cross-cluster network traffic.</p><p><em>OR</em></p><p>If different teams or customers must be isolated for privacy or regulatory compliance reasons (which are now getting stricter!), use a <em>multi-tenant</em> pipeline strategy. For example, tag data by team and have the collector route it to separate backends or processing paths. This limits the blast radius: one tenant&#8217;s misconfiguration won&#8217;t contaminate another&#8217;s data.</p><p><em>OR</em></p><p>When you need maximum ingestion throughput and uptime, deploy load-balanced collectors. Put the collectors behind a robust <a href="https://www.haproxy.com/glossary/what-is-layer-7-load-balancing">L7 load balancer</a> so you can autoscale instances on demand. 
This handles bursts by spreading the load and keeps any single Collector from becoming a bottleneck.</p><p><em>OR</em></p><p>If your metric, trace, and log volumes differ greatly, consider splitting pipelines by signal. As we mentioned above, run one collector for metrics [allowing many scraper replicas] and another for traces [optimised for tail sampling]. This lets you scale each pipeline to its workload without interference.</p><p><em>OR</em></p><p>For small deployments or strict budget constraints, start with a single Collector or node-level agents/sidecars to minimise infrastructure costs. As load grows or performance demands rise, move to more complex patterns: for example, add a gateway layer or switch to a load-balanced, multi-instance setup. Conversely, if ultra-low latency and resilience are paramount, an agent-and-gateway hybrid [per-node agents forwarding to central gateways] offers local buffering and global control.</p><p></p><h2>Words of Wisdom from the Field</h2><p>Here are some snippets from a conversation with Sreekanth Chekuri, a Senior Software Engineer at SigNoz and a contributor to OpenTelemetry. 
We hope some of these pointers will help guide you when designing the architecture for deploying your OTel collectors!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZE6e!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZE6e!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png 424w, https://substackcdn.com/image/fetch/$s_!ZE6e!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png 848w, https://substackcdn.com/image/fetch/$s_!ZE6e!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png 1272w, https://substackcdn.com/image/fetch/$s_!ZE6e!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZE6e!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2037520,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/180160231?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZE6e!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png 424w, https://substackcdn.com/image/fetch/$s_!ZE6e!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png 848w, https://substackcdn.com/image/fetch/$s_!ZE6e!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png 1272w, https://substackcdn.com/image/fetch/$s_!ZE6e!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><h3>Maximising Performance and Throughput</h3><ul><li><p><strong>Embrace Batching for Efficiency:</strong> He points out that a simple change like batching your data [e.g., up to 25k entries] significantly improves throughput. This works because it reduces unnecessary system calls and processing overhead, letting your pipeline work smarter.</p></li><li><p><strong>The Power of Resource Scrutiny:</strong> Remember that <em>resource requirements aren&#8217;t static</em>. If your collector is doing heavy data transformation like parsing complex logs or extracting attributes, it will naturally need more CPU and memory. 
Always size your Collector based on the processing load, not just the ingestion rate.</p></li></ul><p></p><h3>Strategic Collector Deployment</h3><ul><li><p><strong>Split by Signal for Precision:</strong> For optimal resource allocation, Sreekanth recommends running <em>separate collectors for different signals</em>. This allows you to allocate memory and CPU precisely where needed, so that no single signal hogs the resources the others need.</p></li><li><p><strong>Handle Traces with Care:</strong> Be mindful that <strong>tail-based sampling for traces</strong> is memory-intensive [the collector has to hold a trace&#8217;s spans in memory until it can make a sampling decision] and requires specialised handling. If you mix this heavy operation with standard log or metric processing, it can impact the reliability of your entire system. Splitting these signals solves that problem.</p></li></ul><p></p><h3>Cautions</h3><ul><li><p><strong>Know Your Tools:</strong> While alternative data pipelines exist, he cautions against simply swapping out the OpenTelemetry Collector for tools like Vector. You risk losing the many powerful, built-in capabilities and standardised features that the OTel ecosystem provides.</p></li><li><p><strong>Watch Out for CPU Hogs:</strong> Some OTel processors, such as the transform processor, can be highly CPU-intensive. Use them judiciously, as they can significantly impact performance and scalability if overused in a high-throughput environment.</p></li></ul><p></p><div><hr></div><p><em>Thanks to <strong><a href="https://www.linkedin.com/in/jpkroehling/">Juraci</a></strong> for suggesting some edits to the initial version of this blog! 
Also, for his <strong><a href="https://github.com/jpkrohling/opentelemetry-collector-deployment-patterns">GitHub repository</a></strong>, which acted as a guidepost while I was learning about various patterns.</em></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk. Stay tuned for more cool technical content!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[What is eBPF & What Does it Mean for Observability?]]></title><description><![CDATA[Decoding the buzz behind eBPF!]]></description><link>https://newsletter.signoz.io/p/what-is-ebpf-and-what-does-it-mean</link><guid isPermaLink="false">https://newsletter.signoz.io/p/what-is-ebpf-and-what-does-it-mean</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Sat, 22 Nov 2025 13:34:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mj95!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div><hr></div><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! 
We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. We&#8217;ll make it worth your while.</em></p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p></p><p><em>On another note, feel free to check out our <strong><a href="https://signoz.io/resource-center/blog/">blogs</a> </strong>and <strong><a href="https://signoz.io/docs/introduction/">docs</a></strong> here. Our <strong><a href="https://github.com/SigNoz/signoz">GitHub</a> </strong>is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. You&#8217;re also welcome to join the conversation in our growing <strong><a href="https://arc.net/l/quote/rmuathlr">Slack</a></strong> community for the latest news!</em></p><p><em>Cheers.</em></p><div><hr></div><p>eBPF is kind of like <em>matcha -</em> it has been around for a long time, yet it&#8217;s only within the past couple of years that it emerged as one of the latest trends and buzzwords in the industry.</p><p>I can&#8217;t explain how <em>matcha</em> became the world&#8217;s most popular drink (maybe another time &#128521;), but I will take today&#8217;s blog as an opportunity to tell you how eBPF has become a big deal for <em>revolutionising observability at the kernel level</em>, among many other dope stuff. 
Let&#8217;s look at the history of eBPF, how it works, what problems it solves, and why you &#8211; yes, <em>you!</em> &#8211; should start taking advantage of it today.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mj95!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mj95!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png 424w, https://substackcdn.com/image/fetch/$s_!mj95!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png 848w, https://substackcdn.com/image/fetch/$s_!mj95!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png 1272w, https://substackcdn.com/image/fetch/$s_!mj95!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mj95!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png" width="708" height="499.5043536503684" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3160,&quot;width&quot;:4479,&quot;resizeWidth&quot;:708,&quot;bytes&quot;:982065,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/179643045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mj95!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png 424w, https://substackcdn.com/image/fetch/$s_!mj95!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png 848w, https://substackcdn.com/image/fetch/$s_!mj95!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png 1272w, https://substackcdn.com/image/fetch/$s_!mj95!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><h2>What is eBPF?</h2><p>eBPF - or the <em>extended</em> Berkeley Packet Filter, as it was formerly known - is a kernel execution engine that runs a variety of programs in a performant and safe sandbox inside the kernel.</p><p>If the above definition flew right over your head, let me simplify it. It&#8217;s almost like putting JavaScript into the Linux kernel: just as JavaScript runs programs safely inside a browser sandbox, eBPF runs programs safely inside a sandbox in the kernel.</p><p>With eBPF, developers can write custom programs [typically in a restricted C syntax], load them at runtime, and run them in kernel space without modifying the kernel source code or loading additional kernel modules.</p><p>Originally derived from the classic BPF used for packet filtering, eBPF greatly extends its scope beyond networking to any part of the system. 
Since eBPF has evolved <em>way</em> beyond packet filtering, it&#8217;s almost an understatement to refer to it as &#8220;extended&#8221;, and eBPF is now treated as a standalone name rather than an acronym.</p><p>If you are interested in the evolution of eBPF and the ideas behind it in the early days, take a look at the documentary below. It is also a great example of all the work that goes on behind the scenes to get code merged into a large codebase like Linux.</p><div id="youtube2-Wb_vD3XZYOA" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Wb_vD3XZYOA&quot;,&quot;startTime&quot;:&quot;3s&quot;,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Wb_vD3XZYOA?start=3s&amp;rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p></p><h2>How does eBPF work?</h2><p>By now, we have established that eBPF is a crazy technology. What happens behind the scenes [BTS] when an eBPF program runs is even more mind-blowing. Let me take a moment to explain it in depth.</p><h3>Step 1: Write an eBPF Program</h3><p>Everything starts with writing the logic you want the kernel to execute. This is typically done in a restricted, C-like language. It&#8217;s not full C. For example, you can&#8217;t have unbounded loops or call just any function you want. The goal is to create a small, efficient piece of code that is guaranteed to run quickly and safely. Instead of calling standard libraries, eBPF programs use a special set of <em>helper functions</em> provided by the kernel to interact with the system, such as getting the current process ID or looking at network packet data.</p><h3>Step 2: Compilation to Bytecode</h3><p>Once the C code is written, it&#8217;s compiled into eBPF bytecode using a toolchain like <strong>Clang/LLVM</strong>. 
This bytecode is a universal, platform-independent instruction set that the Linux kernel can understand. This is similar to how Java code is compiled into bytecode to run on the Java Virtual Machine (JVM). In this case, the <em>virtual machine</em> is a secure one that lives inside the Linux kernel itself. The output is typically an ELF file containing the bytecode and definitions for any maps the program will use.</p><h3><strong>Step 3: Load the Program and Create Maps</strong></h3><p>This step is handled by a <strong>user-space application</strong>. This is a normal program you write in a language like Go, Rust, or Python that acts as the controller for your eBPF code. This application performs two key tasks:</p><ul><li><p>It reads the eBPF bytecode from the file created in Step 2.</p></li><li><p>It uses a special system call (bpf()) to load that bytecode into the kernel.</p></li></ul><p>At this stage, the user-space application also creates any <strong>eBPF maps</strong> the program needs. These maps are the crucial bridge for communication. They are key-value data structures that can be accessed by both the eBPF program in the kernel and the user-space application.</p><h3>Step 4: <strong>Verification and JIT Compilation</strong></h3><p>This is the most critical step for ensuring safety and performance. As soon as the kernel receives the eBPF bytecode, it passes it to the <strong>Verifier</strong>. The verifier performs a static analysis of the code to prove that <em>it is safe to run</em>. It checks for infinite loops, out-of-bounds memory access, and illegal instructions. If the program fails verification, it is immediately rejected.</p><p>If the program passes verification, the kernel then uses a <em>Just-In-Time (JIT) compiler</em> to translate the eBPF bytecode into native machine code for the host CPU. 
This means the code doesn&#8217;t have to be interpreted, allowing it to run at nearly the same speed as natively compiled kernel code.</p><h3><strong>Step 5: Attach and Execute</strong></h3><p>After being loaded and verified, the eBPF program is in the kernel but is not yet active. The user-space application must explicitly attach it to a specific event hook. This could be:</p><ul><li><p>A network interface, to inspect incoming/outgoing packets [XDP or TC hooks].</p></li><li><p>A system call entry/exit point [a tracepoint].</p></li><li><p>The entry or exit of a function in the kernel or a user-space application [kprobe or uprobe].</p></li></ul><p>Once attached, the kernel will automatically trigger the eBPF program every time that event occurs [Yes, eBPF is event-driven!]. The program runs, performs its task [like updating a counter in an eBPF map], and exits all within the kernel context, making it incredibly fast. Meanwhile, the user-space application can periodically read from the eBPF map to collect the data and present it to the user.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TuJ1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TuJ1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png 424w, https://substackcdn.com/image/fetch/$s_!TuJ1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png 848w, 
https://substackcdn.com/image/fetch/$s_!TuJ1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png 1272w, https://substackcdn.com/image/fetch/$s_!TuJ1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TuJ1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png" width="1299" height="1064" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1064,&quot;width&quot;:1299,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:263283,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/179643045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TuJ1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png 424w, 
https://substackcdn.com/image/fetch/$s_!TuJ1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png 848w, https://substackcdn.com/image/fetch/$s_!TuJ1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png 1272w, https://substackcdn.com/image/fetch/$s_!TuJ1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">BTS of how eBPF programs run</figcaption></figure></div><h2>eBPF for Observability</h2><p>Let&#8217;s understand how eBPF can be used for observability by looking at how it provides visibility into each of the three pillars.</p><h3>Metrics</h3><p>eBPF can be used to collect highly granular performance metrics that are hard or impossible to see from user space. For example, you can attach an eBPF program to kernel functions to precisely measure TCP retransmits, disk I/O latency, or time spent scheduling processes.</p><h3>Logs</h3><p>While not a replacement for traditional application logs, eBPF can generate highly contextual <em>event logs</em>. For example, you can create a log every time a process opens a sensitive file, writes to a specific socket, or executes a new program, complete with the process ID and user ID. This provides a powerful audit trail for security and debugging.</p><h3>Traces</h3><p>eBPF can automatically trace requests between services without any code changes. By observing the <code>send()</code> and <code>recv()</code> system calls made by applications, eBPF-powered tools can stitch together a distributed trace, even across different programming languages. They can even trace encrypted traffic [like HTTPS] by hooking into the application&#8217;s memory <em>before</em> the data is encrypted.</p><h2>Tracing File Opens with eBPF</h2><p>Let&#8217;s put the above theory into practice. Here&#8217;s a small example of how we can trace file opens with eBPF. We are controlling the eBPF program via Python. 
Since we are using BCC [BPF Compiler Collection], a popular toolkit with Python bindings for writing and loading eBPF programs, make sure it is installed first.</p><p>Here&#8217;s the script. It traces the <code>openat()</code> syscall and logs the process ID, process name, and file path each time a file is opened.</p><pre><code><code>from bcc import BPF

# eBPF program that hooks into the openat syscall
bpf_code = """
#include &lt;uapi/linux/ptrace.h&gt;
#include &lt;linux/sched.h&gt;

struct data_t {
    u32 pid;
    char comm[TASK_COMM_LEN];
    char fname[256];
};

BPF_PERF_OUTPUT(events);

// BCC's syscall__ naming convention lets it read the syscall arguments
// correctly on kernels that wrap syscalls [e.g. __x64_sys_openat]
int syscall__openat(struct pt_regs *ctx, int dfd, const char __user *filename, int flags) {
    struct data_t data = {};

    // Capture process ID [the upper 32 bits hold the TGID] and name
    data.pid = bpf_get_current_pid_tgid() &gt;&gt; 32;
    bpf_get_current_comm(&amp;data.comm, sizeof(data.comm));

    // Copy the NUL-terminated file name from user memory
    bpf_probe_read_user_str(&amp;data.fname, sizeof(data.fname), filename);

    // Send the data to user-space
    events.perf_submit(ctx, &amp;data, sizeof(data));
    return 0;
}
"""

# Load the eBPF program
b = BPF(text=bpf_code)

# Resolve the kernel's name for the openat syscall and attach the probe to it
b.attach_kprobe(event=b.get_syscall_fnname("openat"), fn_name="syscall__openat")

# Function to print the output
def print_event(cpu, data, size):
    event = b["events"].event(data)
    print(f"PID: {event.pid}, Process: {event.comm.decode('utf-8', 'replace')}, File: {event.fname.decode('utf-8', 'replace')}")

# Open a perf buffer to receive events from kernel space
b["events"].open_perf_buffer(print_event)

# Continuously listen for events and print them
while True:
    b.perf_buffer_poll()
</code></code></pre><p>Run the script with root privileges, since loading eBPF programs into the kernel requires them.</p><pre><code><code>sudo python3 &lt;name_of_file&gt;
</code></code></pre><p>Let&#8217;s break down the code into its two main parts.</p><p></p><h3>The eBPF Program [The C Code]</h3><p>This is the logic that runs securely inside the kernel.</p><ul><li><p><code>struct data_t</code>: We first define a C struct. This is the <em>shape</em> of the data we want to send from the kernel to our Python program. In our example, it holds the process ID, the command name, and the filename.</p></li><li><p><code>BPF_PERF_OUTPUT(events)</code>: This is a BCC macro that creates a high-performance communication channel called <code>events</code>. It lets us send data from the kernel to user space efficiently, with low overhead.</p></li><li><p>The probe function [the one that takes <code>struct pt_regs *ctx</code> as its first argument]: This is our main eBPF function. It gets the current process ID [pid] and command name [comm] using the eBPF helper functions <code>bpf_get_current_pid_tgid()</code> and <code>bpf_get_current_comm()</code>.</p></li><li><p>The most important part is <code>bpf_probe_read_user_str()</code>. The filename exists in the memory of the application making the system call, not in the kernel. This special helper function safely copies the filename string from the user&#8217;s application memory into our <code>data.fname</code> field.</p></li><li><p>Finally, <code>events.perf_submit()</code> pushes our completed data structure into the <code>events</code> perf buffer, making it available to our Python script.</p><p></p></li></ul><h3>The User-Space Controller [The Python Code]</h3><p>This Python script loads and manages the eBPF program.</p><ul><li><p><code>b = BPF(text=bpf_code)</code>: This line is where the BCC magic happens. It takes our C code as a string, compiles it into eBPF bytecode, and loads it into the kernel. The kernel&#8217;s verifier checks the bytecode to ensure it&#8217;s safe before allowing it to be loaded.</p></li><li><p><code>b.attach_kprobe(...)</code>: This is the crucial step where we <em>attach</em> our C probe function to a kernel event. 
We use a kprobe [kernel probe] to hook into the kernel function that handles the openat system call. Now, every time any process on the system calls openat, our eBPF code will run first.</p></li><li><p><code>b["events"].open_perf_buffer(print_event)</code>: This tells our script to start listening to the <code>events</code> channel we created in the C code. For every piece of data that comes through, it will call our Python function <code>print_event</code>.</p></li><li><p><code>while True: b.perf_buffer_poll()</code>: This is the main event loop. The script sits here, efficiently waiting for data to arrive from the kernel. When data is available, it triggers the <code>print_event</code> callback to print the formatted output to your screen.</p></li></ul><p>Once you run the script with root privileges, you will see output like this:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DEaP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DEaP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png 424w, https://substackcdn.com/image/fetch/$s_!DEaP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png 848w, https://substackcdn.com/image/fetch/$s_!DEaP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png 1272w, 
https://substackcdn.com/image/fetch/$s_!DEaP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DEaP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png" width="1422" height="174" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:174,&quot;width&quot;:1422,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:46255,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/179643045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DEaP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png 424w, https://substackcdn.com/image/fetch/$s_!DEaP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png 848w, 
https://substackcdn.com/image/fetch/$s_!DEaP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png 1272w, https://substackcdn.com/image/fetch/$s_!DEaP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Although this is a very basic example, it gives a good insight into how eBPF programs work from code to monitoring calls. eBPF is no longer a niche technology, but something that is being widely adopted by orgs at various levels, revolutionising the tech industry &#8212; one <em>matcha</em> at a time. &#127861;</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk. Subscribe to read more awesome tech stuff!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[100 GitHub Releases. Yet it's day one 😊]]></title><description><![CDATA[Here&#8217;s to the next 100. 
We&#8217;re just getting started!]]></description><link>https://newsletter.signoz.io/p/100-github-releases-yet-its-day-one</link><guid isPermaLink="false">https://newsletter.signoz.io/p/100-github-releases-yet-its-day-one</guid><dc:creator><![CDATA[Anushka Karmakar]]></dc:creator><pubDate>Sun, 16 Nov 2025 11:31:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7J0t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. We&#8217;ll make it worth your while.</em></p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p></p><p><em>This piece is written by <a href="https://www.linkedin.com/in/anushkakarmakar/">Anushka</a>, PMM at SigNoz, on account of completion of 100 releases at SigNoz. </em></p><p><em>Also, feel free to check out our <strong><a href="https://signoz.io/resource-center/blog/">blogs</a> </strong>and <strong><a href="https://signoz.io/docs/introduction/">docs</a></strong> here. 
Our <strong><a href="https://github.com/SigNoz/signoz">GitHub</a> </strong>is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. You&#8217;re also welcome to join the conversation in our growing <strong><a href="https://arc.net/l/quote/rmuathlr">Slack</a></strong> community for the latest news!</em></p><p><em>Cheers.</em></p></blockquote><p></p><p>We just shipped our <strong><a href="https://github.com/SigNoz/signoz/releases/tag/v0.100.0">100th GitHub release</a></strong>.</p><p>You would think a milestone like this would feel like an arrival, a moment to look back and say, &#8220;Yay, we made it.&#8221;</p><p>But when I sat down with the team to understand how they felt, everyone said some version of the same thing - &#8220;We are just getting started.&#8221;</p><p>To understand what it actually takes and feels like to ship 100 releases, I talked to a few of my teammates from different junctures of the product story.</p><p>This is their story. Welcome to our journey of 100 releases.</p><blockquote><p><em>A quick side note on methodology: My highly scientific process for selecting interviewees involved grabbing anyone who wasn&#8217;t in a meeting at the eleventh hour. While their stories are amazing, they are just a few of the many that make up our 100-release journey. 
A huge thank you to the entire team!</em></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7J0t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7J0t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png 424w, https://substackcdn.com/image/fetch/$s_!7J0t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png 848w, https://substackcdn.com/image/fetch/$s_!7J0t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png 1272w, https://substackcdn.com/image/fetch/$s_!7J0t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7J0t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png" width="1200" height="739" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:739,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:119410,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/178980032?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7J0t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png 424w, https://substackcdn.com/image/fetch/$s_!7J0t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png 848w, https://substackcdn.com/image/fetch/$s_!7J0t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png 1272w, https://substackcdn.com/image/fetch/$s_!7J0t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2><strong>The first hire</strong></h2><p><strong><a href="https://www.linkedin.com/in/ankit-anand-686a53a1/">Ashu</a></strong> was the third person at SigNoz, joining the two founders in May 2021 as a Growth Manager. His job was to let more developers know that SigNoz existed and that it was OpenTelemetry native. When he joined, SigNoz had around 600-700 GitHub stars.</p><p>The problem was that nobody was paying attention yet. Our Slack community had crickets. But Ashu believed in the founders and their genuine care to improve the experience of fellow developers. This was his push to keep going.</p><p>He kept writing content on dev.to, publishing on Hacker News, and explaining how to implement OpenTelemetry in Java applications when OpenTelemetry itself was brand new.</p><p>This soon became a habit. 
It eventually built up and gave us numbers to chase.</p><p>We&#8217;d trend on GitHub occasionally, articles would go viral, and when people visited the repo, they&#8217;d see the value and star it. The momentum kept attracting people. Soon, contributors started trickling in. Later, customers became contributors too.</p><p>Today, SigNoz has <strong><a href="https://github.com/SigNoz/signoz/">24,200+ stars</a></strong> on GitHub (as of 5th Nov, 2025).</p><h2><strong>The first backend engineer</strong></h2><p>While Ashu was getting the word out, <strong><a href="https://www.linkedin.com/in/makeavish/">Vishal</a></strong> joined us in November 2021 as our first backend engineer, starting on the traces module. For us, OpenTelemetry was never a mere technology choice; it was in our DNA from day one. We learned from the OpenTelemetry community, raised issues upstream, and helped debug.</p><p>Community was always core to what we were building.</p><p>As the product gained traction, users started asking for a SaaS version. They didn&#8217;t want the hassle of setting up and maintaining the open-source infrastructure. The team decided to launch SigNoz Cloud during a workation in Goa in December 2023, with the idea of just doing a one-week soft launch to test for issues and then rolling it back. In the first week, we got four sign-ups. We never rolled it back.</p><p>Vishal remembers the chaos of that launch vividly. The team was at the beach when Nitya got a message that the sign-up flow was broken, and he ran back (literally) to the room to fix it. That workation launch, which they thought was just a pilot, became the real thing.</p><p>That&#8217;s when Vishal&#8217;s role shifted. He went from being a backend engineer to a product manager, spending his days on calls with customers, debugging issues, and doing manual setups. The work wasn&#8217;t glamorous. 
There were sleepless nights and pressure from large companies to prioritize their features.</p><p>But that messy, tedious work had to get done. Customers like <strong><a href="https://signoz.io/case-study/kiwi/">Kiwi</a></strong> helped shape the product through their feedback and pull requests, pushing us to build for actual scale, not just theoretical scale.</p><h2><strong>Bringing logs to the product</strong></h2><p>When <strong><a href="https://github.com/nityanandagohain">Nitya</a></strong> joined in April 2022, we had around 40-50 weekly active users. His first task was to bring logs into the product. We had traces and metrics, but logs were the missing piece.</p><p>Just a month after he joined, during a team workation in May 2022, a conversation over lunch set the tone for how we build. Pranay, CEO of SigNoz, asked Nitya what he was working on.</p><p>&#8220;Logs. Building out the schema,&#8221; Nitya said.</p><p>&#8220;How much data are you testing on?&#8221;</p><p>&#8220;One million.&#8221;</p><p>&#8220;No, no. Test it on one billion.&#8221;</p><p>Nitya spent the next week running schemas against a billion log lines, again and again. That ambition became our standard. When we released the <strong><a href="https://signoz.io/blog/logs-performance-benchmark/">first logs benchmark</a></strong>, it caught fire on Hacker News. The traffic exploded, and with it, more customers and more data.</p><p>But as customers grew, so did the complexity. Nitya manually migrated hundreds of customers to new schemas over six months. We couldn&#8217;t afford to lose their data or break their workflows.</p><p>What worked at 50 users didn&#8217;t work at 500. Every new customer taught us something, and every incident made us more careful.</p><h2><strong>The first frontend lead</strong></h2><p>We had a functional product, but it was far from pretty. 
That&#8217;s where <strong><a href="https://github.com/YounixM">Yunus</a></strong> came in, joining in August 2023 as our first frontend engineer.</p><p>Frontend engineers typically don&#8217;t gravitate toward dev tools, which are often built for SREs and backend folks. But Yunus wanted to build a culture where engineers think beyond their specific skillset.</p><p>His philosophy was simple: &#8220;You are not a frontend engineer or a backend engineer. You are a software engineer.&#8221; He wanted everyone to understand the &#8216;why&#8217; before getting into the &#8216;how.&#8217;</p><p>This thinking was crucial.</p><p>When Yunus joined, we were moving fast, but we weren&#8217;t always thinking in systems. He focused on building processes to make our frontend more stable and predictable because people make mistakes, but good systems can prevent those mistakes from breaking things.</p><h2><strong>Joined at the 34th release</strong></h2><p><strong><a href="https://github.com/vikrantgupta25">Vikrant</a></strong> joined as a frontend engineer in January 2024. The frontend wasn&#8217;t stable; fixing one thing often broke another. He felt disconnected from the full picture, but an opportunity came up that changed everything. The provisioning flow for new sign-ups was constantly breaking, and Vikrant had already expressed interest in learning backend.</p><p>So he made the shift. He had a one-month crash course, taking over ownership from <strong><a href="https://github.com/therealpandey">Pandey</a></strong>. It was a ticking bomb. Either he figured it out, or the sign-up flow stayed broken.</p><p>And damn, he did figure it out.</p><p>Later, the community asked for something ambitious - <strong><a href="https://signoz.io/blog/traces-without-limits/">loading traces with millions of spans</a></strong>. We didn&#8217;t want to build a makeshift solution. We wanted to solve it permanently. 
After two months of intense work, we could load a million spans on a single screen.</p><p>Then came the push for provisioning v1.0. Our deadline was the Tuesday platform retrospective at 6:30 PM. As the clock ticked, the call got pushed to 7:00, then 7:15, then 7:30. We finally deployed to production, tested it, and joined the call right after. The entire team stood (well, practically sat in front of their laptops) together until v1.0 was stable.</p><p>The next challenge was to improve our SQL database schemas. For three months, we had to break down our entire infrastructure and rebuild it. Every Wednesday at noon, Vikrant, Pandey, and Nitya would send each other memes about the known pattern: a release would go out, and by 12:30 PM, a bug would be reported. Every single time.</p><p>But as Vikrant puts it, &#8220;If you&#8217;re tired, do it tired.&#8221; And when your team has your back, it does get easier.</p><h2><strong>Building processes that scale</strong></h2><p><strong><a href="https://github.com/therealpandey">Pandey</a></strong> joined in February 2024, when SigNoz was just a bunch of people executing. There were no pods, no real structure. His first task was to stabilize data ingestion. Every other day, a customer complained they couldn&#8217;t ingest data. It was a race against time.</p><p>After stabilizing ingestion, he turned to a bigger question: what does it take to run a high-performing team?</p><p>He laid the foundation for the platform pod, starting with just him and Vikrant. They implemented sprints, retrospectives, and reporting - a process that ran for eight months with just the two of them before being adopted company-wide.</p><p>Introducing structure wasn&#8217;t easy. It led to internal friction and disagreements on how things should be done.</p><p>But as Pandey notes, the culture being set today is what new people will embody. 
That&#8217;s how the baton gets passed.</p><h2><strong>Eight months in</strong></h2><p><strong><a href="https://github.com/piyushsingariya">Piyush</a></strong> joined in March 2025, and eight months later, he says it already &#8220;feels like forever&#8221; in the best way. At previous companies, the push was to ship features fast, any way possible. Here, he found the time and space to do it the right way the first time.</p><p>Working with Nitya on logs, he had to learn a new way of collaborating remotely. It took a few weeks to align on the thinking behind testing certain things, but over time, the context built up.</p><p>Now, Piyush is in the position Nitya was in when he first joined. He is responsible for making logs better and working on complex features like cloud integrations. He&#8217;s also exploring JSON logs, which he believes will boom fast and make many data pipelines redundant.</p><p>And well, his story connects everything. Someone is still at their Day 1, even on our 100th release.</p><h2><strong>Day 1, again</strong></h2><p>Every Wednesday, I look forward to the release. And it&#8217;s not just because I write the changelog. It&#8217;s become my favorite part of the week. As a marketer, I couldn&#8217;t ask for a better way to stay connected to what we&#8217;re actually building.</p><p>Yet here I am, writing a nostalgic story instead of a technical, feature-tracking blog. Because at the end of the day, there are humans building these sophisticated features, and their stories are worth hearing, at least sometimes, if not often.</p><p>The Day 1 feeling isn&#8217;t restricted to engineering. A few days ago, we launched our first-ever mascot. We ran our first integrated campaign. Every one of these feels like a beginning.</p><p>It&#8217;s Ashu pushing through the crickets. Vishal doing the messy, unglamorous work. Nitya testing for a billion when a million seemed like enough. 
Yunus building systems, Vikrant doing it tired, and Pandey introducing structure when velocity felt more important.</p><p>These stories are limited to a few, but they echo the team&#8217;s sentiment at large.</p><p>Here&#8217;s to the next 100. We&#8217;re just getting started!</p><div><hr></div><p>This spirit extends beyond our internal team. Our community has been with us every step of the way, which is why this past July, we were thrilled to launch the <strong><a href="https://signoz.io/blog/community-advocate-program/">SigNoz Community Advocate Program</a></strong>. It&#8217;s our way of recognizing the passionate developers who help others succeed with observability.</p><p>Shout-out to our inaugural advocates for their incredible contributions: <strong><a href="https://github.com/mgilham">Mathew Gilham</a></strong>, <strong><a href="https://github.com/MattiDeGrauwe">Matti De Grauwe</a></strong>, <strong><a href="https://github.com/KieranP">Kieran Pilkington</a></strong>, <strong><a href="https://github.com/gfelot">Gil Felot</a></strong> and <strong><a href="https://github.com/nlamirault">Nicolas Lamirault</a></strong>.</p><p>And in a moment of perfect serendipity, just as I was about to create the PR for this post, we welcomed our 500th paid customer.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[ELI5 Auth Model for OpenTelemetry Collector]]></title><description><![CDATA[In modern systems, where even a small mishap can wreak havoc and you might wake up to a $$$ bill the next day, we should do whatever is within our capacity to secure our systems.]]></description><link>https://newsletter.signoz.io/p/eli5-auth-model-for-opentelemetry</link><guid isPermaLink="false">https://newsletter.signoz.io/p/eli5-auth-model-for-opentelemetry</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Sun, 26 Oct 2025 12:02:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!BUjn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz! </em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! We at&nbsp;<strong><a href="https://signoz.io/">SigNoz</a></strong>&nbsp;are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. 
We&#8217;ll make it worth your while.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p><em>On another note, feel free to check out our <strong><a href="https://signoz.io/resource-center/blog/">blogs</a> </strong>and <strong><a href="https://signoz.io/docs/introduction/">docs</a></strong> here. Our <strong><a href="https://github.com/SigNoz/signoz">GitHub</a> </strong>is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. You&#8217;re also welcome to join the conversation in our growing <strong><a href="https://arc.net/l/quote/rmuathlr">Slack</a></strong> community for the latest news!</em></p><p><em>Cheers.</em></p><div><hr></div><p>In any software that moves data or information, there is a pressing need to make the passage of that data secure. One way of achieving this is <em>authentication</em>. You have probably authenticated API calls or other data streams before.</p><p>Gemini defines authentication as <em>the process of verifying that a user, device, or system is who or what it claims to be, typically by using credentials like a username and password.</em> When I was first learning about authenticating systems, I related it to the term <em>authenticity</em>, which is closely related to <em>trustworthiness</em>: can the source of incoming data or a request be <em>trusted</em> enough to accept it? You can stick with a definition or build an idea based on what works best for you. 
:)</p><p>In modern systems, where even a small mishap can wreak havoc and you might wake up to a $$$ bill the next day, we should do whatever is within our capacity to secure our systems.</p><p>That&#8217;s why this week, I want to talk about something crucial but often overlooked: <em>Authentication for your OpenTelemetry Collectors</em>. These collectors are the busy data hubs of your observability pipeline, handling huge amounts of information every moment. Securing them is non-negotiable, and also a perfect use case for strong authentication.</p><p></p><h2>Authentication in OpenTelemetry</h2><p>First, OpenTelemetry on its own doesn&#8217;t define an authentication protocol or an auth model. Its primary aim was to define a standard data model (for example, for <a href="https://opentelemetry.io/docs/specs/otel/metrics/data-model/">metrics</a> and <a href="https://opentelemetry.io/docs/specs/otel/logs/data-model/">logs</a>) and a transport protocol (<a href="https://opentelemetry.io/docs/specs/otel/protocol/">OTLP</a>). That leaves us free to use whichever authentication scheme fits our collector pipeline and the backend we are using. </p><p>In a Collector pipeline, data enters through the <em>receivers</em> and leaves through the <em>exporters</em>. 
Authentication is critical at both of these points.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BUjn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BUjn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png 424w, https://substackcdn.com/image/fetch/$s_!BUjn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png 848w, https://substackcdn.com/image/fetch/$s_!BUjn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png 1272w, https://substackcdn.com/image/fetch/$s_!BUjn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BUjn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png" width="1328" height="641" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:641,&quot;width&quot;:1328,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:720648,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/176322568?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BUjn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png 424w, https://substackcdn.com/image/fetch/$s_!BUjn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png 848w, https://substackcdn.com/image/fetch/$s_!BUjn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png 1272w, https://substackcdn.com/image/fetch/$s_!BUjn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Authenticating incoming and outgoing traffic</figcaption></figure></div><p></p><h3>Authenticating Incoming Traffic</h3><p>As we saw before, the receiver is the point of entry for data traffic, so it&#8217;s crucial to check whether the data is coming from a <em>reliable</em> source. We achieve this with auth extensions. You can read more about <strong><a href="https://opentelemetry.io/docs/collector/configuration/#extensions">extensions</a></strong> here. </p><p>In this scenario, we will configure our Collector to only accept requests that include a valid secret token in their Authorization: Bearer &lt;token&gt; header. This is a three-step process in your Collector&#8217;s config.yaml file.</p><p></p><h4><strong>Step 1: Define the Authenticator in extensions</strong></h4><p>First, we define our authentication method. 
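We&#8217;ll use the bearerauth authenticator and provide it with a list of valid tokens. Check the extension name your Collector distribution expects: in the upstream contrib distribution this authenticator is registered as bearertokenauth, so adjust the name to match your build.</p><pre><code>extensions: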
  bearerauth:
    # This defines a list of valid secret tokens the collector will accept.
    # Any client request must present one of these tokens to be authenticated.
    tokens:
      - "${CLIENT_A_TOKEN}"
      - "${CLIENT_B_TOKEN}"
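
# Sketch, not part of the original config: one way the two variables above
# could be supplied on Kubernetes. The Secret name, key names and container
# name are hypothetical. This fragment would live in the Collector's pod spec,
# not in config.yaml, so it is left commented out here.
#
# containers:
#   - name: otel-collector
#     env:
#       - name: CLIENT_A_TOKEN
#         valueFrom:
#           secretKeyRef:
#             name: otel-client-tokens
#             key: client-a
#       - name: CLIENT_B_TOKEN
#         valueFrom:
#           secretKeyRef:
#             name: otel-client-tokens
#             key: client-b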

</code></pre><p>Just defining the authenticator under extensions doesn&#8217;t <em>enforce</em> it. It is enforced only once it&#8217;s applied to a receiver and the extension is enabled in the service section, as shown in the next steps.</p><div><hr></div><p><strong>&#9888;&#65039; Important Security Note!</strong></p><p>Never hardcode secrets directly in your configuration file. The ${...} syntax tells the Collector to load the token from an environment variable. You should inject these variables securely using a tool like Kubernetes Secrets or Docker Secrets.</p><div><hr></div><h4><strong>Step 2: Apply the Authenticator to a Receiver</strong></h4><p>Next, we tell our otlp receiver that it must use the authenticator we just defined. We do this by adding an auth setting under each protocol in the receiver&#8217;s configuration.</p><pre><code>receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
        auth:
          authenticator: bearerauth   # use the bearerauth extension
      http:
        endpoint: "0.0.0.0:4318"
        auth:
          authenticator: bearerauth   # same auth on HTTP</code></pre><p></p><h4><strong>Step 3: Enable the Extension in the Service Block</strong></h4><p>Finally, the extension must be activated for the Collector by listing it in the service section. The pipeline below also assumes that a batch processor and an otlp exporter are defined elsewhere in the same file.</p><pre><code>service:
  extensions: [bearerauth]  # This activates the bearerauth extension
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]  
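
# Sketch, not part of the original config: how a sender might present a token.
# For example, an agent Collector forwarding to the Collector above could set
# the Authorization header on its own exporter. The endpoint is hypothetical,
# and TLS settings are omitted; they need to match how the receiver is exposed.
#
# exporters:
#   otlp:
#     endpoint: "gateway-collector.example.internal:4317"
#     headers:
#       Authorization: "Bearer ${CLIENT_A_TOKEN}"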
</code></pre><p>With this configuration in place, your Collector&#8217;s incoming traffic is now authenticated. Any request arriving at the OTLP receiver without a valid token will be rejected, ensuring only your trusted applications can send data into your observability pipeline.</p><p>There are other authenticators as well, such as basicauth and oidc, depending on your particular use case. Now, let&#8217;s see how we deal with outgoing traffic. </p><p></p><h3>Securing Outgoing Traffic</h3><p>Exporters are the exit point for data leaving the collector. The next destination for your data is most likely an observability backend like SigNoz, and the collector often needs to authenticate itself to prove it has permission to send that data. There are two ways to do this. </p><p>The easiest way is to add a headers section directly to your exporter&#8217;s configuration in your config.yaml. This tells the exporter to attach the specified headers (containing your secret key) to every outgoing request, as shown below:</p><pre><code>exporters:
  otlp:
    endpoint: "ingest.us.signoz.cloud:443"
    headers:
      # This header authenticates the Collector with the SigNoz backend
      signoz-ingestion-key: "${SIGNOZ_API_KEY}" # as env var
</code></pre><p>For more complex authentication, you can follow the same sequence of steps as we did for receivers above: Step 1, define the authenticator in extensions; Step 2, apply the authenticator to an exporter. At the end, we enable the extension in the service section. Here&#8217;s the entire code sample.</p><pre><code>extensions:
  sigv4auth: # a specialized authenticator for AWS users
    region: "us-east-1"
    service: "aoss"

exporters:
  otlp:
    endpoint: "ingest.us.signoz.cloud:443"
    headers:
      signoz-ingestion-key: "${SIGNOZ_API_KEY}"

  otlphttp/aws:
    endpoint: "https://my-opensearch-domain.us-east-1.aoss.amazonaws.com"
    auth:
      authenticator: sigv4auth

service:
  extensions: [sigv4auth]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, otlphttp/aws]  # an exporter only runs if a pipeline lists it
</code></pre><p><br>In summary, for most backends that use a simple API key, the static headers setting is all you need. For more complex scenarios involving cloud provider IAM roles or OAuth2, use the Collector&#8217;s auth extensions.</p><p></p><h3>What&#8217;s next?</h3><p>Now that we&#8217;ve laid a foundation for securing data flowing into and out of your OpenTelemetry collectors, you can get hands-on and experiment with different authentication methods to get a well-rounded idea. To read more on OpenTelemetry collectors and their various parts, this is a good <strong><a href="https://signoz.io/blog/opentelemetry-operator-complete-guide/">read</a></strong>.</p><p>Next week, I&#8217;ll be back with another deep-dive, and until then, adieu! &#128075;</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk! If you enjoyed reading this, stay tuned and subscribe!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><div><hr></div><p></p>]]></content:encoded></item><item><title><![CDATA[LLM Observability in the Wild - Why OpenTelemetry should be the Standard ]]></title><description><![CDATA[Building, debugging, and improving AI agents in production gets messy fast. So, what's the solution? 
Read on!]]></description><link>https://newsletter.signoz.io/p/llm-observability-in-the-wild-why</link><guid isPermaLink="false">https://newsletter.signoz.io/p/llm-observability-in-the-wild-why</guid><dc:creator><![CDATA[Pranay]]></dc:creator><pubDate>Sun, 12 Oct 2025 13:02:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/DPL35sYPGPU" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few days ago I hosted a live conversation with Pranav, co-founder of Chatwoot, about issues his team was running into with LLM observability.</p><p>The short version: building, debugging, and improving AI agents in production gets messy fast. There are multiple competing standards and default libraries for LLM observability. Many of these libraries, such as OpenInference, claim to be based on OpenTelemetry but don&#8217;t strictly adhere to its conventions. This introduces problems for users who are trying to get better observability across their stack.</p><p>Here&#8217;s a write-up of what we covered and what I think it means for anyone shipping LLM features into real products. Feel free to watch the complete video:</p><div id="youtube2-DPL35sYPGPU" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;DPL35sYPGPU&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/DPL35sYPGPU?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>The Problem Emerges in Prod</strong></h2><p>Pranav and I go way back to our YC days in 2021, and it&#8217;s always interesting to see how our paths have evolved. 
Chatwoot has built something really compelling - an open-source customer support platform that unifies conversations across every channel you can imagine: live chat, email, WhatsApp, social media, you name it. All in a single dashboard.</p><p>But here&#8217;s where it gets interesting. They&#8217;ve built an AI agent called &#8220;Captain&#8221; that can work across all these channels. You build the logic once, and it can handle support queries whether they come through email, live chat, or WhatsApp. Pretty neat, right?</p><p>The problem started showing up in production in the most unexpected ways. Sometimes their AI would randomly respond in Spanish when it absolutely shouldn&#8217;t. Other times, responses just weren&#8217;t quite right, and they had no visibility into <em>why</em>.</p><h2><strong>The Quest for LLM Observability</strong></h2><p>This is where Pranav&#8217;s journey into LLM observability began, and it mirrors what I&#8217;ve been seeing across many companies building LLM applications. You need to understand:</p><ul><li><p>What documents were retrieved for a RAG query?</p></li><li><p>Which tool calls were made?</p></li><li><p>What was the exact input and output at each step?</p></li><li><p>Why did the AI make certain decisions?</p></li></ul><p>Without this visibility, you&#8217;re essentially flying blind in production.</p><h2><strong>The Standards Problem</strong></h2><p>Here&#8217;s where things get really interesting, and frankly, frustrating. Pranav explored several solutions:</p><p><strong>OpenAI&#8217;s native tracing</strong> looked promising with rich, detailed traces showing guardrails, agent flows, and tool calls. But it&#8217;s tightly coupled to OpenAI&#8217;s agent framework. Also, it only provides traces as an atomic unit. 
If you want to filter spans based on attributes or just examine specific spans directly, you can&#8217;t do that.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!olOR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!olOR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp 424w, https://substackcdn.com/image/fetch/$s_!olOR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp 848w, https://substackcdn.com/image/fetch/$s_!olOR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp 1272w, https://substackcdn.com/image/fetch/$s_!olOR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!olOR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp" width="1456" height="787" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:787,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;OpenAI agent workflow traces&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="OpenAI agent workflow traces" title="OpenAI agent workflow traces" srcset="https://substackcdn.com/image/fetch/$s_!olOR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp 424w, https://substackcdn.com/image/fetch/$s_!olOR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp 848w, https://substackcdn.com/image/fetch/$s_!olOR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp 1272w, https://substackcdn.com/image/fetch/$s_!olOR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>OpenAI agent workflow traces</em></figcaption></figure></div><p><strong>New Relic</strong> was easy to integrate since they already use it, and it supports OpenTelemetry. But the UI required clicking through 5-6 layers just to see relevant information. Not ideal when you&#8217;re trying to debug production issues.</p><p><strong>Phoenix</strong> caught their attention because it follows the OpenInference standard, which provides much richer, AI-specific span types. You can easily filter for just LLM calls, tool calls, or agent spans. 
The traces are beautiful and informative.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H1Fj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!H1Fj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp 424w, https://substackcdn.com/image/fetch/$s_!H1Fj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp 848w, https://substackcdn.com/image/fetch/$s_!H1Fj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp 1272w, https://substackcdn.com/image/fetch/$s_!H1Fj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H1Fj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp" width="1456" height="789" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:789,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Phoenix doesn't recognize OpenTelemetry span kinds&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Phoenix doesn't recognize OpenTelemetry span kinds" title="Phoenix doesn't recognize OpenTelemetry span kinds" srcset="https://substackcdn.com/image/fetch/$s_!H1Fj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp 424w, https://substackcdn.com/image/fetch/$s_!H1Fj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp 848w, https://substackcdn.com/image/fetch/$s_!H1Fj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp 1272w, https://substackcdn.com/image/fetch/$s_!H1Fj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Phoenix doesn&#8217;t recognize OpenTelemetry span kinds</em></figcaption></figure></div><p>But here&#8217;s the kicker: Chatwoot is primarily a Ruby on Rails shop, and guess what? No Ruby SDK for OpenInference. Moreover, Phoenix doesn&#8217;t completely adhere to OTel semantic conventions, so if you send it telemetry data directly via OpenTelemetry, it doesn&#8217;t recognize the type of spans, etc.</p><p>As shown in the example above, Phoenix shows spans sent with plain OpenTelemetry span kinds as <code>unknown</code>.</p><h2><strong>The OpenTelemetry vs OpenInference Divide</strong></h2><p>This is where the conversation got really technical and revealed a fundamental industry problem. There are essentially two standards emerging:</p><p><strong>OpenTelemetry</strong> is the industry standard. 
It has libraries for most major languages, it&#8217;s production-ready, and it&#8217;s widely adopted. But it was built for traditional applications, not AI workflows. It only supports five basic span kinds: internal, server, client, producer, consumer. That&#8217;s it.</p><p><strong>OpenInference</strong> was created specifically for AI applications. It has rich span types like LLM, tool, chain, embedding, agent, etc. You can easily query for &#8220;show me all the LLM calls&#8221; or &#8220;what were all the tool executions.&#8221; But it&#8217;s newer, has limited language support, and isn&#8217;t as widely adopted.</p><p>The tragic part? OpenInference claims to be &#8220;OpenTelemetry compatible,&#8221; but as Pranav discovered, that compatibility is shallow. You can send OpenTelemetry-format data to Phoenix, but it doesn&#8217;t recognize the AI-specific semantics and just shows everything as &#8220;unknown&#8221; spans.</p><h2><strong>The Ruby Problem Makes It Worse</strong></h2><p>For teams using languages like Ruby that don&#8217;t have direct OpenInference SDK support, this becomes even more challenging. Pranav had to choose between:</p><ol><li><p>Building an SDK from scratch for Ruby</p></li><li><p>Using OpenTelemetry and losing AI-specific insights</p></li><li><p>Switching to a different language stack just for AI observability (way tougher)</p></li></ol><p>None of these are great options.</p><h2><strong>Why we (still) bias to OpenTelemetry</strong></h2><p>At SigNoz we&#8217;re all-in on OpenTelemetry. One reason: OTel&#8217;s consistency enables out-of-the-box experiences across your <em>whole</em> stack. Example: we can auto-surface <strong><a href="https://signoz.io/docs/external-api-monitoring/overview/">external API</a></strong> usage and performance based on span kinds and attributes. When parts of the app send telemetry via non-OTel conventions, those views degrade.</p><p>Chatwoot lands similarly: their entire product already emits OTel. 
Pulling in a second telemetry standard just for LLMs fragments the picture and complicates how they go about observability. It also silos their observability into different products, which makes it difficult to solve issues when they occur.</p><h2><strong>Takeaways for builders</strong></h2><ul><li><p><strong>Pick one telemetry backbone</strong> - If most of your app is OTel, prefer staying OTel-native for LLMs too, even if it means adding richer attributes until GenAI conventions catch up.</p></li><li><p><strong>LLM-specific libraries</strong> - Even if you have to use LLM-specific libraries like OpenInference, try to keep your usage as close to OpenTelemetry as possible, so that you know which non-OTel attributes you are using and which of them may break things.</p></li><li><p><strong>Follow the OTel GenAI working group</strong> - There is active work happening in the OTel <strong><a href="https://opentelemetry.io/blog/2024/otel-generative-ai/">Gen AI working group</a></strong>. Follow the work happening there and share your use cases, so that the standards OpenTelemetry builds can cater to the most common use cases.</p></li></ul><p>As the LLM space is still evolving rapidly, we as a community need to share our voices so that the standards are robust.</p><div><hr></div><h2><strong>What we&#8217;re doing at SigNoz</strong></h2><p>We&#8217;re continuing to invest in OpenTelemetry-native LLM observability so teams don&#8217;t have to choose between stability and clarity. Concretely, that means:</p><ul><li><p>Clear dashboards and traces when LLM calls are modeled using OTel spans/attributes. You can find examples and dashboards in our <strong><a href="https://signoz.io/docs/llm-observability/">LLM observability</a></strong> docs. Though we have also used LLM-specific libraries like OpenInference in our docs (as they are still the easiest way for people to get started), we have kept the dashboards as close to OTel standards as possible. 
We also plan to actively update this as OTel GenAI semantic conventions become more mature.</p></li><li><p>Guidance and examples for popular frameworks (LangChain, LlamaIndex, etc.) on emitting OTel-friendly telemetry.</p></li><li><p>Features built on OpenTelemetry semantic conventions, so that you get a great out-of-the-box experience in SigNoz, with thoughtful defaults that keep your services, DBs, queues, and LLM agents in one coherent picture.</p></li></ul><p>If you&#8217;re wrestling with these trade-offs, we&#8217;d love to hear what&#8217;s breaking for you and what &#8220;rich semantics&#8221; you actually use day-to-day.</p><div><hr></div><h2><strong>What next?</strong></h2><p>Huge thanks to Pranav for going deep, especially from the Ruby perspective. If you&#8217;re shipping AI features and care about operability, add your voice: push for richer GenAI semantics in OpenTelemetry, and share real traces (sanitized) that show what you need to see.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk!  
If you want more interesting reads, feel free to subscribe!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Query Builder v5 - Two Years of Technical Debt, 80 Closed Issues, and a Fundamental Rethinking]]></title><description><![CDATA[Read on to understand how we revamped our query builder!]]></description><link>https://newsletter.signoz.io/p/query-builder-v5-two-years-of-technical</link><guid isPermaLink="false">https://newsletter.signoz.io/p/query-builder-v5-two-years-of-technical</guid><dc:creator><![CDATA[SigNoz]]></dc:creator><pubDate>Sun, 21 Sep 2025 13:40:09 GMT</pubDate><content:encoded><![CDATA[<p>In 2022, we had three different query interfaces. Logs had a custom search syntax with no autocomplete. Traces only had predefined filters - no query builder at all. Metrics had a raw PromQL input box where you'd paste queries from somewhere else and hope they worked.</p><p>Each system spoke a different language. An engineer debugging a production issue had to context-switch not just between data types, but between entirely different ways of thinking about queries.</p><p>When we built v3 in 2022, we thought we were solving this. We created a unified query builder - basically a UI wrapper around SQL. Count, group by, filter, limit. It worked well enough to get us from 2022 to 2024.</p><p>Turns out we were building with the wrong assumptions.</p><h2><strong>The v3/v4 Design Flaw That Took Two Years to Understand</strong></h2><p>We designed v3 around traces and metrics. In these data types, you rarely need complex boolean logic. Simple AND between conditions usually covers it.</p><p>But logs are different. 
When you're searching logs during an incident, you need expressions like:</p><pre><code><code>(node_name contains 'management' OR pod_name contains 'test')
AND NOT (status_code &gt;= 500)
</code></code></pre><p>v3 couldn't do this. No OR support. No complex boolean expressions. No parentheses for precedence.</p><p>This was a major limitation that blocked common use cases. Users were forced to learn ClickHouse SQL, write raw queries, and maintain them as our schemas evolved. We'd built a query builder that couldn't handle the queries users actually needed.</p><h2><strong>The Support Calls That Changed Our Philosophy</strong></h2><p>After four years of support calls, we noticed a pattern that surprised us.</p><p>Senior engineers - people with 5-10 years of experience - couldn't find features that seemed obvious to us. Take chronological ordering in logs. We had the feature, buried three clicks deep in v3 and v4. Users didn't just struggle to use it; they assumed we didn't support it at all.</p><p>During these calls, we'd watch them search for features, see their frustration, and realize: if you built it and know exactly where it is, everything seems obvious. But if senior engineers can't discover your features, those features don't exist.</p><p>For v5, we changed our approach. We decided to stop making decisions for users.</p><p>In v3/v4, we tried to be clever. We'd make assumptions about what users wanted, hide complexity to "simplify" the experience. These assumptions were often wrong and led to behavior that broke trust.</p><p>For v5, we set a new rule: if we must make a decision, it should be the least surprising one possible. And wherever possible, don't make the decision at all - let users control their experience.</p><h2><strong>The Architectural Reality: You Can't Ship a Query Builder in Isolation</strong></h2><p>When we started building v5, we quickly discovered that the query builder isn't just one component. It's how users interact with data across the entire product.</p><p>Think about the typical workflow: You write a query in the explorer to investigate an issue. 
Then you either:</p><ul><li><p>Save it as a dashboard panel to monitor the pattern</p></li><li><p>Create an alert to catch it next time</p></li><li><p>Switch between logs, traces, and metrics to correlate data</p></li></ul><p>This interconnection meant we couldn't ship v5 for just the explorer. A query written in the new format had to work everywhere. This forced us to rebuild:</p><ul><li><p>All three explorers (logs, traces, metrics)</p></li><li><p>Dashboard panel creation (including value panels that only exist in dashboards)</p></li><li><p>Alert creation flows</p></li><li><p>The underlying query API that powers all of these</p></li></ul><p>What started as "let's add OR support to the query builder" became a complete architectural overhaul.</p><h2><strong>The Technical Implementation</strong></h2><h3><strong>Full-Text Search That Works Like Google</strong></h3><p>The most common use case during an incident is that a user sends you an error message. In v3, you'd need to construct a query with the correct syntax. In v5, you just paste and search:</p><pre><code><code>"connection timeout in payment service"
</code></code></pre><p>Behind the scenes, we parse this into the appropriate query structure. But the user doesn't need to know that. They're debugging a problem, not learning a query language.</p><h3><strong>Complex Boolean Logic with Proper Precedence</strong></h3><p>The feature that was impossible in v3/v4 and forced users to write ClickHouse queries:</p><pre><code><code>(service_name = 'api' AND status_code &gt;= 500)
OR
(service_name = 'worker' AND error_message contains 'timeout')
</code></code></pre><p>This seems basic, but implementing it required rethinking our entire query structure. We needed to support arbitrary nesting, maintain precedence rules, and still provide autocomplete and suggestions at every level.</p><h3><strong>Cross-Source Query Portability</strong></h3><p>Queries are portable across data types. It&#8217;s one of the most powerful features that users don&#8217;t notice initially.</p><p>Write a query filtering for <code>service_name = 'api'</code> in logs. Copy it. Paste it in traces explorer. It works.</p><p>This seems simple, but the implementation is complex. Logs, traces, and metrics have:</p><ul><li><p>Different underlying table schemas</p></li><li><p>Different column names for similar concepts</p></li><li><p>Different valid operations</p></li></ul><p>We built an abstraction layer that translates queries between these contexts automatically. Users think in terms of their data, not our storage schema.</p><h3><strong>Performance at Scale: Instant Suggestions</strong></h3><p>When you're typing a query, you need suggestions immediately. But we're dealing with:</p><ul><li><p>Millions of unique field values</p></li><li><p>Multiple data sources</p></li><li><p>Complex hierarchical data structures</p></li></ul><p>We implemented:</p><ul><li><p>Smart caching that predicts what fields you'll query next</p></li><li><p>Progressive loading that shows the most relevant suggestions first</p></li><li><p>Query optimization that happens before we send anything to ClickHouse</p></li></ul><p>The result? An autocomplete that feels instant, even at scale.</p><h2><strong>The UX Debt We Finally Paid</strong></h2><p>Because we were touching every part of the query experience, we could finally address years of accumulated UX issues.</p><p><strong>Chronological ordering in logs:</strong> Moved from a hidden dropdown to a prominent toggle. 
Same capability, much better discoverability.</p><p><strong>Time aggregation controls:</strong> Previously buried in advanced settings, now directly visible. Users can switch from 1-minute to 5-second granularity with one click.</p><p><strong>Interval selection:</strong> Direct control over data granularity from 5 seconds to 1 hour. Why does this matter? During an incident, 30-second aggregation might smooth out the spike that's causing your problem. 5-second aggregation shows you exactly when things went wrong.</p><p>These weren't query builder features, but fixing them was essential to delivering a coherent experience. When engineers are debugging production issues at 2 AM, they shouldn't hunt for basic controls.</p><h2><strong>The Validation: Users Replacing ClickHouse Queries</strong></h2><p>We shipped v5 with a single changelog entry. No marketing campaign. No push to adopt it.</p><p>Within three weeks, the feedback started coming in. The one that stood out: a user telling us they'd replaced all their ClickHouse queries with Query Builder queries.</p><p>We didn't ask them to do this. They discovered that the query builder could now handle their complex cases, and they preferred it over raw SQL.</p><p>Why? Because with Query Builder:</p><ul><li><p>They don't need to learn ClickHouse SQL syntax</p></li><li><p>They don't need to update queries when we change schemas</p></li><li><p>They get autocomplete and validation</p></li><li><p>They can copy queries between different data types</p></li><li><p>They can share queries with team members who don't know SQL</p></li></ul><p>When users actively choose your abstraction over direct database access, you know you've built the right thing.</p><h2><strong>What We Couldn't Ship Yet: The Future of Cross-Signal Correlation</strong></h2><h3><strong>Subqueries: Correlating Across Signal Types</strong></h3><p>Imagine investigating an incident where you see 500 errors. Your hypothesis: high CPU usage caused the failures. 
Today, you check traces for errors, then separately check metrics for CPU usage, then try to mentally correlate the timings.</p><p>With subqueries (currently in development), you'll write:</p><pre><code><code>Show traces where:
status_code &gt;= 500
AND subquery(metrics: CPU_usage &gt; 80% for same service)
</code></code></pre><p>This requires real-time joining of traces and metrics data. The architecture is designed, the UI patterns are established. Implementation is next.</p><h3><strong>Cross-Source Joins: Unified Debugging Experience</strong></h3><p>Currently, logs and traces live in separate worlds. You can see that a trace has an error, and you can see related logs, but you can't query them together.</p><p>With joins (in design phase), you'll write:</p><pre><code><code>Show logs where:
JOIN traces ON trace_id
WHERE traces.duration &gt; 500ms
</code></code></pre><p>This unlocks debugging workflows that are impossible today. Find all logs related to slow traces. Show logs where the parent span had an error. Correlate log patterns with trace characteristics.</p><h2><strong>The Engineering Lesson: Technical Elegance Without Discoverability Is Worthless</strong></h2><p>After four years working on this product, countless support calls, and watching experienced engineers struggle with features I thought were obvious, the lesson is clear:</p><p>Your technical solution can be elegant. Your features can be powerful. But if users can't find and use them, they might as well not exist.</p><p>We could have the most sophisticated query engine in the world. But if an engineer investigating a production incident can't immediately figure out how to use it, we've failed.</p><p>Query Builder v5 isn't just about adding OR support or fixing bugs. It's about recognizing that during an incident, engineers shouldn't have to think about query syntax. They should think about their problem.</p><h2><strong>Where We Go From Here</strong></h2><p>We closed 80 issues with v5. We have 50+ more in the backlog.</p><p>But we're not planning a v6 mega-release. We designed v5's architecture to be extensible. The abstractions are correct. The patterns are established. Now we can ship incremental improvements without breaking changes.</p><p>Subqueries, joins, and the remaining enhancements will roll out as they're ready. No more two-year gaps between major improvements.</p><p>The query builder is no longer just a UI component. It's how engineers interact with their observability data. And for the first time, it's powerful enough that users are choosing it over writing raw SQL.</p><p>That's not just a technical achievement. That's validation that we finally understood the problem we were trying to solve.</p><p>Query Builder v5 is live in the latest release. 
<strong><a href="https://signoz.io/docs/userguide/query-builder-v5/">Check the documentation</a></strong> for detailed examples and capabilities.</p>]]></content:encoded></item><item><title><![CDATA[LangChain Observability: How to Monitor LLM Apps with OpenTelemetry (With Demo App)]]></title><description><![CDATA[In this practical guide, we will walk you through setting up observability for your Langchain application with OpenTelemetry.]]></description><link>https://newsletter.signoz.io/p/langchain-observability-how-to-monitor</link><guid isPermaLink="false">https://newsletter.signoz.io/p/langchain-observability-how-to-monitor</guid><dc:creator><![CDATA[SigNoz]]></dc:creator><pubDate>Sun, 07 Sep 2025 14:02:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!15qZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>LangChain has become one of the most popular frameworks for building LLM-powered applications, making it easier to create agents that can reason, plan, and take actions. But like any production-grade AI app, LangChain agents can run into performance bottlenecks, hallucinations, or tool call failures. And without proper LangChain observability, it&#8217;s hard to know where things break down.</p><p>In this practical guide, we will walk you through setting up observability for your LangChain application with OpenTelemetry, the open-source standard for generating telemetry data. We'll instrument a demo trip planner agent and show you how to visualize traces, token usage, and tool performance in SigNoz.</p><p>The trip planner agent helps users plan their travel itinerary by combining LLM reasoning with external services like flight ticket search, weather APIs, hotel booking engines, and nearby activity recommendations. 
By instrumenting it with OpenTelemetry, you can trace every step of the planning process, measure latency at each stage, and quickly debug issues that impact the user experience.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RzQN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RzQN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp 424w, https://substackcdn.com/image/fetch/$s_!RzQN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp 848w, https://substackcdn.com/image/fetch/$s_!RzQN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp 1272w, https://substackcdn.com/image/fetch/$s_!RzQN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RzQN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp" width="1200" height="630" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RzQN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp 424w, https://substackcdn.com/image/fetch/$s_!RzQN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp 848w, https://substackcdn.com/image/fetch/$s_!RzQN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp 1272w, https://substackcdn.com/image/fetch/$s_!RzQN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2><strong>Why LangChain Observability Matters</strong></h2><p>LangChain agents are essentially reasoning loops: the LLM takes user input, decides which tools to call, processes their results, and iterates until it arrives at an answer. In a trip planner agent app, this might look like:</p><ul><li><p>Calling a flights API to check availability.</p></li><li><p>Fetching hotel recommendations from a booking API.</p></li><li><p>Looking up weather forecasts to suggest the best travel window.</p></li><li><p>Stitching everything together into a coherent itinerary.</p></li></ul><p>This chain of reasoning is amazing when it works but if one tool call fails, takes too long, or returns garbage, the whole experience collapses. 
Without observability, you won&#8217;t know whether the problem was:</p><ul><li><p>A slow external API call.</p></li><li><p>An LLM misunderstanding the tool response.</p></li><li><p>The reasoning loop going in circles.</p></li></ul><p>Instrumentation with OpenTelemetry makes each of these failure modes visible.</p><h2><strong>How OpenTelemetry and SigNoz can help</strong></h2><p><strong>What is OpenTelemetry?</strong></p><p><strong><a href="https://signoz.io/blog/what-is-opentelemetry/">OpenTelemetry</a></strong> (OTel) is an open-source observability framework that provides a unified way to collect telemetry data (traces, metrics, and logs) from across your application stack. It&#8217;s a CNCF project with support for multiple programming languages and a wide range of integrations. The beauty of OTel is that you instrument your code once and can then send that data to any OpenTelemetry-compatible backend you choose, without vendor lock-in.</p><p>For LangChain-based agents, this means you can capture detailed performance and error data for each stage of the reasoning process: LLM calls, tool invocations (like flights, hotels, weather, and activity search), and the orchestration logic that stitches them together. Instead of treating your agent as a black box, you get fine-grained visibility into exactly how requests flow through your system.</p><p><strong>What is SigNoz?</strong></p><p><strong><a href="https://signoz.io/">SigNoz</a></strong> is an all-in-one observability platform built on top of OpenTelemetry. It provides a rich UI to visualize traces, monitor performance metrics, and set alerts, all in real time. 
With SigNoz, you can drill into slow external API calls, trace a single trip planning request end-to-end, or quickly identify where your LangChain agent might be looping or failing.</p><p>By pairing OpenTelemetry&#8217;s standardized data collection with SigNoz&#8217;s powerful analysis tools, you get a complete observability stack tailored for modern, distributed, and AI-driven applications.</p><p>To demonstrate how OpenTelemetry and SigNoz work together in practice, we&#8217;ll walk through a demo trip planner agent built on LangChain. The agent uses flight search, hotel booking, weather APIs, and nearby activity lookup to craft travel itineraries, and with observability enabled, you can see every step of the process in action.</p><h2><strong>Building the Example App: A LangChain Trip Planner Agent</strong></h2><p>To make this guide more concrete, we&#8217;ve built a trip planner agent powered by LangChain, OpenTelemetry, and SigNoz. 
The idea is simple: the user specifies a start location, destination, and check-in/check-out dates, and the agent generates a personalized travel itinerary.</p><p>The itinerary includes:</p><ul><li><p><strong>Flight details</strong> for departure and return.</p></li><li><p><strong>Hotel booking options</strong> covering the entire stay.</p></li><li><p><strong>Weather forecasts</strong> for the chosen dates.</p></li><li><p><strong>Nearby activities</strong> to explore at the destination.</p></li></ul><p>Under the hood, the app uses LangChain&#8217;s agent framework to orchestrate multiple tool calls: one for flight tickets, one for hotels, one for weather, and one for activities. The LLM reasons over the responses from these tools and stitches them together into a coherent itinerary.</p><p>With OpenTelemetry instrumentation baked in, every tool invocation and LLM call is traced and sent to SigNoz, providing a complete picture of the app&#8217;s performance and behavior: whether a flight API is slow, a hotel lookup fails, or the agent loops unnecessarily.</p><p>To make it more interactive, the trip planner also includes a chatbot feature. 
Users can ask follow-up questions like <em>&#8220;Can you find vegetarian-friendly restaurants near my hotel?&#8221;</em> or <em>&#8220;What&#8217;s the best day trip outside the city?&#8221;</em> These conversations are also traced, helping developers see how the agent performs during exploratory dialogue.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EFLN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EFLN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp 424w, https://substackcdn.com/image/fetch/$s_!EFLN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp 848w, https://substackcdn.com/image/fetch/$s_!EFLN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp 1272w, https://substackcdn.com/image/fetch/$s_!EFLN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EFLN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp" width="1334" height="1166" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1166,&quot;width&quot;:1334,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Agent App Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Agent App Image" title="Agent App Image" srcset="https://substackcdn.com/image/fetch/$s_!EFLN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp 424w, https://substackcdn.com/image/fetch/$s_!EFLN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp 848w, https://substackcdn.com/image/fetch/$s_!EFLN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp 1272w, https://substackcdn.com/image/fetch/$s_!EFLN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>LangChain App Starting Page</em></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6XGQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6XGQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp 424w, https://substackcdn.com/image/fetch/$s_!6XGQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp 
848w, https://substackcdn.com/image/fetch/$s_!6XGQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp 1272w, https://substackcdn.com/image/fetch/$s_!6XGQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6XGQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp" width="1456" height="1259" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1259,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Agent App Chat&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Agent App Chat" title="Agent App Chat" srcset="https://substackcdn.com/image/fetch/$s_!6XGQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp 424w, https://substackcdn.com/image/fetch/$s_!6XGQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp 848w, 
https://substackcdn.com/image/fetch/$s_!6XGQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp 1272w, https://substackcdn.com/image/fetch/$s_!6XGQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>LangChain App Interactions</em></figcaption></figure></div><h2><strong>Try the Trip Planner Agent 
Yourself</strong></h2><p>Want to explore the LangChain Trip Planner in action? Clone the <strong><a href="https://github.com/SigNoz/langchain-monitoring-demo/tree/main">repo</a></strong>, install dependencies, and follow the setup steps in the README to start sending traces from your local app to SigNoz.</p><pre><code><code>git clone https://github.com/SigNoz/langchain-monitoring-demo.git
</code></code></pre><p>After cloning the repo, you can run the agent locally and start exploring and creating travel plans. The <strong><a href="https://github.com/SigNoz/langchain-monitoring-demo/blob/main/README.md">README</a></strong> provides step&#8209;by&#8209;step guidance for setting up the demo. If you&#8217;d rather instrument your own LangChain app, continue to the next section for detailed instructions on integrating OpenTelemetry and SigNoz.</p><h2><strong>Instrument your LangChain application</strong></h2><h3><strong>Prerequisites</strong></h3><ul><li><p>A Python application using <strong>Python 3.8+</strong></p></li><li><p>LangChain integrated into your app</p></li><li><p>Basic understanding of AI agents and the tool-calling workflow</p></li><li><p>A <strong><a href="https://signoz.io/teams/">SigNoz Cloud account</a></strong> with an active ingestion key</p></li><li><p><code>pip</code> installed for managing Python packages</p></li><li><p>Internet access to send telemetry data to SigNoz Cloud</p></li><li><p><em>(Optional but recommended)</em> A Python virtual environment to isolate dependencies</p></li></ul><p>To capture detailed telemetry from LangChain without modifying your core application logic, we will use <strong><a href="https://arize.com/docs/ax/learn/tracing-concepts/what-is-openinference">OpenInference</a></strong>, a community-driven standard designed to make observability in AI applications easier. It provides pre-built instrumentation for popular frameworks like LangChain, and it&#8217;s built on top of the trusted OpenTelemetry ecosystem. 
This allows you to trace your LangChain application with minimal configuration.</p><p>Check out detailed instructions on how to set up OpenInference instrumentation in your LangChain application over <strong><a href="https://pypi.org/project/openinference-instrumentation-langchain/">here</a></strong>.</p><p><strong>Step 1:</strong> Install OpenInference and OpenTelemetry related packages</p><pre><code><code>pip install openinference-instrumentation-langchain \
opentelemetry-exporter-otlp \
opentelemetry-sdk
</code></code></pre><p><strong>Step 2:</strong> Import the necessary modules in your Python application</p><pre><code><code>from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.langchain import LangChainInstrumentor
</code></code></pre><p><strong>Step 3:</strong> Set up the OpenTelemetry Tracer Provider to send traces directly to SigNoz Cloud</p><pre><code><code>resource = Resource.create({"service.name": "&lt;service_name&gt;"})
provider = TracerProvider(resource=resource)
span_exporter = OTLPSpanExporter(
    endpoint="https://ingest.&lt;region&gt;.signoz.cloud:443/v1/traces",
    headers={"signoz-ingestion-key": "&lt;your-ingestion-key&gt;"},
)
provider.add_span_processor(BatchSpanProcessor(span_exporter))
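# Optional sketch, not part of the original steps: also register the provider
# as the process-wide default, so instrumentation that is not handed a
# provider explicitly still exports through the pipeline configured above.
from opentelemetry import trace

trace.set_tracer_provider(provider)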
</code></code></pre><ul><li><p><code>&lt;service_name&gt;</code> is the name of your service</p></li><li><p>Set the <code>&lt;region&gt;</code> to match your SigNoz Cloud <strong><a href="https://signoz.io/docs/ingestion/signoz-cloud/overview/#endpoint">region</a></strong></p></li><li><p>Replace <code>&lt;your-ingestion-key&gt;</code> with your SigNoz <strong><a href="https://signoz.io/docs/ingestion/signoz-cloud/keys/">ingestion key</a></strong></p></li></ul><p><strong>Step 4:</strong> Instrument LangChain using OpenInference</p><p>Use the <code>LangChainInstrumentor</code> from OpenInference to automatically trace LangChain operations, passing it the tracer provider created in Step 3 so that spans are exported through it:</p><pre><code><code>LangChainInstrumentor().instrument(tracer_provider=provider)
</code></code></pre><blockquote><p><em><strong>&#128204; Important: Place this code at the start of your application logic, before any LangChain functions are called or used, to ensure telemetry is correctly captured.</strong></em></p></blockquote><p>Your LangChain calls should now automatically emit traces, spans, and attributes.</p><p>Finally, you should be able to view this data in SigNoz Cloud under the Traces tab:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wFuj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wFuj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp 424w, https://substackcdn.com/image/fetch/$s_!wFuj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp 848w, https://substackcdn.com/image/fetch/$s_!wFuj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp 1272w, https://substackcdn.com/image/fetch/$s_!wFuj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!wFuj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp" width="1456" height="136" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:136,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Traces View&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Traces View" title="Traces View" srcset="https://substackcdn.com/image/fetch/$s_!wFuj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp 424w, https://substackcdn.com/image/fetch/$s_!wFuj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp 848w, https://substackcdn.com/image/fetch/$s_!wFuj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp 1272w, https://substackcdn.com/image/fetch/$s_!wFuj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><em>Traces of your LangChain 
Application</em></figcaption></figure></div><p>When you click on a trace ID in SigNoz, you'll see a detailed view of the trace, including all associated spans, along with their events and attributes:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iZLL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iZLL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp 424w, https://substackcdn.com/image/fetch/$s_!iZLL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp 848w, https://substackcdn.com/image/fetch/$s_!iZLL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp 1272w, https://substackcdn.com/image/fetch/$s_!iZLL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iZLL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp" width="1456" height="767" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:767,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Detailed Traces View&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Detailed Traces View" title="Detailed Traces View" srcset="https://substackcdn.com/image/fetch/$s_!iZLL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp 424w, https://substackcdn.com/image/fetch/$s_!iZLL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp 848w, https://substackcdn.com/image/fetch/$s_!iZLL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp 1272w, https://substackcdn.com/image/fetch/$s_!iZLL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Detailed traces view of your LangChain Application</em></figcaption></figure></div><h2><strong>Making Sense of Your Telemetry Data</strong></h2><p>Once telemetry is enabled in our LangChain trip planner agent, we start seeing detailed traces for each major step in the reasoning workflow. With LangGraph integration, these traces are neatly structured, showing how the agent loop orchestrates model calls and tool invocations. Here are four example spans you&#8217;ll encounter:</p><p><strong>LangGraph (root span)</strong></p><p>The overarching span represents the full request lifecycle of the trip planner agent. From the moment a user asks for a travel itinerary, every downstream operation (LLM reasoning, tool calls, and response generation) is captured inside this parent span.</p><p>This view makes it clear how long the entire request took. 
On the right panel, you can explore input values like the initial user query, making it easy to trace back how the request was interpreted at the start.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bmtl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bmtl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp 424w, https://substackcdn.com/image/fetch/$s_!Bmtl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp 848w, https://substackcdn.com/image/fetch/$s_!Bmtl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp 1272w, https://substackcdn.com/image/fetch/$s_!Bmtl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bmtl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp" width="1456" height="761" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:761,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Root Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Root Span" title="Root Span" srcset="https://substackcdn.com/image/fetch/$s_!Bmtl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp 424w, https://substackcdn.com/image/fetch/$s_!Bmtl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp 848w, https://substackcdn.com/image/fetch/$s_!Bmtl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp 1272w, https://substackcdn.com/image/fetch/$s_!Bmtl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>LangGraph Root Span</em></figcaption></figure></div><p><strong>Agent span</strong></p><p>Nested inside the LangGraph span is the agent span, which captures the LLM&#8217;s reasoning steps. This includes the decision-making process: when to call a tool, how to interpret the results, and whether the loop should continue or terminate.</p><p>Here, you can see the <code>call_model &#8594; RunnableSequence &#8594; ChatOpenAI</code> flow. Each step shows its latency, and the trace reveals exactly which prompts and tool inputs the agent generated. 
This makes it much easier to debug cases where the model loops too long or misuses a tool.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oReK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oReK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp 424w, https://substackcdn.com/image/fetch/$s_!oReK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp 848w, https://substackcdn.com/image/fetch/$s_!oReK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp 1272w, https://substackcdn.com/image/fetch/$s_!oReK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oReK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp" width="1456" height="761" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:761,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Pre-Tool Agent Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Pre-Tool Agent Span" title="Pre-Tool Agent Span" srcset="https://substackcdn.com/image/fetch/$s_!oReK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp 424w, https://substackcdn.com/image/fetch/$s_!oReK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp 848w, https://substackcdn.com/image/fetch/$s_!oReK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp 1272w, https://substackcdn.com/image/fetch/$s_!oReK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Pre-Tool Call Agent Span</em></figcaption></figure></div><p><strong>Tool call spans</strong></p><p>Next, you&#8217;ll see spans for each tool invocation: flights, hotels, weather, and activities. 
These are especially valuable for diagnosing external API performance.</p><p>For example:</p><ul><li><p><code>get_flight_tickets</code> &#8594; duration ~13ms</p></li><li><p><code>get_hotel_bookings</code> &#8594; duration ~25ms</p></li><li><p><code>get_weather</code> &#8594; duration ~16ms</p></li><li><p><code>get_activities</code> &#8594; duration ~11ms</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WFDK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WFDK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp 424w, https://substackcdn.com/image/fetch/$s_!WFDK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp 848w, https://substackcdn.com/image/fetch/$s_!WFDK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp 1272w, https://substackcdn.com/image/fetch/$s_!WFDK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WFDK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp" width="1456" height="761" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:761,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Tool Calls Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Tool Calls Span" title="Tool Calls Span" srcset="https://substackcdn.com/image/fetch/$s_!WFDK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp 424w, https://substackcdn.com/image/fetch/$s_!WFDK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp 848w, https://substackcdn.com/image/fetch/$s_!WFDK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp 1272w, https://substackcdn.com/image/fetch/$s_!WFDK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Tool Calls Span</em></figcaption></figure></div><p><strong>Closing Agent span</strong></p><p>After the tool calls, the workflow enters a closing agent span, where the LLM takes all tool outputs (flights, hotels, weather, activities) and composes the final travel itinerary.</p><p>This is where the agent stitches together structured API responses into a user-friendly itinerary. 
By inspecting this span, you can:</p><ul><li><p>Review the exact prompt the LLM used to summarize tool outputs.</p></li><li><p>Measure how much time the final response generation takes.</p></li><li><p>Verify the final message content before it&#8217;s returned to the user.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pr7q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pr7q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp 424w, https://substackcdn.com/image/fetch/$s_!Pr7q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp 848w, https://substackcdn.com/image/fetch/$s_!Pr7q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp 1272w, https://substackcdn.com/image/fetch/$s_!Pr7q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pr7q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp" width="1456" height="761" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:761,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Post-Tool Agent Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Post-Tool Agent Span" title="Post-Tool Agent Span" srcset="https://substackcdn.com/image/fetch/$s_!Pr7q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp 424w, https://substackcdn.com/image/fetch/$s_!Pr7q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp 848w, https://substackcdn.com/image/fetch/$s_!Pr7q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp 1272w, https://substackcdn.com/image/fetch/$s_!Pr7q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" 
stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Post-Tool Call Agent Span</em></figcaption></figure></div><p>With all this data, you can answer critical performance questions about your trip planner agent:</p><ul><li><p><strong>Where is the time going?</strong> Is most of the latency in the agent&#8217;s reasoning, external API calls, or final response assembly?</p></li><li><p><strong>Which tools are slowest?</strong> For instance, if <code>get_hotel_bookings</code> consistently takes longer, you might need caching or a faster API provider.</p></li><li><p><strong>Is the agent reasoning efficiently?</strong> If the initial or closing agent spans dominate total latency, you may need to optimize prompts or reduce unnecessary loops.</p></li></ul><p>Instead of guessing why an itinerary takes 20+ seconds to generate, SigNoz gives you a connected, end-to-end view of each request, turning your LangChain workflow from a black box into a fully observable 
system.</p><h2><strong>Visualizing Data in SigNoz with Dashboards</strong></h2><p>Once your LangChain trip planner agent is instrumented with OpenTelemetry, SigNoz gives you the ability to create rich dashboards to explore the emitted telemetry data. Built-in filters and span attributes make it easy to drill down into agent reasoning latency, tool performance, or model usage. This gives you a real-time pulse on how your application is performing end-to-end.</p><p>Here are some of the most insightful panels we built using the traces from our instrumented trip planner workflow:</p><p><strong>p95 Duration for Agent </strong><code>call_model</code></p><p>This panel shows the 95th percentile latency for the LLM calls made by the agent. Since LLM generation is often the longest-running step, tracking p95 duration helps you identify tail response times and tune prompts or model choices to improve user experience.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_Vnh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_Vnh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp 424w, https://substackcdn.com/image/fetch/$s_!_Vnh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp 848w, 
https://substackcdn.com/image/fetch/$s_!_Vnh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp 1272w, https://substackcdn.com/image/fetch/$s_!_Vnh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_Vnh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp" width="1274" height="714" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:714,&quot;width&quot;:1274,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;call_model duration&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="call_model duration" title="call_model duration" srcset="https://substackcdn.com/image/fetch/$s_!_Vnh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp 424w, https://substackcdn.com/image/fetch/$s_!_Vnh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp 848w, 
https://substackcdn.com/image/fetch/$s_!_Vnh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp 1272w, https://substackcdn.com/image/fetch/$s_!_Vnh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>call_model Duration Panel</em></figcaption></figure></div><p><strong>Tool Call Distribution</strong></p><p>This 
panel visualizes how often different tools&#8212;flights, hotels, weather, and activities&#8212;are invoked across all trip planning sessions. It gives you a clear sense of workload distribution: for example, hotel searches may dominate requests while activity lookups are used less frequently. Understanding this helps with capacity planning and prioritizing optimizations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OLiV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OLiV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp 424w, https://substackcdn.com/image/fetch/$s_!OLiV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp 848w, https://substackcdn.com/image/fetch/$s_!OLiV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp 1272w, https://substackcdn.com/image/fetch/$s_!OLiV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OLiV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp" width="996" height="1020" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1020,&quot;width&quot;:996,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Tool Distribution&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Tool Distribution" title="Tool Distribution" srcset="https://substackcdn.com/image/fetch/$s_!OLiV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp 424w, https://substackcdn.com/image/fetch/$s_!OLiV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp 848w, https://substackcdn.com/image/fetch/$s_!OLiV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp 1272w, https://substackcdn.com/image/fetch/$s_!OLiV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Tool Call Distribution Panel</em></figcaption></figure></div><p><strong>Input and Output Token Usage</strong></p><p>This panel tracks the total number of input and output tokens processed by the LLM over time. Input tokens include the user query and tool outputs passed to the model, while output tokens are the generated itineraries or chatbot replies. 
Monitoring this helps you manage costs, optimize prompt length, and detect patterns in response verbosity.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sZdl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sZdl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp 424w, https://substackcdn.com/image/fetch/$s_!sZdl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp 848w, https://substackcdn.com/image/fetch/$s_!sZdl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp 1272w, https://substackcdn.com/image/fetch/$s_!sZdl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sZdl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp" width="694" height="714" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:714,&quot;width&quot;:694,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Token Usage&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Token Usage" title="Token Usage" srcset="https://substackcdn.com/image/fetch/$s_!sZdl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp 424w, https://substackcdn.com/image/fetch/$s_!sZdl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp 848w, https://substackcdn.com/image/fetch/$s_!sZdl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp 1272w, https://substackcdn.com/image/fetch/$s_!sZdl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>I/O Token Usage Panel</em></figcaption></figure></div><p><strong>p95 Duration of Each Tool Call</strong></p><p>This panel breaks down the latency of each tool: <code>get_flight_tickets</code>, <code>get_hotel_bookings</code>, <code>get_weather</code>, and <code>get_activities</code>. 
By tracking the 95th percentile duration, you can quickly spot which external API is the slowest under peak load and decide whether caching, retries, or provider changes are needed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!15qZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!15qZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp 424w, https://substackcdn.com/image/fetch/$s_!15qZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp 848w, https://substackcdn.com/image/fetch/$s_!15qZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp 1272w, https://substackcdn.com/image/fetch/$s_!15qZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!15qZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp" width="1456" height="992" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:992,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Tool Duration&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Tool Duration" title="Tool Duration" srcset="https://substackcdn.com/image/fetch/$s_!15qZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp 424w, https://substackcdn.com/image/fetch/$s_!15qZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp 848w, https://substackcdn.com/image/fetch/$s_!15qZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp 1272w, https://substackcdn.com/image/fetch/$s_!15qZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Tool Call Durations Panel</em></figcaption></figure></div><p><strong>LLM Model Distribution</strong></p><p>If your app is configured to use multiple LLMs, this panel shows the distribution of model usage. It&#8217;s useful for analyzing trade-offs between speed, quality, and cost. 
For example, you might run most queries on a smaller, cheaper model but switch to a larger model for complex multi-step itineraries.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!epAn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!epAn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp 424w, https://substackcdn.com/image/fetch/$s_!epAn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp 848w, https://substackcdn.com/image/fetch/$s_!epAn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp 1272w, https://substackcdn.com/image/fetch/$s_!epAn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!epAn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp" width="1190" height="1090" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1090,&quot;width&quot;:1190,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Model Distribution&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Model Distribution" title="Model Distribution" srcset="https://substackcdn.com/image/fetch/$s_!epAn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp 424w, https://substackcdn.com/image/fetch/$s_!epAn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp 848w, https://substackcdn.com/image/fetch/$s_!epAn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp 1272w, https://substackcdn.com/image/fetch/$s_!epAn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>LLM Model Distribution Panel</em></figcaption></figure></div><p>With these dashboards in place, you can move beyond anecdotal debugging and gain data-driven insights into your LangChain agent. Whether it&#8217;s latency hotspots, tool reliability, or token usage trends, SigNoz provides the observability foundation you need to scale AI-driven trip planning with confidence.</p><h2><strong>Wrapping it Up</strong></h2><p>Building LangChain agents like a trip planner is exciting. There&#8217;s something magical about watching an AI plan your flights, hotels, activities, and even answer follow-up questions in natural language. But that magic only lasts if the app stays fast, reliable, and trustworthy. 
To make that happen, you need a clear view of what&#8217;s going on under the hood.</p><p>By pairing OpenTelemetry&#8217;s vendor-neutral instrumentation with SigNoz&#8217;s powerful observability platform, you can follow every step of your LangChain workflow from agent reasoning to tool calls and final response generation. With this visibility, debugging becomes faster, performance tuning becomes data-driven, and your users get consistently great experiences.</p><p>In AI-powered apps, guesswork is the enemy. Observability is how you replace it with clarity, and that&#8217;s how you build LangChain systems you can trust.</p><h2><strong>Coming Next: Using SigNoz to monitor a LangChain agent that queries SigNoz MCP</strong></h2><p>In the <strong><a href="https://signoz.io/blog/monitoring-langchain-agent-querying-signoz-mcp-server/">next part</a></strong> of this series, we&#8217;ll go deeper into observability by looking at a LangChain agent that integrates with an MCP (Model Context Protocol) server. 
This opens up richer interactions, but also more moving parts where observability becomes even more critical.</p><p><strong><a href="https://signoz.io/blog/monitoring-langchain-agent-querying-signoz-mcp-server/">Full-Circle Observability: Using SigNoz to monitor a LangChain agent that queries SigNoz MCP</a></strong></p>]]></content:encoded></item><item><title><![CDATA[Full-Circle Observability: Using SigNoz to monitor a LangChain agent that queries SigNoz MCP]]></title><description><![CDATA[Let's explore how to instrument a LangChain trip planner agent with OpenTelemetry and send telemetry data to SigNoz.]]></description><link>https://newsletter.signoz.io/p/full-circle-observability-using-signoz</link><guid isPermaLink="false">https://newsletter.signoz.io/p/full-circle-observability-using-signoz</guid><dc:creator><![CDATA[SigNoz]]></dc:creator><pubDate>Sun, 31 Aug 2025 14:23:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!84Qv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In <strong><a href="https://signoz.io/blog/langchain-observability-with-opentelemetry/">Part 1</a></strong> of this series, we explored how to instrument a LangChain trip planner agent with OpenTelemetry and send telemetry data to SigNoz. 
By tracing each step of the planning process (LLM reasoning, tool calls for flights, hotels, weather, and activities, and the final itinerary response), we saw how observability turns a black-box agent workflow into a transparent, debuggable system.</p><p>That foundation gave us insights into latency hotspots, tool failures, and agent reasoning loops, all of which are critical for ensuring a reliable user experience in production AI apps.</p><p>In this second part, we&#8217;ll take observability a step further by introducing MCP (Model Context Protocol) servers into the mix. Specifically, we&#8217;ll look at a LangChain agent integrated with a SigNoz MCP server, which allows the agent to directly query logs, metrics, and traces from a connected SigNoz instance.</p><p>This means that instead of just sending observability data to SigNoz, the agent itself can consume and reason over observability data in real time.</p><p>We&#8217;ll walk through how to set up a SigNoz MCP agent with LangChain, instrument it with OpenTelemetry, and explore the kinds of insights it can surface when observability data becomes part of the agent&#8217;s context.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZKTd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZKTd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp 424w, 
https://substackcdn.com/image/fetch/$s_!ZKTd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp 848w, https://substackcdn.com/image/fetch/$s_!ZKTd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp 1272w, https://substackcdn.com/image/fetch/$s_!ZKTd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZKTd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp" width="1200" height="630" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZKTd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp 424w, 
https://substackcdn.com/image/fetch/$s_!ZKTd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp 848w, https://substackcdn.com/image/fetch/$s_!ZKTd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp 1272w, https://substackcdn.com/image/fetch/$s_!ZKTd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2><strong>Building the Example App: A LangChain SigNoz MCP Agent</strong></h2><p>In this part, we&#8217;ll demonstrate a LangChain agent that integrates with a SigNoz MCP (Model Context Protocol) server. The goal of this app is to make observability data (logs, metrics, and traces) queryable through natural language.</p><p>Users can interact with the agent just like a chatbot, asking operational and performance-related questions such as:</p><ul><li><p><em>&#8220;What are all the active services in the last 5 hours?&#8221;</em></p></li><li><p><em>&#8220;Which service has the highest error rate this week?&#8221;</em></p></li><li><p><em>&#8220;Show me the logs generated in the last 1 hour.&#8221;</em></p></li></ul><p>Behind the scenes, the LangChain agent communicates with the SigNoz MCP server, which exposes endpoints for querying observability data. The agent decides which endpoint to call (logs, metrics, or traces), retrieves the relevant data, and then uses the LLM to generate a clear, human-readable summary for the user.</p><p>All of this activity is itself instrumented with OpenTelemetry. 
Each agent reasoning step, MCP server call, and final response generation is captured as spans and sent to SigNoz.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vv3d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vv3d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp 424w, https://substackcdn.com/image/fetch/$s_!vv3d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp 848w, https://substackcdn.com/image/fetch/$s_!vv3d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp 1272w, https://substackcdn.com/image/fetch/$s_!vv3d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vv3d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp" width="1456" height="867" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:867,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;MCP Agent App Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="MCP Agent App Image" title="MCP Agent App Image" srcset="https://substackcdn.com/image/fetch/$s_!vv3d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp 424w, https://substackcdn.com/image/fetch/$s_!vv3d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp 848w, https://substackcdn.com/image/fetch/$s_!vv3d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp 1272w, https://substackcdn.com/image/fetch/$s_!vv3d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>LangChain MCP App Chat</em></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Tht0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Tht0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp 424w, https://substackcdn.com/image/fetch/$s_!Tht0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp 848w, 
https://substackcdn.com/image/fetch/$s_!Tht0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp 1272w, https://substackcdn.com/image/fetch/$s_!Tht0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Tht0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp" width="1456" height="867" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:867,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;MCP Agent App Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="MCP Agent App Image" title="MCP Agent App Image" srcset="https://substackcdn.com/image/fetch/$s_!Tht0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp 424w, https://substackcdn.com/image/fetch/$s_!Tht0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp 848w, 
https://substackcdn.com/image/fetch/$s_!Tht0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp 1272w, https://substackcdn.com/image/fetch/$s_!Tht0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>LangChain MCP App Chat</em></figcaption></figure></div><h2><strong>Try the SigNoz MCP Agent 
Yourself</strong></h2><p>Want to explore the LangChain SigNoz MCP Agent in action? Clone the <strong><a href="https://github.com/SigNoz/signoz-mcp-demo">repo</a></strong>, install dependencies, and follow the setup steps in the <strong><a href="https://github.com/SigNoz/signoz-mcp-demo/blob/main/README.md">README</a></strong> to connect the agent with your own SigNoz instance.</p><pre><code><code>git clone https://github.com/SigNoz/signoz-mcp-demo.git
</code></code></pre><p>After cloning the repo, you can run the agent locally and start asking natural-language questions about your observability data (logs, metrics, and traces) from SigNoz.</p><p>The README provides step-by-step guidance for configuring the MCP server connection and running the demo.</p><h2><strong>Making Sense of Your Telemetry Data</strong></h2><p>Once telemetry is enabled for the SigNoz MCP agent, traces clearly show how a user request flows through LangGraph, the agent&#8217;s reasoning, the MCP tool invocation, and the final response assembly. In a typical run, you&#8217;ll see this shape:</p><p><code>query_endpoint</code><strong> (root span)</strong></p><p>This top-level span represents the entire MCP query lifecycle, from the user&#8217;s natural-language prompt to the final summarized answer. It&#8217;s your single place to track end-to-end latency for an observability question, and it contains the LangGraph span we saw in our previous blog.</p><p>Use the right-hand attributes to confirm request metadata and inspect the input/output payloads that kicked off the flow.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LtLH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LtLH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp 424w, https://substackcdn.com/image/fetch/$s_!LtLH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp 848w, 
https://substackcdn.com/image/fetch/$s_!LtLH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp 1272w, https://substackcdn.com/image/fetch/$s_!LtLH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LtLH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp" width="1456" height="756" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:756,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Root Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Root Span" title="Root Span" srcset="https://substackcdn.com/image/fetch/$s_!LtLH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp 424w, https://substackcdn.com/image/fetch/$s_!LtLH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp 848w, 
https://substackcdn.com/image/fetch/$s_!LtLH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp 1272w, https://substackcdn.com/image/fetch/$s_!LtLH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>LangChain MCP App Root Span</em></figcaption></figure></div><p><strong>Initial Agent span (planning &amp; tool 
selection)</strong></p><p>Nested under the root is the first agent span. Here the LLM interprets the user&#8217;s question and decides which MCP capability to call (logs, metrics, or traces). In this example, the chain shows:</p><p><code>call_model &#8594; RunnableSequence &#8594; ChatOpenAI &#8594; should_continue</code></p><p>This span&#8217;s duration is a good proxy for prompt complexity and reasoning cost before any external call happens.</p><p>What to look for:</p><ul><li><p>Long initial agent spans can indicate heavy prompts or unnecessary loops.</p></li><li><p>Inputs/outputs show the exact messages the model received and returned, which is great for debugging misinterpretations.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0yTS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0yTS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 424w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 848w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!0yTS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0yTS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp" width="1456" height="762" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Pre-Tool Agent Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Pre-Tool Agent Span" title="Pre-Tool Agent Span" srcset="https://substackcdn.com/image/fetch/$s_!0yTS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 424w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 848w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!0yTS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Pre-Tool Call Agent Span</em></figcaption></figure></div><p><strong>MCP tool span (data retrieval from SigNoz)</strong></p><p>Next comes the MCP tool call, for example a <code>fetch_services</code> operation that hits the SigNoz MCP server to retrieve services, metrics, logs, or traces. 
This is the place to diagnose backend/query latency and payload size issues.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ncvj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ncvj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp 424w, https://substackcdn.com/image/fetch/$s_!ncvj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp 848w, https://substackcdn.com/image/fetch/$s_!ncvj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp 1272w, https://substackcdn.com/image/fetch/$s_!ncvj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ncvj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp" width="1456" height="756" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:756,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Tool Calls Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Tool Calls Span" title="Tool Calls Span" srcset="https://substackcdn.com/image/fetch/$s_!ncvj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp 424w, https://substackcdn.com/image/fetch/$s_!ncvj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp 848w, https://substackcdn.com/image/fetch/$s_!ncvj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp 1272w, https://substackcdn.com/image/fetch/$s_!ncvj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Tool Calls Span</em></figcaption></figure></div><p><strong>Closing Agent span (reasoning &amp; final answer)</strong></p><p>After the tool response, the closing agent span composes the final answer: it parses MCP results, filters/sorts/aggregates as needed, and generates a clean natural-language summary.</p><p>What to look for:</p><ul><li><p>Long closing spans usually mean large MCP payloads being summarized (token pressure) or extra follow-up reasoning.</p></li><li><p>Inspect the prompt the agent used for summarization to ensure it&#8217;s concise and grounded in the retrieved data.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0yTS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0yTS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 424w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 848w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 1272w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0yTS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp" width="1456" height="762" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Post-Tool Agent Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Post-Tool Agent Span" title="Post-Tool Agent 
Span" srcset="https://substackcdn.com/image/fetch/$s_!0yTS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 424w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 848w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 1272w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Post-Tool Call Agent Span</em></figcaption></figure></div><h3><strong>Handling Errors with Full Context</strong></h3><p>Errors are inevitable in AI agents: API limits, bad tool responses, or timeouts. Without observability, it&#8217;s hard to know <em>what failed</em> and <em>where</em>.</p><p>With SigNoz, errors are tied to specific spans in the trace, so you can see:</p><ul><li><p><strong>Which component failed</strong> (agent reasoning, tool call, or response synthesis).</p></li><li><p><strong>What the error was</strong> (rate limit, timeout, schema mismatch, etc.).</p></li><li><p><strong>When in the request it happened</strong>.</p></li></ul><p>In this example, a RateLimitError from OpenAI is clearly flagged in the closing agent span. 
The trace shows the error message, stack trace, and context all in one place.</p><p>Instead of guessing, you know exactly what broke, where, and why, making debugging much faster and safer in production.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rOp0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rOp0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp 424w, https://substackcdn.com/image/fetch/$s_!rOp0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp 848w, https://substackcdn.com/image/fetch/$s_!rOp0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp 1272w, https://substackcdn.com/image/fetch/$s_!rOp0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rOp0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp" width="1456" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Error Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Error Span" title="Error Span" srcset="https://substackcdn.com/image/fetch/$s_!rOp0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp 424w, https://substackcdn.com/image/fetch/$s_!rOp0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp 848w, https://substackcdn.com/image/fetch/$s_!rOp0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp 1272w, https://substackcdn.com/image/fetch/$s_!rOp0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Token Limit Error Span</em></figcaption></figure></div><p><strong>What you can answer with these traces:</strong></p><ul><li><p><strong>Where is the latency?</strong></p><p>Is time spent in planning (initial agent), in the MCP query (tool span), or in summarization (closing agent)?</p></li><li><p><strong>Are queries efficient?</strong></p><p>Tool spans reveal slow MCP endpoints and overly broad filters. 
Tighten time windows or add constraints.</p></li><li><p><strong>Is the model working too hard?</strong></p><p>Long agent spans (before or after tools) suggest prompt bloat, unnecessary loops, or passing too much raw data back to the LLM.</p></li><li><p><strong>Is the workflow stable?</strong></p><p>Use span status codes and events to spot intermittent errors (schema mismatches, token limits, provider hiccups).</p></li></ul><p>With this structure, SigNoz turns the MCP-powered workflow from a black box into a fully traceable conversation: <strong>user prompt &#8594; agent planning &#8594; MCP tool call &#8594; agent summary</strong>. That visibility makes debugging faster, optimization data-driven, and your observability assistant more reliable.</p><h2><strong>Visualizing Data in SigNoz with Dashboards</strong></h2><p>Once your LangChain SigNoz MCP agent is instrumented with OpenTelemetry, SigNoz lets you build dashboards to explore the telemetry it emits. Built-in filters and span attributes make it easy to drill down into agent reasoning latency, MCP query performance, error patterns, and model usage. Together, these give you a near-real-time view of how the observability agent itself performs end to end.</p><p>Here are some of the panels we built using the traces from our instrumented MCP workflow:</p><p><strong>p95 Duration for Agent </strong><code>call_model</code></p><p>This panel shows the 95th percentile latency for LLM calls made by the agent. 
Since generation often dominates total response time, monitoring p95 latency highlights worst-case scenarios and helps you optimize prompts, reduce context size, or adjust model selection.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mIbC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mIbC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp 424w, https://substackcdn.com/image/fetch/$s_!mIbC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp 848w, https://substackcdn.com/image/fetch/$s_!mIbC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp 1272w, https://substackcdn.com/image/fetch/$s_!mIbC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mIbC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp" width="1198" height="758" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:758,&quot;width&quot;:1198,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;call_model duration&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="call_model duration" title="call_model duration" srcset="https://substackcdn.com/image/fetch/$s_!mIbC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp 424w, https://substackcdn.com/image/fetch/$s_!mIbC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp 848w, https://substackcdn.com/image/fetch/$s_!mIbC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp 1272w, https://substackcdn.com/image/fetch/$s_!mIbC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>call_model Duration Panel</em></figcaption></figure></div><p><strong>MCP Tool Call Distribution</strong></p><p>This panel visualizes how often the agent queries different MCP endpoints: logs, metrics, or traces. 
It gives you a sense of workload distribution, showing whether users are primarily asking about latency, error logs, or trace investigations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!84Qv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!84Qv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp 424w, https://substackcdn.com/image/fetch/$s_!84Qv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp 848w, https://substackcdn.com/image/fetch/$s_!84Qv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp 1272w, https://substackcdn.com/image/fetch/$s_!84Qv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!84Qv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp" width="1000" height="930" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:930,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Tool Distribution&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Tool Distribution" title="Tool Distribution" srcset="https://substackcdn.com/image/fetch/$s_!84Qv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp 424w, https://substackcdn.com/image/fetch/$s_!84Qv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp 848w, https://substackcdn.com/image/fetch/$s_!84Qv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp 1272w, https://substackcdn.com/image/fetch/$s_!84Qv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" 
stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Tool Call Distribution Panel</em></figcaption></figure></div><p><strong>Input and Output Token Usage</strong></p><p>This panel tracks the total number of input and output tokens processed by the LLM over time. Input tokens include user queries and MCP responses passed into the model, while output tokens are the agent&#8217;s natural language answers. 
Monitoring this helps manage cost and detect patterns in verbosity or context expansion.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wv2t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wv2t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp 424w, https://substackcdn.com/image/fetch/$s_!Wv2t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp 848w, https://substackcdn.com/image/fetch/$s_!Wv2t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp 1272w, https://substackcdn.com/image/fetch/$s_!Wv2t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wv2t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp" width="704" height="1102" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/def37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1102,&quot;width&quot;:704,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Token Usage&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Token Usage" title="Token Usage" srcset="https://substackcdn.com/image/fetch/$s_!Wv2t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp 424w, https://substackcdn.com/image/fetch/$s_!Wv2t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp 848w, https://substackcdn.com/image/fetch/$s_!Wv2t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp 1272w, https://substackcdn.com/image/fetch/$s_!Wv2t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>I/O Total Token Usage</em></figcaption></figure></div><p><strong>Model Call Error Rate Over Time</strong></p><p>This panel tracks the error rate of model calls, visualized as a line chart. Spikes here may indicate upstream issues such as invalid MCP responses, token limits being exceeded, or transient API errors. By correlating these errors with traffic patterns, you can quickly pinpoint reliability issues in production.</p><p>With these dashboards in place, you can move beyond ad-hoc debugging and gain data-driven insights into your MCP agent. 
Whether it&#8217;s latency hotspots, slow SigNoz queries, token usage spikes, or rising error rates, SigNoz provides the observability foundation you need to ensure your AI-driven observability assistant stays reliable and responsive.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1FAA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1FAA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp 424w, https://substackcdn.com/image/fetch/$s_!1FAA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp 848w, https://substackcdn.com/image/fetch/$s_!1FAA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp 1272w, https://substackcdn.com/image/fetch/$s_!1FAA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1FAA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp" width="1174" height="716" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:716,&quot;width&quot;:1174,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Error Rate&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Error Rate" title="Error Rate" srcset="https://substackcdn.com/image/fetch/$s_!1FAA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp 424w, https://substackcdn.com/image/fetch/$s_!1FAA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp 848w, https://substackcdn.com/image/fetch/$s_!1FAA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp 1272w, https://substackcdn.com/image/fetch/$s_!1FAA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" 
xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Total Error Rate Panel</em></figcaption></figure></div><h2><strong>Wrapping it Up</strong></h2><p>LangChain agents integrated with MCP servers open the door to powerful new workflows, but that power comes with more moving parts: LLM calls, tool interactions, server communications, and error handling. Without the right observability, it&#8217;s easy for problems to hide in the noise.</p><p>By pairing OpenTelemetry with SigNoz, you get full visibility into the agent lifecycle: where time is spent, which tools are bottlenecks, and what errors are occurring. 
Whether it&#8217;s a slow external API, a looping agent, or a rate limit error, you can see exactly what happened and where.</p><p>With this clarity, debugging becomes faster, scaling becomes smoother, and users get more reliable experiences even as your agents grow more complex.</p>]]></content:encoded></item><item><title><![CDATA[Why Observability Isn’t Just for SREs (and How Devs Can Get Started)]]></title><description><![CDATA[This blog is an attempt to help anyone who feels lost find their way into observability, and a wake-up call for devs to think about observability more actively than ever before.]]></description><link>https://newsletter.signoz.io/p/why-observability-isnt-just-for-sres</link><guid isPermaLink="false">https://newsletter.signoz.io/p/why-observability-isnt-just-for-sres</guid><dc:creator><![CDATA[SigNoz]]></dc:creator><pubDate>Sun, 17 Aug 2025 14:15:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!qbYV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Almost every other day, when I scroll through r/devops or r/sre, I see a <strong><a href="https://signoz.io/blog/why-observability-isnt-just-for-sres/www.reddit.com/r/sre/comments/1b54tpp/software_engineer_sre_devops/">post like this</a></strong> asking how a dev can get started with devops, observability, etc.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H4Mg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!H4Mg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp 424w, https://substackcdn.com/image/fetch/$s_!H4Mg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp 848w, https://substackcdn.com/image/fetch/$s_!H4Mg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp 1272w, https://substackcdn.com/image/fetch/$s_!H4Mg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H4Mg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp" width="1456" height="649" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:649,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Sample Reddit thread on how to get started with OTel&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Sample Reddit thread on how to get started with OTel" title="Sample Reddit thread on how to get started with OTel" 
srcset="https://substackcdn.com/image/fetch/$s_!H4Mg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp 424w, https://substackcdn.com/image/fetch/$s_!H4Mg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp 848w, https://substackcdn.com/image/fetch/$s_!H4Mg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp 1272w, https://substackcdn.com/image/fetch/$s_!H4Mg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" 
stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Sample Reddit thread on how to get started with OTel. Source: <strong><a href="https://signoz.io/blog/why-observability-isnt-just-for-sres/www.reddit.com/r/sre/comments/1b54tpp/software_engineer_sre_devops/">Reddit</a></strong></em></figcaption></figure></div><p>This blog is an attempt to help anyone who feels lost find their way into observability, and a wake-up call for devs to think about observability more actively than ever before.</p><p>A dev&#8217;s observability playbook.</p><h2><strong>Why should you, a developer, care?</strong></h2><p>As devs, we often obsess over making our code neater, maintaining systems better, and reducing technical debt. We think of a couple of edge cases and handle them well. We write some tests, debug a bit, drink some hot brew, then call it a day. However, in 2025, I am not sure this still makes the cut.</p><p>Here&#8217;s a short elevator pitch on why you, <em>as a developer,</em> should care about observability today.</p><h3><strong>Product Engineers With Extreme Ownership</strong></h3><p>Gone are the days when a PM would hand you a requirements document and design, and <em>then</em> <strong>you would just code</strong> and <em>then</em> leave the testing to a QA and <em>then</em> whatever happens next to the SREs. 
The role of a dev has expanded <em>beyond</em> this.</p><p>Here&#8217;s what a day in the life of a product/software engineer looks like today,</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!O7_R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!O7_R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp 424w, https://substackcdn.com/image/fetch/$s_!O7_R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp 848w, https://substackcdn.com/image/fetch/$s_!O7_R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp 1272w, https://substackcdn.com/image/fetch/$s_!O7_R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!O7_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp" width="844" height="1194" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1194,&quot;width&quot;:844,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Day in the life of an engineer at SigNoz&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Day in the life of an engineer at SigNoz" title="Day in the life of an engineer at SigNoz" srcset="https://substackcdn.com/image/fetch/$s_!O7_R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp 424w, https://substackcdn.com/image/fetch/$s_!O7_R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp 848w, https://substackcdn.com/image/fetch/$s_!O7_R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp 1272w, https://substackcdn.com/image/fetch/$s_!O7_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" 
stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Day in the life of an engineer at SigNoz. Source: <strong><a href="https://signoz.io/blog/srikanth-signoz/">SigNoz Blog</a></strong></em></figcaption></figure></div><p>You are <em>kind of</em> expected to know everything, at least a little bit. Companies increasingly value <em>product engineers</em> who own the <em>full lifecycle of a feature</em> from design to coding to deployment and monitoring. It means <em>you</em>, the developer, need to know when your application misbehaves in the wild and be ready to fix it, which is exactly what observability enables.</p><h3><strong>Systems are Scaling Faster (and getting more complex)</strong></h3><p>Modern software architectures have exploded in scale and complexity. 
We&#8217;re building distributed microservices, deploying to clouds and Kubernetes, handling global user traffic and <em>shipping faster</em> than ever before. When hundreds of containers or functions communicate with each other, failures often cascade in unpredictable ways. Debugging them requires a holistic view of these complex systems, which is exactly what observability provides.</p><h3><strong>Testing every Edge Case isn&#8217;t Feasible</strong></h3><p>Even something as simple as an input box can involve a multitude of edge cases.</p><ul><li><p>What if the input is too short or too long?</p></li><li><p>What if there&#8217;s a special character in the input?</p></li><li><p>How should white space be handled?</p></li><li><p>How do you guard against SQL injection?</p></li></ul><p>These are just a few off the top of my head. Brainstorming and testing every potential edge case becomes increasingly cumbersome as systems grow more complex.</p><p>Observability acts as your safety net, catching issues that slip through testing and helping you understand real-world system behaviour.</p><h3><strong>Users don&#8217;t like bugs, but they HATE slow resolution more</strong></h3><p>As an end user of a lot of products, I am very impatient when something stops working. So I can <em>imagine</em> what users feel like when a product doesn&#8217;t work as expected. Bugs and outages are never welcome, but what really frustrates users is when issues drag on without a fix.</p><p>We live in a <em>fast-paced</em> world where no one waits for anything, and users have zero tolerance for downtime or latency. <strong>Performance of systems is mission-critical</strong>. 
Every hour of downtime, and every delayed fix, can cost a substantial amount.</p><p>Observability is what makes rapid resolution possible: it helps you spot issues quickly and pinpoint the root cause without wasting time, which translates directly into customer retention.</p><h2><strong>Observability: Beyond APM and Infra Monitoring</strong></h2><p>So what&#8217;s the point of <em>observability</em>, anyway?</p><p>Is it just a fancy word for monitoring?</p><p>Not really.</p><p>Traditional Application Performance Monitoring [APM] and infrastructure monitoring are about tracking known metrics [CPU, memory, request latency, etc.] and alerting on predefined thresholds. Observability goes <strong>beyond</strong> that by enabling you to infer the internal state of the system from its outputs.</p><p>It&#8217;s often defined by three pillars of telemetry data: <strong>logs, metrics, and traces</strong>, though there&#8217;s more to it. Together, they give you a 360&#176; view of what&#8217;s happening inside your applications.</p><ul><li><p><strong>Logs</strong> are the record of events [think of them as your app&#8217;s diary of what it&#8217;s doing].</p></li><li><p><strong>Metrics</strong> are numeric measurements [e.g. memory at 75%, 500 requests/minute] that track trends and health.</p></li><li><p><strong>Traces</strong> follow the path of a single request or transaction through multiple services [useful in microservices to see how a request <em>flows</em> and where it slows down].</p></li></ul><p>Observability tools unify these signals to help you answer <em>new questions</em> about your system&#8217;s behaviour, not just the ones you preset.</p><p>For example, a classic monitor might tell you the <em>error rate exceeded 5%</em> and <em>something&#8217;s wrong,</em> whereas an observability approach lets you dig in and ask <em>why</em> it&#8217;s wrong: which users or inputs caused this? 
What else was happening on the system at that time?</p><p>It&#8217;s a more exploratory, investigative mindset.</p><p>Crucially, observability isn&#8217;t limited to just application performance like APM is. Today it has expanded to cover the <strong>health of the entire system</strong>, including <strong>infrastructure and third-party services</strong>. APM might catch known issues [say, a slow database query you anticipated], but observability will help surface the <em>weird, unexpected issues</em> that nobody was explicitly looking for.</p><h2><strong>Hello World, OpenTelemetry.</strong></h2><p>By now, hopefully, it&#8217;s clear why you should care about observing your systems actively. The next <em>obvious</em> question is how you can achieve it as a developer.</p><p>Your observability tooling will often be influenced by what your org has already adopted. Many teams inherit an existing monitoring stack, maybe Prometheus for metrics [along with dashboards powered by PromQL], or a log system that uses LogQL. These tools may already be wired into alerting pipelines, dashboards, and operational runbooks. In such cases, it&#8217;s wise to continue using what&#8217;s already working well.</p><p>The good news is that OpenTelemetry <em>plays nicely</em> with many of these tools, so you can gradually adopt it without disrupting what&#8217;s in place.</p><p>That said, if you <strong>are</strong> starting with a clean slate, I&#8217;d strongly recommend OpenTelemetry [OTel]. OTel brings plenty of advantages to the table. You can read more about the benefits of a vendor-agnostic, open-source observability framework in OTel&#8217;s <strong><a href="https://opentelemetry.io/docs/what-is-opentelemetry/">official docs</a></strong>.</p><h3><strong>OpenTelemetry in &lt; 200 words</strong></h3><p>At its core, OTel introduces the idea of <em>signals,</em> primarily traces, metrics, and logs, that describe what your application is doing. 
Developers use the <strong>OTel API</strong> to create and emit these signals, while the <strong>OTel SDK</strong> handles the heavy lifting of batching, processing, and exporting the data to your chosen backend.</p><p>Instrumentation [the process of making your code emit these signals] can be done automatically or manually. Usually, it&#8217;s a humble <em>mix of both</em>.</p><p>Once instrumented, your application will emit trace spans, metrics, and logs in OTel&#8217;s standard formats. You can configure exporters to send that data to various backends, whether that means printing to the console during development or shipping to an observability vendor. The key point is that OTel decouples instrumentation from the backend. You instrument your code once, then choose where to send the data. This gives you the flexibility to start small with any vendor and, as you scale, switch to one better suited to your needs.</p><p>To understand OTel in more depth, I highly recommend giving <strong><a href="https://signoz.io/blog/what-is-opentelemetry/">this</a></strong> a read.</p><h3><strong>Copy, Paste &amp; Run Example</strong></h3><p>Let me take you through a small exercise that can quickly show you the power of OpenTelemetry.</p><p>Say you have an application; it could be a side project or a microservice you own [just start a new branch &#129335;&#127995;&#8205;&#9792;&#65039;]. Since Python is so widely used, the next few instructions are for a Python application, but a quick Google search or LLM prompt will help you adapt them to the language of your choice.</p><ol><li><p>Create and activate a virtual environment</p></li><li><p>Run the following commands in the given sequence, replacing <code>server_automatic.py</code> with your app&#8217;s entry-point file:</p></li></ol><pre><code><code>pip install opentelemetry-distro

pip install flask requests

opentelemetry-bootstrap -a install
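
# Optional sketch: this post does not show server_automatic.py. If you have no
# app handy, the command below writes a minimal, hypothetical Flask app under
# that name. The "/" route, the response text and port 8080 are assumptions
# made for illustration only; skip this step if you already have an app.
printf '%s\n' \
  'from flask import Flask' \
  '' \
  'app = Flask(__name__)' \
  '' \
  '@app.route("/")' \
  'def index():' \
  '    return "Hello, OpenTelemetry!"' \
  '' \
  'if __name__ == "__main__":' \
  '    app.run(port=8080)' \
  > server_automatic.py
# Once the app is running under opentelemetry-instrument (next command),
# open http://localhost:8080/ in a browser to generate a request to trace.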

opentelemetry-instrument --traces_exporter console --metrics_exporter console --logs_exporter console python server_automatic.py

</code></code></pre><p>You just completed a very basic instrumentation of your Python application, and you should now see traces, metrics and logs printed to your console!</p><p><strong>&#9888;&#65039; Reminder</strong></p><p>This is a very basic example of instrumentation. OpenTelemetry has far more potential and power, but this starter example quickly gives you hands-on experience of what instrumentation with OpenTelemetry feels like and what kind of telemetry data you can collect.</p><h2><strong>Beating the OpenTelemetry Learning Curve</strong></h2><p>Like almost any other skill in life, OpenTelemetry has quite a learning curve. In fact, there is a whole <strong><a href="https://www.reddit.com/r/devops/comments/nxrbqa/opentelemetry_is_great_but_why_is_it_so_bloody/">Reddit thread</a></strong> titled &#8220;<strong>OpenTelemetry is great, but why is it so bloody complicated?</strong>&#8221;.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pk0c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pk0c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp 424w, https://substackcdn.com/image/fetch/$s_!Pk0c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp 848w, 
https://substackcdn.com/image/fetch/$s_!Pk0c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp 1272w, https://substackcdn.com/image/fetch/$s_!Pk0c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pk0c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp" width="1456" height="1369" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1369,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Reddit thread on why OTel can be complicated&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Reddit thread on why OTel can be complicated" title="Reddit thread on why OTel can be complicated" srcset="https://substackcdn.com/image/fetch/$s_!Pk0c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp 424w, https://substackcdn.com/image/fetch/$s_!Pk0c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp 848w, 
https://substackcdn.com/image/fetch/$s_!Pk0c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp 1272w, https://substackcdn.com/image/fetch/$s_!Pk0c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Reddit thread on why OTel can be complicated. 
Source: <strong><a href="https://www.reddit.com/r/devops/comments/nxrbqa/opentelemetry_is_great_but_why_is_it_so_bloody/">Reddit</a></strong></em></figcaption></figure></div><p>Depending on how deeply you want to observe your application, the complexity can vary. For instance, getting started with an example shown above is very easy, but the moment you dive deeper into traces, logs, metrics, spans, etc., it can become overwhelming. I&#8217;d like to introduce you to the Dunning-Kruger curve of confidence vs. competence.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qbYV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qbYV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp 424w, https://substackcdn.com/image/fetch/$s_!qbYV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp 848w, https://substackcdn.com/image/fetch/$s_!qbYV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp 1272w, https://substackcdn.com/image/fetch/$s_!qbYV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!qbYV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp" width="1131" height="679" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:679,&quot;width&quot;:1131,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Dunning-Kruger curve of confidence vs. competence&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Dunning-Kruger curve of confidence vs. competence" title="Dunning-Kruger curve of confidence vs. 
competence" srcset="https://substackcdn.com/image/fetch/$s_!qbYV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp 424w, https://substackcdn.com/image/fetch/$s_!qbYV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp 848w, https://substackcdn.com/image/fetch/$s_!qbYV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp 1272w, https://substackcdn.com/image/fetch/$s_!qbYV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Dunning-Kruger curve of confidence vs. competence. Source: <strong><a href="https://www.linkedin.com/pulse/have-you-experienced-dunning-kruger-effect-when-hiring-jason-culloo/">Jason Culloo on LinkedIn</a></strong></em></figcaption></figure></div><p>I just wanted to tell you not to be discouraged. This steep ramp-up is common, and with a step-by-step approach, you will get comfortable with the concepts over time.</p><p>I highly suggest following our <strong><a href="https://signoz.io/resource-center/opentelemetry/">Blog series on OpenTelemetry</a></strong>, which has articles on almost every topic related to OpenTelemetry.</p><h2><strong>Next steps..</strong></h2><p>So where do you go from here?</p><h3><strong>Review your side projects</strong></h3><p>First, review your current projects or side projects and assess your observability gaps. 
Are you <em>logging</em> enough information? Do you have <em>metrics</em> for key behaviours? If not, that&#8217;s a great place to begin. Start instrumenting gradually using the tools and tips above. Consider implementing OpenTelemetry in one service and showcasing a trace to your team; it might inspire wider adoption once they see the value.</p><h3><strong>Make Observability a Habit</strong></h3><p>Next, consider making <strong>observability a habit in your development workflow</strong>. For instance, when reviewing code or designing new components, include observability questions in the process [e.g., &#8220;How will we know if this fails in production?&#8221;]. Over time, you and your team will naturally build more observable systems. This <em>proactive</em> approach eventually pays off by reducing nasty surprises and shortening debug sessions when issues do occur. Keep learning and stay updated. The observability landscape is evolving [with improvements in OpenTelemetry, new analysis tools, etc.], and being knowledgeable will set you apart. Follow a couple of observability blogs or community forums [the r/observability subreddit, DevOps blogs, CNCF talks] to see what challenges others are solving.</p><h3><strong>Embrace the Ownership Mindset</strong></h3><p>Finally, <strong>embrace the mindset</strong>: as a developer, caring about observability means caring about your software <em>beyond</em> just writing code. It&#8217;s about owning the reliability and performance of what you build. In 2025 and beyond, the ability to quickly understand and fix issues in complex systems is <em>gold</em>. By investing in observability and tools like OpenTelemetry, you&#8217;re essentially future-proofing your career and your projects. So grab that playbook, get your hands dirty with some telemetry, and start turning those <em>unknown unknowns</em> into <em>well-understood knowns</em>. 
Your users [and your on-call self] will thank you!</p>]]></content:encoded></item></channel></rss>