<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Observability Real Talk]]></title><description><![CDATA[Stories from hard-core observability nerds at SigNoz - spreading the word on open-source, observability, OpenTelemetry and behind-the-scenes of building a dev tool infra product.

]]></description><link>https://newsletter.signoz.io</link><image><url>https://newsletter.signoz.io/img/substack.png</url><title>Observability Real Talk</title><link>https://newsletter.signoz.io</link></image><generator>Substack</generator><lastBuildDate>Sun, 12 Apr 2026 19:25:50 GMT</lastBuildDate><atom:link href="https://newsletter.signoz.io/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[SigNoz]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[signoz@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[signoz@substack.com]]></itunes:email><itunes:name><![CDATA[SigNoz]]></itunes:name></itunes:owner><itunes:author><![CDATA[SigNoz]]></itunes:author><googleplay:owner><![CDATA[signoz@substack.com]]></googleplay:owner><googleplay:email><![CDATA[signoz@substack.com]]></googleplay:email><googleplay:author><![CDATA[SigNoz]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[How the Sharks Do Observability ]]></title><description><![CDATA[An account of how Netflix and Uber observe their massive systems every day.]]></description><link>https://newsletter.signoz.io/p/how-the-sharks-do-observability</link><guid isPermaLink="false">https://newsletter.signoz.io/p/how-the-sharks-do-observability</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Thu, 02 Apr 2026 13:45:52 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!POQA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is an honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! 
We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. We&#8217;ll make it worth your while.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p><em>This blog took 6 days and 7 hours to be curated, so make sure to show some love!</em></p></blockquote><p></p><p>As an observability enthusiast working at an observability startup and running an observability newsletter, I find this topic wildly fascinating. I know a bunch of lore on how companies thought about and invented (or, more precisely, reinvented) their observability systems to support their growing scale. 
But two of these stories have stuck with me. They are interesting because each company broke its observability system at a critical moment of growth and rebuilt it in a completely different and particularly breathtaking way, and we have a lot to learn from both!</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!POQA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!POQA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png 424w, https://substackcdn.com/image/fetch/$s_!POQA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png 848w, https://substackcdn.com/image/fetch/$s_!POQA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!POQA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!POQA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png" width="1456" height="896" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/804e9de6-b693-4987-b806-255056ffd377_2600x1600.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:896,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:3239153,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/192927508?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!POQA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png 424w, https://substackcdn.com/image/fetch/$s_!POQA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png 848w, https://substackcdn.com/image/fetch/$s_!POQA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png 1272w, https://substackcdn.com/image/fetch/$s_!POQA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F804e9de6-b693-4987-b806-255056ffd377_2600x1600.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p></p><h2>1. Netflix</h2><p>Netflix&#8217;s observability origin story starts in a place that will make most engineers wince. In May 2011, Netflix was using a home-grown solution called Epic to manage time-series data. Epic was a combination of Perl CGI scripts, RRDTool logging, and MySQL. Their telemetry was split between this home-grown tool and an IT-provisioned commercial product. 
Epic&#8217;s flexibility, which let engineers send in arbitrary time-series data and query it, made it popular, and it became the primary system of record.</p><p>They were tracking around 2 million distinct time series, and the monitoring system was regularly failing to keep up with the volume of data. Several things were about to make it dramatically worse: Netflix was shifting from rolling pushes to red/black deployments, starting to actually leverage auto-scaling rather than just using fixed-size groups, and expanding internationally into Latin America and Europe.</p><p>All these changes required them to scale by at least an order of magnitude, from 2 million to 20 million metrics or more. Perl CGI scripts and MySQL were never going to handle what Netflix was becoming; that scale was simply beyond what Epic was capable of.</p><p>So in early 2012, they started building <strong><a href="https://netflix.github.io/atlas-docs/">Atlas</a></strong>, and by late 2012, it was being phased into production, with full deployment completed in early 2013.</p><p>The design philosophy behind Atlas is a chapter filled with learnings. Atlas features in-memory data storage, allowing it to gather and report very large numbers of metrics very quickly. It captures operational intelligence: whereas business intelligence analyses trends over time, operational intelligence provides a picture of what is currently happening within a system.</p><p>Since their focus was primarily on operational insight, the top priority was determining what&#8217;s going on right now. This led to the following rules of thumb:</p><p>1/ data becomes exponentially less important as it gets older</p><p>2/ restoring service is more important than preventing data loss</p><p>This is a fundamentally different philosophy from <em>store everything forever</em>. 
Netflix decided that recent data matters enormously and old data barely matters at all.</p><p>The internal Atlas deployment breaks data into multiple time windows. The last 6 hours of data is kept fully in memory, so they can show recent data as long as clients can successfully publish. Everything is sharded across machines in these in-memory clusters. For older data, they compute rollups via Hadoop processing, drastically reducing data volume for historical queries.</p><p></p><p>One of the things they really wanted to fix from Epic was how dimensions worked. In the old system, everything was mangled into a metric name with different conventions per team, and users had to resort to complex regular expressions to slice and dice data. In Atlas, a metric&#8217;s identity is an arbitrary, unique set of key-value pairs. Some keys are set automatically by the client library (server name, AWS zone, ASG, cluster, application, region), and users have significant flexibility to specify whatever other keys make sense for their use case.</p><p>The growth numbers tell the story of why all this mattered. In 2011, they were monitoring 2 million metrics. 
By 2014, they were at 1.2 billion metrics, and the numbers continued to rise. They routinely see Atlas fetch and graph many billions of datapoints per second. Today, Atlas processes 17 billion metrics and 700 billion distributed traces per day, alongside 1.5 petabytes of log data, and the system&#8217;s architecture has kept observability data processing to less than 5% of Netflix&#8217;s infrastructure costs!</p><p>But even Atlas hit its limits. A few years ago, Netflix&#8217;s SRE team was paged because their alerting system was falling behind, and critical application health alerts were reaching engineers 45 minutes late. One platform team had programmatically created tens of thousands of new alerts, which overwhelmed Atlas&#8217;s query capacity. They were looking at an order-of-magnitude increase in alert queries over the next 6 months. Scaling up Atlas&#8217;s storage layer to serve that volume would have been prohibitively expensive, since Atlas was already one of Netflix&#8217;s largest services in both size and cost.</p><p>Their answer was Atlas Streaming Eval, which moved alerting from a cron-based query model to a streaming model. Today, they run 20x as many alert queries as a few years ago, at a fraction of the cost. 
Multiple platform teams at Netflix programmatically generate and maintain alerts on behalf of their users without affecting others, and streaming evaluation enabled them to relax cardinality restrictions and to alert on queries that were previously rejected.</p><p>What&#8217;s special here is that instead of throwing more hardware at the problem, they changed the model entirely, and in my opinion, that&#8217;s what separates great observability teams from the rest.</p><p>Some interesting references!</p><ul><li><p><a href="https://netflixtechblog.com/introducing-atlas-netflixs-primary-telemetry-platform-bd31f4d8ed9a">Introducing Atlas: Netflix&#8217;s Primary Telemetry Platform</a> (Netflix Tech Blog)</p></li><li><p><a href="https://netflixtechblog.com/improved-alerting-with-atlas-streaming-eval-e691c60dc61e">Improved Alerting with Atlas Streaming Eval</a> (Netflix Tech Blog)</p></li><li><p><a href="https://netflixtechblog.com/lessons-from-building-observability-tools-at-netflix-7cfafed6ab17">Lessons from Building Observability Tools at Netflix</a> (Netflix Tech Blog)</p></li><li><p><a href="https://www.infoq.com/presentations/netflix-edgar-observability/">Solving Mysteries Faster with Observability</a> (InfoQ / QCon)</p></li><li><p><a href="https://netflix.github.io/atlas-docs/">Atlas Documentation</a> (Netflix OSS)</p></li></ul><p></p><h2>2. Uber</h2><p>Uber&#8217;s observability story starts in 2014 with a Graphite, Carbon, and Whisper stack that was held together very loosely. By late 2014, all services, infrastructure, and servers at Uber emitted metrics to a Graphite stack, which stored them in the Whisper file format in a sharded Carbon cluster. They used Grafana for dashboarding and Nagios for alerting, issuing Graphite threshold checks via source-controlled scripts.</p><p>The problems were fundamental. The stack was not horizontally scalable, meaning you couldn&#8217;t add capacity just by adding machines. 
There were no replicas, so a single node dying meant losing an eighth of all Uber&#8217;s data, and adding capacity required taking the system offline for a week or more. <strong><a href="https://www.linkedin.com/in/martinmao/">Martin Mao</a></strong>&#8217;s first on-call week was spent deleting data from the backend just to keep the observability stack alive.</p><p>First, they did a quick fix by swapping in Cassandra for time-series storage and Elasticsearch for the metrics index, all stitched together with Go. They stood up this new system in time for Halloween 2015, which was Uber&#8217;s second-largest peak load event. That year was the first time Uber&#8217;s observability system didn&#8217;t have an outage during the Halloween peak.</p><p>But Cassandra was the wrong tool for the job because they were using it as a time-series database even though it was built as a key-value store. As they entered their hyper-growth phase, the firefighting that had plagued the Graphite years resurfaced in a new form.</p><p>The team decided to build <strong><a href="https://www.uber.com/en-IN/blog/m3/">M3DB</a></strong> from scratch: a custom time-series database with an embedded inverted index. The architecture they landed on is worth understanding in detail.</p><p>Applications on hosts emit metrics to a local daemon called &#8220;<em>Collector</em>&#8221;, which aggregates them at 1-second intervals and then forwards them to the aggregation tier using a shard-aware topology retrieved from etcd. The aggregation tier further aggregates into 10-second and one-minute tiles, and the M3DB ingestor writes them to the storage tier. M3 Coordinator acts as a Prometheus sidecar, providing a global query and storage interface on top of M3DB clusters. It handles downsampling and ad hoc retention using rollup rules stored in etcd, which runs embedded in the binary of an M3DB seed node.</p><p>Let&#8217;s look at the results, which are quite phenomenal. 
In any given second, M3 processes 500 million metrics and persists another 20 million aggregated metrics. Extrapolating that combined 520 million per second over the 86,400 seconds in a day gives roughly 45 trillion metrics per day, and the platform also houses over 6.6 billion time series!</p><p>The really interesting engineering is in the high-dimensionality problem. High-dimensionality metrics (data tracked over time with many different aspects like route, region, and status code) are critical to the business but costly at Uber&#8217;s scale. A single emission could lead to 100 million unique time series, and because code changes roll out to specific groups of cities over a few hours, they need city-level monitoring granularity. Different cities have different configurations; for example, rider pickups might be blocked on a street due to a parade, or local events can cause traffic changes.</p><p>Their alerting ecosystem is equally bespoke. It includes two in-datacenter alerting systems: uMonitor for time-series metrics-based alerting against M3, and Neris for host-level checks. Both feed into a common notification and deduplication pipeline called Origami. uMonitor uses static thresholds for steady-state metrics and anomaly thresholds via Argos, Uber&#8217;s anomaly detection platform, which generates dynamic thresholds from historical data.</p><p>They also added <strong><a href="https://www.jaegertracing.io/">Jaeger</a></strong>, their open-source distributed tracing system. Jaeger follows requests from one service to another, composing a narrative of what happened and what went wrong, which makes it much easier to pinpoint causation.</p><p>The operational improvement after M3 was dramatic. Setting up monitoring in new data centres became 4x faster, and the operational maintenance burden dropped by over 16x, while combined high/low-urgency notifications per week went from 25 with Cassandra to 1.5 with M3DB. 
&#128079;&#127995;</p><p>Over a million unique visitors hit their systems every day, and more than half of their engineering team uses these observability tools daily.</p><p></p><p>Some resources that were my references and really good reads!</p><ul><li><p><a href="https://www.uber.com/blog/m3/">M3: Uber&#8217;s Open Source, Large-Scale Metrics Platform for Prometheus</a> (Uber Engineering Blog)</p></li><li><p><a href="https://newsletter.pragmaticengineer.com/p/how-uber-built-its-observability-platform">How Uber Built its Observability Platform</a> (The Pragmatic Engineer)</p></li><li><p><a href="https://www.uber.com/blog/observability-at-scale/">Observability at Scale: Building Uber&#8217;s Alerting Ecosystem</a> (Uber Engineering Blog)</p></li><li><p><a href="https://www.uber.com/en-KW/blog/optimizing-m3/">Optimizing M3: How Uber Halved Metrics Ingestion Latency by Forking the Go Compiler</a> (Uber Engineering Blog)</p></li><li><p><a href="https://www.uber.com/blog/optimizing-observability/">Optimizing Observability with Jaeger, M3, and XYS</a> (Uber Engineering Blog)</p></li></ul><p>But here&#8217;s an interesting dilemma. What happens when the product you&#8217;re monitoring <em>is</em> the monitoring tool itself? What happens when the observability system that&#8217;s supposed to tell you what&#8217;s broken is itself broken, and it&#8217;s the same system you need to diagnose the problem?</p><p>At SigNoz, we have solved this exact problem by building a system called <strong><a href="https://gameofthrones.fandom.com/wiki/Night%27s_Watch">Nightswatch</a></strong>, a Game of Thrones-themed architecture featuring builders, rangers, and stewards that runs SigNoz to monitor SigNoz.</p><p>That story drops in the next edition. 
Stay tuned.</p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p></p><blockquote><p><em>Feel free to check out our <strong><a href="https://signoz.io/resource-center/blog/">blogs</a> </strong>and <strong><a href="https://signoz.io/docs/introduction/">docs</a></strong> here. Our <strong><a href="https://github.com/SigNoz/signoz">GitHub</a> </strong>is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. You&#8217;re also welcome to join the conversation in our growing <strong><a href="https://arc.net/l/quote/rmuathlr">Slack</a></strong> community for the latest news!</em></p></blockquote>]]></content:encoded></item><item><title><![CDATA[AI Isn't Replacing SREs. It's Deskilling Them.]]></title><description><![CDATA[When AI handles 95% of your incident response, do you get worse at handling the 5% that actually matters?]]></description><link>https://newsletter.signoz.io/p/ai-isnt-replacing-sres-its-deskilling</link><guid isPermaLink="false">https://newsletter.signoz.io/p/ai-isnt-replacing-sres-its-deskilling</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Sat, 28 Feb 2026 13:45:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!Hgvq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! 
<br>&amp;<br>This piece took 6 days, 5 hours to be cooked, hope we served. </em>&#127770;</p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p></p></blockquote><p></p><p></p><p>There are two popular prophecies floating around tech circles these days.</p><p>The first says <strong><a href="https://swizec.com/blog/the-future-of-software-engineering-is-sre/">SRE is the future of all software engineering</a></strong>, that as AI writes more and more code, the humans who remain will be the ones keeping systems alive. The second says AI will devour every tech job alive, SREs included. Neither is particularly useful if you&#8217;re an SRE trying to figure out what your Tuesday will look like in 2027.</p><p>Let&#8217;s ask a more grounded question by looking at what&#8217;s already happening: When AI handles 95% of your incident response, do you get worse at handling the 5% that actually matters?</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Hgvq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Hgvq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp 424w, 
https://substackcdn.com/image/fetch/$s_!Hgvq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp 848w, https://substackcdn.com/image/fetch/$s_!Hgvq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp 1272w, https://substackcdn.com/image/fetch/$s_!Hgvq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Hgvq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp" width="708" height="473.13461538461536" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:973,&quot;width&quot;:1456,&quot;resizeWidth&quot;:708,&quot;bytes&quot;:947996,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/189391546?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!Hgvq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp 424w, https://substackcdn.com/image/fetch/$s_!Hgvq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp 848w, https://substackcdn.com/image/fetch/$s_!Hgvq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp 1272w, https://substackcdn.com/image/fetch/$s_!Hgvq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F530d847f-26cb-4ddd-a6de-875cbbaf8fab_6233x4167.webp 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p></p><p>Most of us already use AI for our daily work (our brains a little fried!), and so do SREs. Today&#8217;s question is not whether AI replaces SREs, but whether AI is quietly making SREs less capable, and whether anyone will notice before the next novel outage hits. The foundational framework for this entire debate comes from a 1983 research paper that&#8217;s eerily prescient.</p><p></p><h3>The Ironies of Automation: Part-I</h3><div><hr></div><p>Note from the author: Below is a brief primer on the history of automation, which you might enjoy if you are into history and cultures (like me &#128521;).</p><div><hr></div><p>In 1983, a cognitive psychologist named Lisanne Bainbridge published a paper called <em><strong><a href="https://www.semanticscholar.org/paper/Ironies-of-automation-Bainbridge/0713bb9d9b138e4e0a15406006de9b0cddf68e28">Ironies of Automation</a></strong></em>. It became one of the most cited papers in human factors research, and its core argument is almost uncomfortably relevant today.</p><p>Bainbridge studied what happened when factories and industrial systems automated the work of blue-collar operators. The findings were paradoxical: the more you automate a process, the more critical the human operator becomes during the rare moments automation fails. Yet the less practice operators get, the worse they become at exactly those interventions. 
Automation, which was inherently designed to remove humans from the loop, left them with the worst possible job, i.e., long stretches of passive monitoring punctuated by rare, high-stakes crises they were increasingly unprepared for.</p><p>Ring any bells yet? &#128578;</p><p>Basically, I&#8217;m drawing a parallel between the AI revolution and industrial automation. Industrial automation reshaped blue-collar work by taking over routine physical tasks, and the workers who remained had to handle exceptions they&#8217;d lost the muscle memory for. AI is doing the same thing to knowledge workers by taking over the routine cognitive tasks (the pattern matching, the triage, the known-issue resolution) and leaving humans with the rare, complex, ambiguous problems.</p><p>The <em>exact</em> problems that require deep <em>expertise</em>, the <em>exact expertise</em> that atrophies when you stop practising.</p><p>Now we&#8217;re replaying this pattern with AI agents, and the stakes in software systems are only growing.</p><p></p><h2>Current State of AI in SRE</h2><p>Let&#8217;s take stock of where things stand today in the world of site reliability engineering.</p><h3><strong>What&#8217;s already automated or heavily AI-assisted?</strong></h3><p>The following are all fairly automated today: alert noise reduction and intelligent grouping, runbook execution for known issues, log pattern detection and anomaly flagging, basic root-cause suggestions from historical incident data, and auto-remediation for well-understood failure modes like restarting a crashed pod or scaling up a service that&#8217;s running hot.</p><p></p><h3><strong>What&#8217;s on the horizon?</strong></h3><p>Some immediate targets include multi-signal correlation across metrics, logs, and traces; autonomous root-cause analysis for partially understood failures; predictive incident detection before users are affected; and AI-driven change risk assessment and automated rollbacks.</p><p>PagerDuty frames this as a tiered 
model.</p><ul><li><p>Tier 1 incidents: Known issues with known fixes get fully automated.</p></li><li><p>Tier 2 incidents: Partially understood problems receive AI recommendations with human validation.</p></li><li><p>Tier 3 incidents: Novel, complex, cascading failures stay human-led, with AI providing supporting context.</p></li></ul><p>But here&#8217;s the catch.</p><p>If <em>human</em> SREs (okay, now we have to use adjectives like human &#129401;) only engage with Tier 3 incidents, i.e., the novel, never-before-seen outages, where do they build the <em>intuition</em> to handle them? Intuition usually develops from years of hands-on incident response, pattern recognition built through repetition, and the kind of gut-level understanding of a system that only comes from painfully waking up at odd hours to fix the bug that brought the system down.</p><p></p><h2>The Crisis of Deskilling</h2><p>This is where the picture starts getting blurry. 
Let&#8217;s look at some emerging research on AI-induced deskilling across multiple fields, which paints a consistent and concerning picture.</p><p>In medicine, a <a href="https://www.thelancet.com/journals/langas/article/PIIS2468-1253(25)00133-5/abstract">recent study</a> found that endoscopists who used AI assistance for polyp detection saw their unassisted detection rates drop from 28% to 22% after a period of AI use. They got worse at the thing they were supposed to be experts in, not because they forgot the theory, but because they stopped exercising the skill.</p><p>In aviation, <a href="https://flightsafety.org/wp-content/uploads/2019/12/IASS_2019_Behrend_Lafargue.pdf">research</a> has shown that long-haul pilots who rely heavily on autopilot systems experience measurable degradation in situational awareness and manual flying ability. The problem got serious enough that the FAA now mandates more manual flying time to counteract the effect.</p><p>Somewhere over the past year, AI stopped being a tool I occasionally reached for and became the first thing I reach for, <em>always</em>. My instinct now is to offload as much as possible and apply my own thinking only where it&#8217;s absolutely unavoidable. The problem is that these moments are becoming the only exercise my brain gets, and I can feel the <em>rust</em>.</p><p>We can draw a pattern here. The more you let the system handle, the worse you get at handling things yourself. And here&#8217;s the truly dangerous part: <em>you don&#8217;t feel it happening</em>. It gets masked as hyper-productivity. Cognitive research suggests that because AI tools make tasks feel easier and enhance visible performance, users are often unable to accurately judge the true state of their own skills. 
You feel competent, dashboards look green, and then on a Wednesday, a novel incident hits that doesn&#8217;t match any pattern the AI has seen, and you realise the muscle has atrophied.</p><p>For SREs, this manifests in specific ways. We stop reading raw log streams because the AI summarises them. We stop forming hypotheses during incidents because the AI suggests root causes. We stop building mental models of system architecture because the AI maps dependencies for us. Each of these individually looks like a productivity win. Collectively, they hollow out the very expertise that makes an SRE effective when things go sideways in ways nobody anticipated.</p><p>But there&#8217;s something even more concerning than deskilling, and researchers have started calling it <em>never-skilling</em>. Deskilling means you once had a capability but have since lost it. Never-skilling means you never developed it in the first place. For junior SREs entering the field today in an environment where AI handles most of the incident response workflow, the opportunities to build foundational intuition and muscle through hands-on practice are vanishing.</p><p>The training pipeline itself is broken and not <em>self-healing</em>.</p><p>SREs realise their skills are degrading and lean more on AI to compensate, which further degrades their skills, creating a vicious cycle that is hard to escape.</p><h3>What Can We Do About It?</h3><p>We are definitely not rejecting AI tooling; we are adopting it and integrating it more deeply than ever before, because that&#8217;s the only way forward.</p><p>A few approaches worth considering:</p><p><strong>Deliberate inefficiency.</strong> Just as the FAA mandates manual flying time even when the autopilot is perfectly capable, SRE teams can designate certain incidents, even the ones the AI could handle, as <em>human-practice opportunities</em>. 
Think of this as a long-term investment in keeping skills fresh, even though it might come at the cost of a super-fast resolution.</p><p><strong>Build for human-in-the-loop, not human-on-the-side.</strong> There&#8217;s a meaningful difference between a system where a human approves an AI&#8217;s recommendation and one where a human actively engages with the problem alongside AI. The former keeps humans in a supervisory role that Bainbridge (the lady who wrote <em>that</em> research paper about 40 years ago) showed leads to vigilance decay; the latter keeps them cognitively engaged.</p><p>Let&#8217;s zoom out and take a look at the bigger picture.</p><h3>The Bigger Picture</h3><p>Everything we&#8217;ve discussed here (the ironies of automation, the deskilling risk, the never-skilling problem) applies well beyond SRE. Software engineering as a whole is navigating the same tension. As AI writes more code, reviews more PRs, and handles more debugging, the same questions apply.</p><p>We&#8217;re talking about SREs specifically because that&#8217;s the world we live in at <strong><a href="https://signoz.io/">SigNoz</a></strong>. We build an open-source observability platform, the kind of tool that gives SREs the metrics, traces, and logs they need to understand their systems deeply. For us, this deskilling question is not a rhetorical fad; it directly shapes how we&#8217;re building AI into our product.</p><p>Our approach is to start with an AI assistant that helps SREs leverage the power of LLMs while keeping humans firmly in control. Eventually, we&#8217;ll enable more autonomy, but within clear guardrails, and only as trust is earned.</p><p>One advantage we have in this space is that, as an observability platform, we sit on the data itself, the metrics, traces, and logs that SREs rely on. Most AI SRE products today integrate with observability tools through APIs, which means they&#8217;re working with a limited, second-hand view of your systems. 
Because we own the data layer, we can build much deeper, more context-aware AI capabilities that understand your system the way an experienced SRE would.</p><p>And to answer the burning question in your head, our goal isn&#8217;t AI that replaces SREs. It&#8217;s AI that supercharges SREs. Contrary to popular lore, we believe humans will remain essential for the decisions that matter most, especially those that impact production infrastructure.</p><p>The future of SRE is human <em>with</em> AI, intentionally designed to keep humans sharp, engaged, and ready for the 5% that really counts.</p><p></p><div><hr></div><p>Here&#8217;s the <strong><a href="https://www.linkedin.com/posts/pranay01_something-ive-been-thinking-about-lately-activity-7428804029134225408-IXif?utm_source=social_share_send&amp;utm_medium=member_desktop_web&amp;rcm=ACoAAC3MkwYBDZBMgATtR9hOGisjheK_u1VDu6w">LinkedIn post</a></strong> our founder shared a few days ago, which inspired me to write this.</p><p></p>]]></content:encoded></item><item><title><![CDATA[Saving Money with Sampling Strategies Beyond Head and Tail-based Sampling]]></title><description><![CDATA[I decided to go down the rabbit hole to find the strategies that don&#8217;t get the spotlight and make this edition about the lesser-known types of sampling.]]></description><link>https://newsletter.signoz.io/p/saving-money-with-sampling-strategies</link><guid isPermaLink="false">https://newsletter.signoz.io/p/saving-money-with-sampling-strategies</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Tue, 17 Feb 2026 14:03:12 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!2VSp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. 
We&#8217;ll make it worth your while.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p></blockquote><p></p><p>When I first encountered sampling about a year ago, I knew only about head- and tail-based sampling, mainly because most mainstream documentation focused on them.</p><p>But recently, I realised I&#8217;d only been looking at the tip of the iceberg.</p><p>I stumbled upon <strong><a href="https://www.gouthamve.dev/sampling-at-scale-with-opentelemetry">an article</a></strong> that discussed sampling in greater depth. I decided to go down the rabbit hole to find the strategies that don&#8217;t get the spotlight and make this edition about the lesser-known types of sampling.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2VSp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2VSp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp 424w, https://substackcdn.com/image/fetch/$s_!2VSp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp 848w, 
https://substackcdn.com/image/fetch/$s_!2VSp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp 1272w, https://substackcdn.com/image/fetch/$s_!2VSp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2VSp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp" width="725" height="484.50080515297907" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:415,&quot;width&quot;:621,&quot;resizeWidth&quot;:725,&quot;bytes&quot;:72938,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/webp&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/188254385?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2VSp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp 424w, 
https://substackcdn.com/image/fetch/$s_!2VSp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp 848w, https://substackcdn.com/image/fetch/$s_!2VSp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp 1272w, https://substackcdn.com/image/fetch/$s_!2VSp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F25af45ab-fa05-427e-9209-4d761ebec886_621x415.webp 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Let&#8217;s look at them in greater detail.</p><p></p><h2>#1. Remote Sampling</h2><p>To put it simply, it&#8217;s head-based sampling, but centrally controlled. Each service fetches sampling rules from a central config server. You can specify default and per-endpoint rates in a JSON file, and applications poll for updates periodically. If you are still wondering what the big deal is: you can raise or lower the sampling rate during an incident by changing this file, and within a minute, the applications pick up the new sampling rates.</p><p>That is quite powerful. Despite being battle-tested (used at Uber!), remote sampling has surprisingly little documentation in OpenTelemetry, even though OpenTelemetry supports it. Users often struggle to enable Jaeger-style remote sampling with OTel, and some resort to running a Jaeger agent solely to serve the sampling config. Remote sampling lets you keep a low baseline sample rate (say, 1-5%) most of the time and only ramp up to 50-100% when needed, such as during an incident or a debugging session. Because you don&#8217;t need a redeploy, teams are more likely to actually adjust rates to control costs or get details when it matters.</p><p></p><h2>#2. Consistent Reservoir Sampling</h2><p>It&#8217;s essentially head-based sampling that guarantees a fixed sample size. Instead of a simple random percentage, a reservoir sampler maintains a rolling buffer of traces, retaining exactly N traces per time window by using a discrete set of sampling rates and consistency algorithms to ensure fair selection.</p><p>Probabilistic sampling yields a variable number of samples, i.e., if traffic doubles, so do your sampled traces and costs. Reservoir sampling always uses a fixed sample size. 
It&#8217;s statistically representative because the algorithm replaces items in the reservoir with uniform probability, so every trace in the window has an equal chance of being kept.</p><p>This strategy essentially puts a hard ceiling on trace ingestion. It&#8217;s ideal for ensuring you don&#8217;t exceed your budget, even during traffic spikes. The trade-off is that during very low-traffic periods, you might underutilise capacity, but most teams prefer predictable costs to a few extra traces.</p><p></p><h2>#3. Metrics-from-Traces</h2><p>You can sample traces aggressively, for example, keep only 5%, but still extract metrics from 100% of them before they&#8217;re dropped. In practice, this means placing a metrics-generation stage in your telemetry pipeline <strong>before</strong> the sampling stage. 
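</p><p>A minimal Collector sketch of that ordering, with metrics generated before sampling, might look like the following. It assumes the contrib distribution (for the <code>spanmetrics</code> connector and the <code>probabilistic_sampler</code> processor); the exporter endpoint is a placeholder, the pipeline names are arbitrary, and the 5% rate simply mirrors the example above.</p><pre><code><code>receivers:
  otlp:
    protocols:
      grpc:

connectors:
  spanmetrics: {}

processors:
  probabilistic_sampler:
    sampling_percentage: 5   # keep roughly 5% of traces

exporters:
  otlp:
    endpoint: backend.example.com:4317   # placeholder, not a real endpoint

service:
  pipelines:
    # Sees every span and turns it into metrics; no sampler here.
    traces/metrics:
      receivers: [otlp]
      exporters: [spanmetrics]
    # Same input, but sampled before it reaches storage.
    traces/sampled:
      receivers: [otlp]
      processors: [probabilistic_sampler]
      exporters: [otlp]
    # Metrics produced by the connector are exported unsampled.
    metrics:
      receivers: [spanmetrics]
      exporters: [otlp]
</code></code></pre><p>The part that matters is that the sampler lives only in the pipeline that feeds trace storage, never in the one that feeds the connector. Treat this as a starting point under those assumptions rather than a production config.</p><p>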
OpenTelemetry makes this possible with components such as <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/connector/spanmetricsconnector/README.md">Span Metrics</a> and <a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/connector/servicegraphconnector/README.md">Service Graph</a>.</p><p>If we naively sample traces, we also lose the information needed for metrics such as request rates, error counts, and latencies. One solution is to tally up metrics <em>before</em> any sampling decision is made.</p><p>In an OTel Collector, we might place a spanmetrics connector in the pipeline, then a sampling processor after it. The Span Metrics connector emits RED metrics (request rate, error count, latency distributions) for every span that passes through, and the Service Graph connector can derive service call graphs from the same spans, so you get complete coverage. Then the sampler (head or tail) drops, say, 95% of spans before storage. The result is that your monitoring dashboards and alerts, which rely on metrics, still reflect 100% of traffic, while your trace storage volume is only about 5% of raw traffic.</p><p></p><h2>#4. Byte-Rate Limiting (Throttle by Data Volume)</h2><p>This refers to sampling based on the size of traces, not just the count. It is an often-overlooked but effective strategy: you set a cap, such as <em>ingesting at most 10 MB of trace data per second.</em> The sampler then makes decisions to stay under that throughput. OpenTelemetry recently added a <code>bytes_limiting</code> policy in the tail-sampling processor for this. You can read more about it <strong><a href="https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/tailsamplingprocessor/README.md">here</a></strong>.</p><p>It uses a token bucket algorithm, which is common for rate limiting, but the tokens represent bytes. 
The collector actually measures the size of each trace in bytes, using the protobuf serialised size to accurately account for how much data each trace would consume. You configure a sustained bytes-per-second rate and a burst capacity. For example:</p><pre><code><code>policies:
  - name: volume-limit
    type: bytes_limiting
    bytes_limiting:
      bytes_per_second: 10485760  # 10 MB per second
      burst_capacity: 20971520   # allow bursts up to 20 MB

</code></code></pre><p>If a few gigantic traces arrive, the processor will quickly use up the token budget and start dropping subsequent traces until the rate falls back under 10 MB/s. Conversely, if traces are small, more can pass through until the aggregate size hits the limit.</p><p>This becomes extremely useful when trace sizes vary a lot. For instance, one request might normally produce a 50 KB trace, but a worst-case code path might generate a 5 MB trace. A standard sampler working per-trace would treat both equally, even though the larger trace costs as much as 100 smaller ones.</p><p></p><h2>#5. Adaptive Sampling</h2><p>Adaptive sampling adjusts trace sampling rates in real time based on live traffic patterns or performance signals. The goal here is to keep overall data volume within budget while dynamically increasing sampling during anomalous events. For instance, you might normally sample only a small percentage of requests, but automatically raise the sample rate when latency or error rates spike beyond an SLO threshold. One strategy is throughput-based adaptation: setting an upper limit on traces per second and letting the system tune the probability to meet that cap. 
Another is key-based dynamic sampling, where the collector samples frequent events less and rare events more.</p><p>Here&#8217;s an interesting <a href="https://github.com/open-telemetry/opentelemetry-specification/issues/691">GitHub thread</a>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!2Asp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!2Asp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png 424w, https://substackcdn.com/image/fetch/$s_!2Asp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png 848w, https://substackcdn.com/image/fetch/$s_!2Asp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png 1272w, https://substackcdn.com/image/fetch/$s_!2Asp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!2Asp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png" width="1456" height="486" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/e30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:486,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:114885,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/188254385?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!2Asp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png 424w, https://substackcdn.com/image/fetch/$s_!2Asp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png 848w, https://substackcdn.com/image/fetch/$s_!2Asp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png 1272w, https://substackcdn.com/image/fetch/$s_!2Asp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe30375fc-1add-40a6-bec3-5565e0b4a3ad_1922x642.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>Adaptive schemes keep observability costs predictable by avoiding oversampling during high-traffic periods, yet they can temporarily boost fidelity when something goes wrong.</p><blockquote><p><em>Care must be taken to ensure coordination across distributed services so that increasing sampling doesn&#8217;t overload the system or skew the data.</em></p></blockquote><p>In my opinion, the shift from conventional probabilistic sampling to the methods above reflects a change in how we view observability. 
Ultimately, the <em>right</em> sampling strategy aligns your visibility needs with your infrastructure budget, and as OpenTelemetry matures, these strategies will likely become standard practice for any team operating at scale.</p><p></p>]]></content:encoded></item><item><title><![CDATA[How to Reduce Telemetry Volume by 40% Smartly (Java)]]></title><description><![CDATA[But with great power comes great responsibility.]]></description><link>https://newsletter.signoz.io/p/is-your-opentelemetry-auto-instrumented</link><guid isPermaLink="false">https://newsletter.signoz.io/p/is-your-opentelemetry-auto-instrumented</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Wed, 04 Feb 2026 14:02:25 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!oEXb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source 
and the engineering in between! We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. We&#8217;ll make it worth your while.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p><em>This post took 5 days, 11 hours to be curated, so make sure to show some love!</em></p></blockquote><p></p><p></p><p></p><p>OpenTelemetry has become the <em>de facto</em> choice for many organisations&#8217; observability needs today. And with it, auto-instrumentation has turned out to be a powerful means to implement the same.</p><p>But with great power comes great responsibility.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oEXb!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oEXb!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png 424w, https://substackcdn.com/image/fetch/$s_!oEXb!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png 
848w, https://substackcdn.com/image/fetch/$s_!oEXb!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png 1272w, https://substackcdn.com/image/fetch/$s_!oEXb!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oEXb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png" width="1456" height="973" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:973,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1419403,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/186390928?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!oEXb!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png 424w, 
https://substackcdn.com/image/fetch/$s_!oEXb!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png 848w, https://substackcdn.com/image/fetch/$s_!oEXb!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png 1272w, https://substackcdn.com/image/fetch/$s_!oEXb!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb1d047bc-785c-4596-ac0a-365fe700f3d1_6233x4167.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>Auto-instrumentation provides a strong baseline, but its out-of-the-box (or magical?) nature is a double-edged sword. Because it is designed to be comprehensive by default, it captures <em>everything</em> in case you need it. Without intentional refinement, this lowers your signal-to-noise ratio: the resulting <em>surplus telemetry</em> inflates storage costs and buries actionable insights under a heap of low-value signals.</p><p>While certain types of <em>telemetry surplus</em> are tied to specific libraries, such as HTTP or gRPC, most <em>telemetry waste</em> is a byproduct of the language runtime itself. To illustrate this, we will use Java throughout this post. That said, the lessons presented here aren&#8217;t specific to Java; the patterns we&#8217;ll uncover show up across the broader landscape of modern frameworks.</p><p>This post is an attempt to help you sift the diamonds (good telemetry) from the rocks (noisy telemetry)!</p><h2>Java Agent for Auto-instrumentation</h2><p>By simply attaching a Java agent at runtime, developers can capture traces, metrics, and logs without modifying a single line of application code. The Java agent runs in the same Java Virtual Machine (JVM) as the application, using bytecode manipulation libraries such as Byte Buddy to rewrite classes as they are loaded.</p><p>The Java agent automatically hooks into common frameworks such as Spring Boot, Tomcat, and JDBC drivers to inject span creation and context propagation logic. While effective, this process can, as mentioned before, generate <em>not-so-useful</em> telemetry data that later bloats storage and slows down queries. 
<br><br>Let&#8217;s discuss them in greater detail.</p><h2>The Defaults You Should Know About (and Might Want to Disable)</h2><p>I&#8217;ve curated a list of commonly seen (and publicly complained about) sources of not-so-useful telemetry data, referred to here as <em>telemetry surplus</em>. Let me introduce them one by one.</p><h3>#1. URL Path and target attributes</h3><p>&#8212; <em>not specific to Java</em></p><p>A commonly missed issue is that auto-instrumentation for HTTP clients and servers often captures the full <code>http.url</code> or <code>http.target</code> attribute (<code>url.full</code>, <code>url.path</code> and <code>url.query</code> in the newer stable HTTP semantic conventions). If an application uses RESTful paths with unique IDs like <code>/api/users/12345</code>, every unique ID creates a new attribute value.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!UyyO!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!UyyO!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png 424w, https://substackcdn.com/image/fetch/$s_!UyyO!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png 848w, https://substackcdn.com/image/fetch/$s_!UyyO!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png 1272w, 
https://substackcdn.com/image/fetch/$s_!UyyO!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!UyyO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png" width="1456" height="395" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:395,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:109434,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/186390928?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!UyyO!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png 424w, https://substackcdn.com/image/fetch/$s_!UyyO!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png 848w, 
https://substackcdn.com/image/fetch/$s_!UyyO!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png 1272w, https://substackcdn.com/image/fetch/$s_!UyyO!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0eb615ee-33f5-4d89-8050-b0a8bbbf88ce_1704x462.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This can be considered a waste because it prevents effective aggregation. 
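</p><p>To make the idea of a templated route concrete, here is a minimal Java sketch that collapses raw paths into one low-cardinality template. The class name, the <code>toTemplate</code> helper and the rule that a purely numeric segment is an ID are all assumptions made for illustration. In practice, server frameworks such as Spring MVC usually hand the route template to the instrumentation, so you rarely need to write this yourself.</p>

```java
import java.util.regex.Pattern;

class RouteTemplateSketch {
    // Hypothetical rule for this sketch: any purely numeric path segment is an ID.
    private static final Pattern NUMERIC_SEGMENT = Pattern.compile("/\\d+(?=/|$)");

    // Collapses a raw path such as /api/users/12345 into a templated route.
    static String toTemplate(String rawPath) {
        return NUMERIC_SEGMENT.matcher(rawPath).replaceAll("/:id");
    }

    public static void main(String[] args) {
        String[] rawPaths = {"/api/users/12345", "/api/users/67890", "/api/users/42/orders/7"};
        for (String path : rawPaths) {
            System.out.println(path + " -> " + toTemplate(path));
        }
    }
}
```

<p>With this rule, <code>/api/users/12345</code> and <code>/api/users/67890</code> both map to <code>/api/users/:id</code>, so they land in the same bucket.</p><p>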
Aggregation works by grouping similar data into the same bucket based on shared attributes. If we use a templated route like <code>/api/users/:id</code>, the system puts every &#8216;Get User&#8217; request into a single bucket, letting you accurately calculate the p99 latency for the entire &#8216;Get User&#8217; feature.</p><p>So, make a mental note to always use the templated <code>http.route</code> rather than the raw path; the raw path can produce millions of useless data points, aka wasteful telemetry.</p><h3>#2. Controller spans</h3><p>In frameworks like Spring MVC, auto-instrumentation can create multiple spans for a single web request. Java agent releases before 2.0 emit all of them by default, while 2.0 and later leave the controller and view spans off unless you enable them. The span types are:</p><ul><li><p><strong>Server Span (</strong><code>SpanKind.Server</code><strong>):</strong> The parent span. It tracks the entire process, from when the request reaches your server to when the response is sent.</p></li><li><p><strong>Controller Span (</strong><code>SpanKind.Internal</code><strong>):</strong> A child span. It tracks only the time spent inside your <code>@Controller</code> method.</p></li><li><p><strong>View Span (</strong><code>SpanKind.Internal</code><strong>):</strong> Another child span. It tracks how long it took to render your data into a view, such as a JavaServer Page (JSP).</p></li></ul><p>The obvious catch is that in modern microservices, controllers are often very thin and immediately call a service or a database. If your database call is already being tracked, having a separate span that says the <em>controller took 2ms</em> adds very little value. 
That is, for most cases, you might not need spans that capture controller and/or view execution.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!NtUT!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!NtUT!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png 424w, https://substackcdn.com/image/fetch/$s_!NtUT!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png 848w, https://substackcdn.com/image/fetch/$s_!NtUT!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png 1272w, https://substackcdn.com/image/fetch/$s_!NtUT!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!NtUT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png" width="1456" height="780" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b493ec24-14eb-4cb1-8961-225784067e42_1687x904.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:780,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:502571,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/186390928?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!NtUT!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png 424w, https://substackcdn.com/image/fetch/$s_!NtUT!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png 848w, https://substackcdn.com/image/fetch/$s_!NtUT!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png 1272w, https://substackcdn.com/image/fetch/$s_!NtUT!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb493ec24-14eb-4cb1-8961-225784067e42_1687x904.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>The good news is that you can effectively suppress the generation of these spans by using <em>experimental flags</em>. 
Here are some flags that let you achieve the desired effect, as mentioned in <strong><a href="https://opentelemetry.io/docs/zero-code/java/agent/disable/#suppressing-controller-andor-view-spans">OpenTelemetry documentation</a></strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!hUoc!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!hUoc!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png 424w, https://substackcdn.com/image/fetch/$s_!hUoc!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png 848w, https://substackcdn.com/image/fetch/$s_!hUoc!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png 1272w, https://substackcdn.com/image/fetch/$s_!hUoc!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!hUoc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png" width="1370" height="372" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a012c843-25b1-4408-83a3-be7c6446398e_1370x372.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:372,&quot;width&quot;:1370,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:90040,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/186390928?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!hUoc!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png 424w, https://substackcdn.com/image/fetch/$s_!hUoc!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png 848w, https://substackcdn.com/image/fetch/$s_!hUoc!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png 1272w, https://substackcdn.com/image/fetch/$s_!hUoc!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa012c843-25b1-4408-83a3-be7c6446398e_1370x372.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3>#3. Thread name in runtime telemetry</h3><p>Java runtime metrics such as <code>jvm.network.io</code> or <code>jvm.memory.allocation</code> can be a major source of high-cardinality data. Versions 2.10.0, 2.11.0, and 2.13.1 of the agent included the <code>thread.name</code> attribute by default in these metrics. 
In environments that use large thread pools or virtual threads, this creates an unbounded number of unique time series, potentially leading to a <strong><a href="https://www.reddit.com/r/sre/comments/1k4h2wi/cardinality_explosion_explained/?utm_source=share&amp;utm_medium=web3x&amp;utm_name=web3xcss&amp;utm_term=1&amp;utm_content=share_button">cardinality explosion</a></strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!KxVp!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!KxVp!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png 424w, https://substackcdn.com/image/fetch/$s_!KxVp!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png 848w, https://substackcdn.com/image/fetch/$s_!KxVp!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png 1272w, https://substackcdn.com/image/fetch/$s_!KxVp!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!KxVp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png" width="970" height="468" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:468,&quot;width&quot;:970,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:71582,&quot;alt&quot;:&quot;&quot;,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/186390928?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" title="" srcset="https://substackcdn.com/image/fetch/$s_!KxVp!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png 424w, https://substackcdn.com/image/fetch/$s_!KxVp!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png 848w, https://substackcdn.com/image/fetch/$s_!KxVp!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png 1272w, https://substackcdn.com/image/fetch/$s_!KxVp!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F06609a9c-91f3-4ae9-be63-3f955ecdc9b0_970x468.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This issue was later corrected; maintainers removed the attribute from the default metrics starting with version 2.18.0 (via <a href="https://github.com/open-telemetry/opentelemetry-java-instrumentation/issues/13407">PR #14061</a>). So, if you are on an earlier version, either put guardrails in place (such as dropping <code>thread.name</code> from these metrics) or upgrade to a later version.</p><h3>#4. Duplicate Library Instrumentation</h3><p>This is an interesting dilemma.</p><p>Let&#8217;s first dissect the problem at hand. The Java agent automatically attaches to every supported library it finds on the application&#8217;s classpath, so it can end up instrumenting multiple layers of the same outgoing request.</p><p>Let me break this down with an example.</p><p>In modern Java development, we rarely use a low-level library directly. Instead, we use high-level SDKs. 
For example:</p><ol><li><p>Application code calls the AWS SDK to upload a file to S3.</p></li><li><p>The AWS SDK (high-level) uses Apache HttpClient (mid-level) to execute the request.</p></li><li><p>Apache HttpClient uses Java networking (low-level) to send bytes over the wire.</p></li></ol><p>Left unchecked, the Java agent would see all three layers and create three separate spans for a single logical operation. This results in nested spans that describe the same work, doubling or tripling the telemetry volume for every outbound call.</p><p>To prevent this, the OpenTelemetry Java agent provides a span suppression strategy. This logic detects when an instrumentation point is already wrapped by another instrumentation point higher up the call stack.</p><p>The behaviour is controlled by the following property: <code>otel.instrumentation.experimental.span-suppression-strategy</code></p><p>Three strategies (<code>semconv</code>, which is the default, <code>span-kind</code>, and <code>none</code>) decide which spans to keep and which to discard. You can read more about them <strong><a href="https://opentelemetry.io/docs/zero-code/java/agent/disable/#instrumentation-span-suppression-behavior">here</a></strong>.</p><h3>#5. 
Resource attributes</h3><p>Resource detectors for Kubernetes and the host often capture dynamic, unique identifiers by default, such as <code>container.id</code>, <code>k8s.pod.uid</code>, or <code>process.pid</code>. When these are attached to metrics in particular, they create a new time series for every container restart or process launch. This breaks aggregation and floods the metrics database with thousands of dead time series, which increases storage costs and significantly slows down queries over long-term trends.</p><h3>#6. JDBC and Kafka Internal Signals</h3><p>Certain auto-instrumentation modules are inherently chatty, generating high-frequency spans for internal mechanics that carry little diagnostic value.</p><p>For example, the <code>jdbc-datasource</code> module (now disabled by default in recent agent releases) creates a span every time a connection is retrieved from a pool via <code>getConnection()</code>, resulting in thousands of entries that merely confirm the pool is functional.</p><p>Similarly, Kafka instrumentation can produce excessive spans for background heartbeats and metadata checks.</p><p>To mitigate this noise, you can either disable these modules at the source by setting <code>-Dotel.instrumentation.jdbc-datasource.enabled=false</code> or <code>-Dotel.instrumentation.kafka.enabled=false</code>, or filter downstream in the Collector by dropping specific span names such as <code>poll</code> or <code>heartbeat</code>. Which option fits depends on the wider architecture of your application; keep in mind that disabling a module removes all of its spans, not just the noisy ones.</p><h3>#7. Scheduler and Periodic Jobs</h3><p>&#8212; <em>can be broadly applied to schedulers and jobs in different languages</em></p><p>Applications that use Spring Scheduling or Quartz for background tasks, such as polling a database or checking a cache every second, generate a span for every single execution. If a job runs once per second but does nothing interesting 99% of the time, it creates 86,400 spans per day, nearly all of them successful but meaningless. 
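</p><p>The 86,400 figure is simply the number of seconds in a day. As a rough sketch, assuming each run of a job emits exactly one span, the daily span count for any schedule can be estimated as below. The class and method names are hypothetical and only serve the illustration.</p>

```java
import java.time.Duration;

class SchedulerSpanVolumeSketch {
    // Estimated spans per day for one job, assuming one span per run
    // and a fixed, non-zero interval between runs.
    static long spansPerDay(Duration interval) {
        return Duration.ofDays(1).toMillis() / interval.toMillis();
    }

    public static void main(String[] args) {
        System.out.println("every second: " + spansPerDay(Duration.ofSeconds(1)));
        System.out.println("every minute: " + spansPerDay(Duration.ofMinutes(1)));
    }
}
```

<p>A job that fires every second therefore produces 86,400 spans a day for each application instance, while the same job on a one-minute schedule produces 1,440.</p><p>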
This qualifies as telemetry waste in most cases.</p><p>You can disable the generation of these scheduler spans by using the system properties listed below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ppXe!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ppXe!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png 424w, https://substackcdn.com/image/fetch/$s_!ppXe!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png 848w, https://substackcdn.com/image/fetch/$s_!ppXe!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png 1272w, https://substackcdn.com/image/fetch/$s_!ppXe!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ppXe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png" width="1328" height="388" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:388,&quot;width&quot;:1328,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:54974,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/186390928?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ppXe!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png 424w, https://substackcdn.com/image/fetch/$s_!ppXe!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png 848w, https://substackcdn.com/image/fetch/$s_!ppXe!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png 1272w, https://substackcdn.com/image/fetch/$s_!ppXe!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F823ca0fa-a73f-41c0-b9b0-fd4d62a5e878_1328x388.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><h3><strong>#8. 
SDK Misalignment</strong></h3><p>Another massive source of enterprise surplus occurs when a framework like <strong><a href="https://trino.io/">Trino</a> </strong>initialises its own internal OpenTelemetry SDK instance instead of joining the global instance provided by the Java agent.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!JRVs!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JRVs!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png 424w, https://substackcdn.com/image/fetch/$s_!JRVs!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png 848w, https://substackcdn.com/image/fetch/$s_!JRVs!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png 1272w, https://substackcdn.com/image/fetch/$s_!JRVs!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JRVs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png" width="984" height="499" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:499,&quot;width&quot;:984,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:84395,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/186390928?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JRVs!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png 424w, https://substackcdn.com/image/fetch/$s_!JRVs!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png 848w, https://substackcdn.com/image/fetch/$s_!JRVs!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png 1272w, https://substackcdn.com/image/fetch/$s_!JRVs!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6f2db5a-b9c7-4ff4-8e35-fe279085cf0b_984x499.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p>This results in two parallel telemetry pipelines running in one JVM, duplicating memory overhead and network traffic. Because the instances are separate, the valuable business spans from the framework often miss the agent&#8217;s auto-detected resource attributes, such as the Kubernetes namespace. That makes the data invisible to standard production queries, so it becomes telemetry surplus.</p><h2>Mitigation Strategies &#128658;</h2><p>Now that we have seen several ways your application could generate telemetry surplus, this section provides a broad overview of how you can mitigate the resulting waste. 
As they say, prevention is better than cure. Generating less telemetry surplus in the first place is the best way to eliminate it, but in most cases some surplus is almost inevitable, so it&#8217;s important to learn how to mitigate it.</p><p>Mitigating telemetry waste requires a smart combination of upstream prevention and downstream pruning. Upstream, the most effective defence is selective enablement: disable the default <em>capture everything</em> behaviour and re-enable only the critical modules, while specifically suppressing the known chatty modules and experimental controller spans mentioned in the sections above. Downstream, the Collector serves as a powerful filter. Using a processor to delete redundant resource keys, and employing tail sampling to keep 100% of error traces while sampling only a tiny fraction of successful, low-signal traffic, can reduce data volume without sacrificing diagnostic efficacy.</p><p></p>]]></content:encoded></item><item><title><![CDATA[6 Things I Learned About OpenTelemetry Contribution (That the Docs Won't Tell You)]]></title><description><![CDATA[If you&#8217;ve been waiting for a sign to start or restart contributing to OTel, this is it! &#128150; &#10024;]]></description><link>https://newsletter.signoz.io/p/6-things-i-learned-about-opentelemetry</link><guid isPermaLink="false">https://newsletter.signoz.io/p/6-things-i-learned-about-opentelemetry</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Tue, 20 Jan 2026 13:31:58 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!G3LG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is an honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. 
We&#8217;ll make it worth your while.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p><em>This blog took 6 days and 7 hours to be curated, so make sure to show some love!</em></p></blockquote><p></p><p>Contributing to open-source can be overwhelming at first, and it&#8217;s okay to feel a little lost when trying to navigate your way through it. OpenTelemetry is one such open-source project under the CNCF [the second-largest, to be precise].</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!G3LG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!G3LG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png 424w, https://substackcdn.com/image/fetch/$s_!G3LG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png 848w, https://substackcdn.com/image/fetch/$s_!G3LG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png 1272w, 
https://substackcdn.com/image/fetch/$s_!G3LG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!G3LG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png" width="1443" height="961" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:961,&quot;width&quot;:1443,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1986031,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!G3LG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png 424w, https://substackcdn.com/image/fetch/$s_!G3LG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png 848w, 
https://substackcdn.com/image/fetch/$s_!G3LG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png 1272w, https://substackcdn.com/image/fetch/$s_!G3LG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F95218c9d-dc08-4fad-988b-d2317943b607_1443x961.png 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><p>With over 200 contributors, it&#8217;s doing really well and is growing fast. 
I&#8217;ve been part of the community [and advocating its adoption!] for a while, and I see many people who want to contribute asking for tips, guidance, and direction in the Slack channels. There&#8217;s no lack of resources, but they are scattered across a dozen different repos and docs. This blog is an attempt to bring everything you need to get started into one place, in a capsule.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0Ita!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0Ita!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png 424w, https://substackcdn.com/image/fetch/$s_!0Ita!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png 848w, https://substackcdn.com/image/fetch/$s_!0Ita!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png 1272w, https://substackcdn.com/image/fetch/$s_!0Ita!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0Ita!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png" 
width="847" height="179" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a5b90162-128c-4cfb-b824-d49d817046ef_847x179.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:179,&quot;width&quot;:847,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42887,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!0Ita!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png 424w, https://substackcdn.com/image/fetch/$s_!0Ita!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png 848w, https://substackcdn.com/image/fetch/$s_!0Ita!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png 1272w, https://substackcdn.com/image/fetch/$s_!0Ita!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5b90162-128c-4cfb-b824-d49d817046ef_847x179.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!Yi8U!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Yi8U!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png 424w, https://substackcdn.com/image/fetch/$s_!Yi8U!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png 848w, https://substackcdn.com/image/fetch/$s_!Yi8U!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png 1272w, https://substackcdn.com/image/fetch/$s_!Yi8U!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Yi8U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png" width="909" height="144" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:144,&quot;width&quot;:909,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:30410,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Yi8U!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png 424w, https://substackcdn.com/image/fetch/$s_!Yi8U!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png 848w, https://substackcdn.com/image/fetch/$s_!Yi8U!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png 1272w, https://substackcdn.com/image/fetch/$s_!Yi8U!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F7c3a88cb-d8f7-4410-90ee-bdc8f2e2b051_909x144.png 1456w" sizes="100vw"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!JPnX!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!JPnX!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png 424w, https://substackcdn.com/image/fetch/$s_!JPnX!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png 848w, https://substackcdn.com/image/fetch/$s_!JPnX!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png 1272w, https://substackcdn.com/image/fetch/$s_!JPnX!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!JPnX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png" width="847" height="179" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:179,&quot;width&quot;:847,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:42887,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!JPnX!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png 424w, https://substackcdn.com/image/fetch/$s_!JPnX!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png 848w, https://substackcdn.com/image/fetch/$s_!JPnX!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png 1272w, https://substackcdn.com/image/fetch/$s_!JPnX!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9ed24d5c-adf7-41b0-94fa-b2a2d62de0c7_847x179.png 1456w" sizes="100vw"></picture><div></div></div></a><figcaption class="image-caption">Some snippets of folks introducing themselves</figcaption></figure></div><p>I&#8217;ve been following <strong><a 
href="https://www.linkedin.com/in/diana-todea-b2a79968/?lipi=urn%3Ali%3Apage%3Ad_flagship3_feed%3BcmDxyM3CSomEXgWtZHCWRg%3D%3D">Diana Todea</a></strong>&#8217;s journey closely for a while, and she recently won an OpenTelemetry Community Award at KubeCon NA 2025. So my obvious next step was to hop on a call with her and gather as many insights as I could! Between our conversation and her own <strong><a href="https://medium.com/@dianatodea/the-unofficial-guide-to-contributing-to-opentelemetry-where-to-look-and-who-to-talk-to-9de04ae75fe0">recent writings</a></strong>, I&#8217;ve distilled the best insights on how you can move from a lurker to a contributor.</p><p>I am also addressing a problem here: while many folks want to contribute, few actually make it to their first PR, and even fewer continue to contribute consistently and stay active. I&#8217;m writing this to address both hurdles, helping you get started and find a reason to stay.</p><p>So, if you&#8217;ve been waiting for a sign to start or restart, this is it. &#128150; &#10024;</p><p></p><h2>#1. What&#8217;s the first step I should take?</h2><p>Kudos to you for taking the first leap. You can start by joining the <strong><a href="https://cloud-native.slack.com/ssb/redirect">CNCF Slack</a></strong> workspace [of which OTel is a part] and saying hi in the #hallway channel. 
Here are some examples.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!fPc-!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!fPc-!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png 424w, https://substackcdn.com/image/fetch/$s_!fPc-!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png 848w, https://substackcdn.com/image/fetch/$s_!fPc-!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png 1272w, https://substackcdn.com/image/fetch/$s_!fPc-!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!fPc-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png" width="925" height="121" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0d688412-41d4-4402-a279-d6b360c81b39_925x121.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:121,&quot;width&quot;:925,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:19732,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!fPc-!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png 424w, https://substackcdn.com/image/fetch/$s_!fPc-!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png 848w, https://substackcdn.com/image/fetch/$s_!fPc-!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png 1272w, https://substackcdn.com/image/fetch/$s_!fPc-!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0d688412-41d4-4402-a279-d6b360c81b39_925x121.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!ueod!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ueod!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png 424w, https://substackcdn.com/image/fetch/$s_!ueod!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png 848w, https://substackcdn.com/image/fetch/$s_!ueod!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png 1272w, https://substackcdn.com/image/fetch/$s_!ueod!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ueod!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png" width="927" height="118" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c608fc69-ab56-448c-9ad1-1def44918e11_927x118.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:118,&quot;width&quot;:927,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24066,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ueod!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png 424w, https://substackcdn.com/image/fetch/$s_!ueod!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png 848w, https://substackcdn.com/image/fetch/$s_!ueod!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png 1272w, https://substackcdn.com/image/fetch/$s_!ueod!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc608fc69-ab56-448c-9ad1-1def44918e11_927x118.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Snippets of folks saying hi in #hallway</figcaption></figure></div><p>You can also follow suit and introduce yourself. 
Next, since contribution is the primary goal, here&#8217;s the <strong><a href="https://opentelemetry.io/docs/contributing/">official documentation</a></strong> outlining the key things you should know. After that, try finding a good first issue from this <strong><a href="https://github.com/open-telemetry/opentelemetry.io/issues?q=is:issue+is%3Aopen&amp;%2343;sort%3Aupdated-desc&amp;%2343;label%3A%22good+first+issue%22">list</a></strong>.</p><p></p><blockquote><p><em>You can also check out <strong><a href="https://clotributor.dev/?source=post_page-----9de04ae75fe0---------------------------------------">CLOtributor</a>,</strong> which helps you find good first issues across a number of Cloud Native projects. Here are some channels you can join initially [as per Diana&#8217;s blog]:</em></p><p><em>#otel-sig-end-user, #otel-devex, #opentelemetry-new-contributors, #otel-contributor-experience, #otel-docs-localization</em></p></blockquote><p>But now you could run into your first dilemma. Let&#8217;s see how to get over it.</p><p></p><h2>#2. I can&#8217;t find a good first issue, wtd<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-1" href="#footnote-1" target="_self">1</a>?</h2><p>Finding a good first issue is indeed a task of its own. Most of them may already have been picked up by someone, with active discussions around them. 
Because these issues are beginner-friendly, they are highly competitive and often claimed within hours of posting.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WlTA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WlTA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png 424w, https://substackcdn.com/image/fetch/$s_!WlTA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png 848w, https://substackcdn.com/image/fetch/$s_!WlTA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png 1272w, https://substackcdn.com/image/fetch/$s_!WlTA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WlTA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png" width="1456" height="601" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:601,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:261827,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!WlTA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png 424w, https://substackcdn.com/image/fetch/$s_!WlTA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png 848w, https://substackcdn.com/image/fetch/$s_!WlTA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png 1272w, https://substackcdn.com/image/fetch/$s_!WlTA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9c74a1fd-98aa-4d5b-8d32-1d7bee42413f_1986x820.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">As you can see, most good first issues may already be assigned and actively worked on</figcaption></figure></div><p>If this is the case, you can shift your strategy. While these issues are great for a quick win, they rarely help you build the rapport and architectural understanding necessary for long-term contribution. 
This is why the Special Interest Group [SIG] model of the OpenTelemetry community is so important.</p><p>You can always start small: be an active part of the community, join SIG calls and the discussions in the corresponding channels, and make yourself useful with ad hoc tasks.</p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk! Subscribe to get more content on observability and OpenTelemetry delivered to your inbox.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><h2>#3. I made a PR, not getting any reviews, wtd?</h2><p>Give it some time.</p><p>Most maintainers have a day job in addition to maintaining the project, so small delays can occur. If the wait drags on, you can post a message in the corresponding Slack channel with enough context for anyone to pick up the review. 
Here&#8217;s an example.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Dkoi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Dkoi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png 424w, https://substackcdn.com/image/fetch/$s_!Dkoi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png 848w, https://substackcdn.com/image/fetch/$s_!Dkoi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png 1272w, https://substackcdn.com/image/fetch/$s_!Dkoi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Dkoi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png" width="1456" height="347" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:347,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:141289,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Dkoi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png 424w, https://substackcdn.com/image/fetch/$s_!Dkoi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png 848w, https://substackcdn.com/image/fetch/$s_!Dkoi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png 1272w, https://substackcdn.com/image/fetch/$s_!Dkoi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F67c1de59-95c8-4f06-b6dc-1bb7e6677ec8_1946x464.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Ideal way to ask for reviews</figcaption></figure></div><p></p><h2>#4. 
I want to contribute, but non-technically, wtd?</h2><p>The good news is you are in high demand!</p><p>There&#8217;s a lot of non-coding work that could use more hands on deck. Documentation and blog posts are two ways to contribute: you can improve existing documentation or add new pages. You can also join the End User Working Group [EUWG]. They are constantly looking for people to share implementation stories, conduct user interviews, or improve feedback loops between vendors and users. Merge Forward is another initiative; it focuses on diversity and inclusion and needs allies to help run mentorship programs and community events.</p><p>If you&#8217;re the kind of person who likes helping others, you can contribute simply by being active in forums or Slack and answering questions from newer users. Helping troubleshoot issues or explaining concepts in the Slack channels or GitHub discussions is a valuable form of contribution, too. So, by being a friendly helper in the community, you&#8217;re contributing to the project&#8217;s success, and you might build a reputation for yourself along the way.</p><p>If you&#8217;re interested in the process side of things, OpenTelemetry is a huge project with many SIG meetings, public notes, and release planning. You could volunteer to help with note-taking in a SIG meeting, or assist in organising community events like the OpenTelemetry Community Day at KubeCon. The <strong><a href="https://www.notion.so/6-2e9fcc6bcd1980dfb2a8cb1902f58745?pvs=21">Contributor Experience SIG</a></strong> focuses on improving the project for contributors; they might have initiatives you can join, even if you&#8217;re not contributing code.</p><p>Another piece of good news is that you can always switch tracks or do both code and non-code contributions. 
In our call, Diana emphasised that a contributor&#8217;s journey can be very fluid; you might start with documentation because that&#8217;s what you&#8217;re comfortable with, and later move into code as you learn more, or vice versa. The path you choose initially doesn&#8217;t lock you in. All contributions count, and a project as broad as OpenTelemetry needs a diverse set of skills, so yours can be the best launchpad for you [if you utilise them well!].</p><p></p><h2>#5. How do I contribute actively and remain consistent?</h2><p>Here&#8217;s something harder than getting your first PR merged: staying consistent and active in the community. Many, many people drop off after a couple of contributions. This is where consistency and discipline come into the picture, much like hitting the gym &#128517;.</p><p>Consistency in open source comes from aligning your contributions with what genuinely excites you. That could be a SIG you&#8217;re passionate about or a specific skill you want to grow. Set a realistic routine, such as contributing weekly or monthly, and stay connected by attending SIG meetings, tracking GitHub updates, or staying active in Slack.</p><p>You can stay in the loop by attending the bi-weekly SIG meetings for your area, even if just as a listener at first, or by joining community calls.</p><p>As Diana puts it, when something triggers you and helps you learn, it becomes easier to show up consistently and enjoy the journey. And, like everything else in life, motivation should come from within. &#129496;&#8205;&#9792;&#65039;</p><p></p><h2>#6. Ok, but what do I get out of this?</h2><p>Trick question.</p><p>Contributing to OpenTelemetry, or any open source project for that matter, is indeed an investment of your time and effort. 
The good news is the ROI<a class="footnote-anchor" data-component-name="FootnoteAnchorToDOM" id="footnote-anchor-2" href="#footnote-2" target="_self">2</a> can be huge, both personally and professionally.</p><p>Today, OpenTelemetry sits at the forefront of observability. By contributing, you&#8217;ll gain a much deeper understanding of how instrumentation, tracing, metrics, and related technologies work under the hood. As you debug issues or implement features, you&#8217;ll inevitably learn a ton about distributed systems, telemetry data, and best practices in cloud-native architectures from the industry&#8217;s best people! You&#8217;ll also be interacting with engineers from many companies [since OpenTelemetry has contributors from Lightstep, Google, SigNoz, and dozens of other organisations]. These connections can lead to job opportunities or collaborations in the future. Many contributors find that being active in open source eventually opens multiple doors.</p><p>Many people also contribute out of a passion for the technology and the ethos of open source; if you&#8217;ve benefited from free software, there&#8217;s a gratifying element of paying it forward. That motivation can be very fulfilling in itself.</p><p></p><h2>Some areas that could use more help</h2><p>OpenTelemetry is a broad project with many moving parts, and naturally, some parts of it have more active contributors than others. If you&#8217;re looking to make a real impact and perhaps have an easier time finding issues to tackle, it helps to know which areas are currently under-resourced. Based on community insights and what maintainers have pointed out, here are a few areas in need of more contributors:</p><ul><li><p><strong><a href="https://opentelemetry.io/docs/contributing/localization/">Documentation Localisation</a></strong>: As Diana mentioned, translating docs is a major need. 
Some language communities, like Japanese and Chinese, have been very active in translating OpenTelemetry docs, but others have barely started. If you are fluent in any language besides English, you can make a big difference by contributing to localisation efforts.</p></li><li><p><strong>Language SDKs with smaller teams:</strong> OpenTelemetry maintains SDKs for many languages. Some of these, especially those for the most popular languages, have large contributor teams, but others could use help. For example, newer or less common language implementations might have only a couple of maintainers. If you happen to know a language like PHP, Ruby, Erlang, or Rust, those SDKs might appreciate extra contributors to help fix bugs and implement new features to catch up with the latest spec.</p></li><li><p><strong>eBPF Instrumentation [OBI]:</strong> One of the newer frontiers in OpenTelemetry is eBPF-based auto-instrumentation, a.k.a. OBI [OpenTelemetry eBPF Instrumentation]. This allows automatic telemetry data capture at the kernel level without modifying application code. If you&#8217;re interested in low-level programming or Linux kernel tech, the OBI project would love some help!</p></li></ul><p>Being part of the community and taking on responsibilities is as simple as sending an intro message to any SIG or channel you&#8217;re particularly interested in and asking if you can help out with anything. It can be as easy as the message in the screenshot below! 
Thanks to the amazing community made even more welcoming by the great people in it!</p><p>So go, and make a change!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RMzh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RMzh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png 424w, https://substackcdn.com/image/fetch/$s_!RMzh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png 848w, https://substackcdn.com/image/fetch/$s_!RMzh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png 1272w, https://substackcdn.com/image/fetch/$s_!RMzh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RMzh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png" width="451" height="391" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ac39a019-c300-45d0-a11d-10021151ffda_451x391.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:391,&quot;width&quot;:451,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:80933,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/185160389?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RMzh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png 424w, https://substackcdn.com/image/fetch/$s_!RMzh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png 848w, https://substackcdn.com/image/fetch/$s_!RMzh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png 1272w, https://substackcdn.com/image/fetch/$s_!RMzh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fac39a019-c300-45d0-a11d-10021151ffda_451x391.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">How I started!</figcaption></figure></div><blockquote><p><em>Feel free to check out our <strong><a href="https://signoz.io/resource-center/blog/">blogs</a> </strong>and <strong><a href="https://signoz.io/docs/introduction/">docs</a></strong> here. Our <strong><a href="https://github.com/SigNoz/signoz">GitHub</a> </strong>is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. 
You&#8217;re also welcome to join the conversation in our growing <strong><a href="https://arc.net/l/quote/rmuathlr">Slack</a></strong> community for the latest news!</em></p></blockquote><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk! If you want more great insights on observability and beyond, hit subscribe!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p><p></p><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-1" href="#footnote-anchor-1" class="footnote-number" contenteditable="false" target="_self">1</a><div class="footnote-content"><p>wtd: just an acronym for &#8216;what to do?&#8217; much like this emoji &#129335;&#8205;&#9792;&#65039;</p></div></div><div class="footnote" data-component-name="FootnoteToDOM"><a id="footnote-2" href="#footnote-anchor-2" class="footnote-number" contenteditable="false" target="_self">2</a><div class="footnote-content"><p>ROI: Return On Investment</p></div></div>]]></content:encoded></item><item><title><![CDATA[BTS of OpenTelemetry Auto-instrumentation]]></title><description><![CDATA[OpenTelemetry&#8217;s auto-instrumentation toolkit boils down to a couple of clever techniques that make all of this possible. 
Let's discuss them!]]></description><link>https://newsletter.signoz.io/p/bts-of-opentelemetry-auto-instrumentation</link><guid isPermaLink="false">https://newsletter.signoz.io/p/bts-of-opentelemetry-auto-instrumentation</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Sat, 10 Jan 2026 15:38:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!IKwd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. We&#8217;ll make it worth your while. <br></em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p></blockquote><p></p><p>I&#8217;ve been an OpenTelemetry advocate for over a year and have written many, many blogs on adopting OpenTelemetry in your systems to achieve deep observability. 
Yet, I&#8217;ve always wondered how and what actually happens behind the scenes, in the context of auto-instrumentation.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!IKwd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!IKwd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png 424w, https://substackcdn.com/image/fetch/$s_!IKwd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png 848w, https://substackcdn.com/image/fetch/$s_!IKwd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png 1272w, https://substackcdn.com/image/fetch/$s_!IKwd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!IKwd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png" width="1456" height="971" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:971,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1322370,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/184125533?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!IKwd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png 424w, https://substackcdn.com/image/fetch/$s_!IKwd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png 848w, https://substackcdn.com/image/fetch/$s_!IKwd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png 1272w, https://substackcdn.com/image/fetch/$s_!IKwd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6e373ed1-e673-4829-b141-2b92a06f938d_3000x2000.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>So, this is me breaking down what happens under the hood of OpenTelemetry for you.</p><h2>Refresher on Auto-instrumentation</h2><p>For those who are new to the space, auto-instrumentation refers to collecting telemetry [traces, metrics, logs] from your application without requiring you to make changes to the application code. You can read more about it from the<strong><a href="https://opentelemetry.io/docs/concepts/instrumentation/zero-code/"> official docs </a></strong>here.</p><p>A helpful way to understand how this works is to separate the OpenTelemetry API from the OpenTelemetry SDK.</p><ul><li><p>The OTel API is the interface for creating telemetry &#8212; &#8220;start a span&#8221;, &#8220;add an event&#8221;, &#8220;record a metric&#8221;, &#8220;propagate context&#8221;, etc. 
Both manual instrumentation [your code] and auto-instrumentation [instrumentation libraries/agents] ultimately use these same API calls. The difference is that in auto-instrumentation, those calls are made for you.</p></li><li><p>The OTel SDK is the implementation behind the API: it decides what actually happens to that telemetry [sampling, batching, processing] and where it goes [exporting].</p></li></ul><p>So auto-instrumentation is typically achieved in two parts:</p><ul><li><p>Instrumentation hooks [libraries/agents] that wrap existing functions and call the OTel API at the right points.</p></li><li><p>SDK configuration that ensures those API calls actually record telemetry and can be exported.</p></li></ul><p>In auto-instrumentation, OpenTelemetry wraps existing function implementations and extracts useful data, such as function parameters, execution duration, and results. It&#8217;s important to note that the way this wrapping and hooking is done varies widely across programming languages. Broadly, there&#8217;s a clear difference between how it works in dynamic languages [like JavaScript, Python, and Ruby] and in statically typed or compiled languages [like Java, Go, and the .NET languages].</p><p>Let&#8217;s dive into those differences (or similarities!) next.</p><h2>Dynamic vs. Static Languages</h2><p>It becomes easier to understand what happens behind the scenes when we classify languages broadly as dynamic or static. Dynamic languages allow instrumentation to patch or wrap functions at runtime easily, whereas static languages don&#8217;t natively allow such runtime patching, so they require different techniques to insert instrumentation code. That is, most dynamic languages like Python, JavaScript, and Ruby, which are more flexible at runtime, depend on methods like monkey-patching to implement auto-instrumentation. 
Static languages rely on other techniques: those that run on a virtual machine, like Java, use bytecode instrumentation, while natively compiled ones, like Go, rely on build-time injection.</p><h2>Some Cool Techniques</h2><p>OpenTelemetry&#8217;s auto-instrumentation toolkit boils down to a handful of clever techniques that make all of this possible. Let&#8217;s discuss three of the most common methods used under the hood.</p><p></p><h3>Monkey Patching</h3><p>The lore behind the term <em>monkey-patching</em> fascinated me. Apparently, the term comes from <em>guerrilla-patching</em>, which refers to the sneaky act of changing code at runtime to fix a bug or add a feature without altering the original source code. Because <em>guerrilla</em> and <em>gorilla</em> are near-homophones, the term was intentionally turned into a pun, <em>gorilla-patch</em>. Eventually, developers who wrote their patches more carefully began calling them <em>monkey-patches</em>, since a monkey sounds less intimidating than a <em>gorilla</em>.</p><p>Okay, now let&#8217;s get back to the engineering. In dynamic languages such as Python and JavaScript, functions and modules are first-class objects whose attributes can be reassigned at runtime. This allows OpenTelemetry to employ monkey patching: at runtime, existing functions are replaced with instrumented wrappers that emit telemetry before and after calling the original function.</p><p>This piece of code roughly illustrates what happens in Node.js.</p><pre><code><code>const originalFunction = exports.functionName;

function instrumentedFunction(...args) {
  const startTime = process.hrtime.bigint();
  // invoke the OG function here
  const result = originalFunction.apply(this, args);
  const duration = process.hrtime.bigint() - startTime;
  console.log(`functionName(${args[0]}) took ${duration} nanoseconds`);
  return result;
}

exports.functionName = instrumentedFunction;
</code></code></pre><p>OTel JavaScript uses a package called <code>require-in-the-middle</code> to intercept <code>require()</code> calls and apply such patches as modules are loaded, before your code uses them.</p><p>Let&#8217;s see how this could work in Python. Say we are trying to collect data from an HTTP client, like <code>requests</code>. The <code>requests</code> library exposes a separate function for each HTTP method [<code>requests.get</code> / <code>requests.post</code> / <code>requests.put</code>, and so on]. But each of these functions eventually calls an internal <code>request</code> function, whose parameters are the method, the URL, and any extra keyword arguments, and which returns a response object.</p><p>Let&#8217;s see what this looks like in pseudo-code:</p><pre><code><code>from datetime import datetime

def request(method, url, **kwargs):
&#9;# Original implementation [elided in this pseudo-code]
&#9;...

def wrapped_request(method, url, **kwargs):
&#9;before = datetime.now()
&#9;# Call the original implementation
&#9;response = request(method, url, **kwargs)
&#9;# Collect the necessary information
&#9;duration = datetime.now() - before
&#9;collect_data(method, url, response.status_code, duration)
&#9;# Return the value from the original call
&#9;return response
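</code></code></pre><p>The pseudo-code above skips error handling. As a minimal runnable sketch of the same wrapping idea [the <code>traced</code> helper and the <code>record</code> callback are hypothetical names for illustration, not part of any OpenTelemetry API], the timing can be reported even when the original function raises:</p><pre><code><code>import functools
import time

def traced(func, record):
    # `record` is a hypothetical callback standing in for real telemetry export.
    @functools.wraps(func)  # keep the original name and docstring on the wrapper
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        outcome = "error"
        try:
            result = func(*args, **kwargs)
            outcome = "ok"
            return result
        finally:
            # Runs on both success and failure, so failed calls are timed too.
            record(func.__name__, outcome, time.perf_counter() - start)
    return wrapper

events = []

def divide(a, b):
    return a / b

safe_divide = traced(divide, lambda name, outcome, seconds: events.append((name, outcome)))
value = safe_divide(6, 3)
try:
    safe_divide(1, 0)
except ZeroDivisionError:
    pass  # the exception still reaches the caller after being recorded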

</code></code></pre><p>To close the loop, the original function implementation needs to be replaced with the new <code>wrapped_request</code>. In dynamic languages, this is done by simply holding a reference to the original implementation and rebinding the function&#8217;s name to the wrapper. A pseudo-code implementation [which isn&#8217;t very far from real-life code] looks like this:</p><pre><code><code>import requests

original_request_impl = requests.request

def wrapped_request(method, url, **kwargs):
&#9;# Same wrapper as in the previous snippet, except that it calls
&#9;# the saved original_request_impl instead of request
&#9;return original_request_impl(method, url, **kwargs)

requests.request = wrapped_request
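</code></code></pre><p>Putting the two snippets together, here is a small sketch you can run offline. It assumes a stand-in <code>http_client</code> object in place of the real <code>requests</code> module, and the <code>instrument</code> / <code>uninstrument</code> helpers and the <code>collected</code> list are made up for illustration. It also shows that the patch can be undone by restoring the saved reference:</p><pre><code><code>import time
from types import SimpleNamespace

# Hypothetical stand-in for the real `requests` module, so no network is needed.
def _request(method, url, **kwargs):
    return SimpleNamespace(status_code=200)

http_client = SimpleNamespace(request=_request)

collected = []  # stand-in for a telemetry exporter

def instrument(module):
    original = module.request  # hold a reference to the original implementation

    def wrapped_request(method, url, **kwargs):
        start = time.perf_counter()
        response = original(method, url, **kwargs)
        duration = time.perf_counter() - start
        collected.append((method, url, response.status_code, duration))
        return response

    module.request = wrapped_request  # rebind the name to the wrapper

    def uninstrument():
        module.request = original  # restore the saved reference
    return uninstrument

uninstrument = instrument(http_client)
response = http_client.request("GET", "https://example.com")  # recorded
uninstrument()
http_client.request("GET", "https://example.com/again")  # no longer recorded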

</code></code></pre><p>Code that calls <code>requests</code> behaves exactly as before, while the auto-instrumentation quietly collects the data it needs on every call.</p><h3>Byte-code Instrumentation</h3><p>This is the underlying technique for languages that run on a virtual machine. Instead of modifying functions at the language level, this approach modifies the compiled code [bytecode] as it&#8217;s being loaded into the runtime. Essentially, the instrumentation injects extra bytecode instructions that call OpenTelemetry APIs around the target method&#8217;s original instructions.</p><p>In the <s>Jurassic</s> Java world, this is done via a special agent. When you run a Java app with the <code>-javaagent</code> flag pointing to the OpenTelemetry Java Agent JAR, the JVM invokes the agent&#8217;s <code>premain()</code> method before the application&#8217;s own <code>main()</code> method. The simplified snippet below uses the Byte Buddy library to show the idea; the class and method names in it are placeholders.</p><pre><code><code>public static void premain(String args, Instrumentation inst) {
    new AgentBuilder.Default()
        .type(ElementMatchers.nameStartsWith("com.example.TargetApp"))
        .transform((builder, typeDescription, classLoader, module, protectionDomain) -&gt;
            builder.method(ElementMatchers.named("targetMethod"))
                   .intercept(MethodDelegation.to(MethodInterceptor.class))
        ).installOn(inst);
}
</code></code></pre><p>In that <code>premain()</code>, OTel registers a class transformer [as seen in the snippet] with the JVM. As each class loads, the transformer inspects it and, if it matches one of the known libraries or functions we want to instrument [e.g., a Servlet filter or a JDBC call], the agent modifies the class&#8217;s bytecode on the fly to insert the telemetry hooks. The end result is that by the time your application&#8217;s code runs those functions, they already have tracing logic woven in.</p><p>Bytecode instrumentation is extremely powerful because it works at the Java virtual machine [JVM] level, making it language-agnostic within the JVM ecosystem. It can instrument Java, Kotlin, Scala, and other JVM languages without any changes to the application&#8217;s source code.</p><p>The trade-off is a bit more complexity and setup: you need to run the app with the agent [or, in .NET, enable the CLR profiler], and there is some startup overhead to transform classes. Once running, the performance impact of the injected code is usually minimal. Overall, this technique lets OpenTelemetry achieve deep, broad instrumentation of popular frameworks in Java and .NET with near-zero friction for the developer.</p><h3>Abstract Syntax Tree Modification</h3><p>Python is a dynamic language, and Java is a static language that runs on a VM. Go is a static language that compiles to a native binary and does not use a VM, which makes it an outlier here. In Go, auto-instrumentation works by modifying the Abstract Syntax Trees [ASTs].</p><p>I was first introduced to ASTs in the Compiler Design course of my undergrad degree. An AST is a tree-shaped data structure widely used in compilers to represent program code, and it is usually the result of the syntax analysis phase of a compiler. 
This is exactly where the auto-instrumentation comes into the picture as well.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rPDn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rPDn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png 424w, https://substackcdn.com/image/fetch/$s_!rPDn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png 848w, https://substackcdn.com/image/fetch/$s_!rPDn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png 1272w, https://substackcdn.com/image/fetch/$s_!rPDn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rPDn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png" width="451" height="315" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:315,&quot;width&quot;:451,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:36186,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/184125533?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!rPDn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png 424w, https://substackcdn.com/image/fetch/$s_!rPDn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png 848w, https://substackcdn.com/image/fetch/$s_!rPDn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png 1272w, https://substackcdn.com/image/fetch/$s_!rPDn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ad95c3e-7668-419c-9e89-05818edf2012_451x315.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Example of an abstract syntax tree (a very, very small one)</figcaption></figure></div><p></p><p>The auto-instrumentation process of Go involves parsing the source code into an AST, adding instrumentation code to the tree, and generating the modified source code before compilation. This approach ensures that the instrumentation is incorporated in the final binary, providing zero runtime overhead for the instrumentation mechanism itself. But it does come with trade-offs, including the need for access to source code, which makes it difficult to instrument third-party libraries and plugins, and the need for complex changes to build pipelines.</p><h2>Final Words</h2><p>Delving into how OpenTelemetry auto-instrumentation works behind the scenes reveals a lot of clever engineering. 
The mechanisms that we covered above allow OTel to hook into your application&#8217;s execution, gather context and timing information, and funnel it into the OTel SDK, all without you changing your application code. &#128522;</p><p>As an OpenTelemetry user, you don&#8217;t usually need to worry about these details, but understanding them can be helpful when you are instrumenting your own applications.</p><p>In the end, what feels like telemetry appearing out of thin air, aka auto-instrumentation, is actually the result of these well-orchestrated techniques. Knowing this, you can better appreciate the work done by the OTel community and troubleshoot issues with a deeper intuition.</p><p>Happy instrumenting!</p><p></p><blockquote><p><em>On another note, SigNoz, along with Inkeep, is <strong><a href="https://luma.com/f2t9hnia">hosting a webinar</a></strong> on Debugging AI Agents: Observability Best Practices with Inkeep &amp; SigNoz. Check it out if it interests you!</em></p><p><em>Feel free to check out our <strong><a href="https://signoz.io/resource-center/blog/">blogs</a> </strong>and <strong><a href="https://signoz.io/docs/introduction/">docs</a></strong> here. Our <strong><a href="https://github.com/SigNoz/signoz">GitHub</a> </strong>is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. You&#8217;re also welcome to join the conversation in our growing <strong><a href="https://arc.net/l/quote/rmuathlr">Slack</a></strong> community for the latest news!</em></p></blockquote><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk! 
Subscribe to stay tuned for more observability related content!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[Reducing OpenTelemetry Bundle Size in Browser Frontend]]></title><description><![CDATA[But here&#8217;s the thing, neglecting observability for reducing bundle size isn&#8217;t a good trade-off.]]></description><link>https://newsletter.signoz.io/p/reducing-opentelemetry-bundle-size</link><guid isPermaLink="false">https://newsletter.signoz.io/p/reducing-opentelemetry-bundle-size</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Sat, 20 Dec 2025 13:45:23 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!xq4c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. 
We&#8217;ll make it worth your while.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p><em>In honour of the Stranger Things finale, I&#8217;ve hidden a few easter eggs throughout this edition for all the fellow fans out there. See if you can spot them all! As this is our final edition of the year, I want to wish all of my readers a very happy holiday season, a joyous Christmas, and a wonderful new year.</em></p><p><em>Cheers.</em></p></blockquote><p></p><p>When I was building applications, I used to always rely on the DevTools console of my web browser to examine logs in the frontend. But, with UI log messages only being accessible within your browser rather than forwarded to a file somewhere, which is the common pattern with backend services, losing visibility of this resource when triaging user issues was a real dilemma. Since adding any kind of monitoring/ observability solution would blow up the bundle size, I&#8217;d try to avoid it as much as possible.</p><p>But here&#8217;s the thing, neglecting observability for reducing bundle size isn&#8217;t a good trade-off. 
There are several other ways for you to run up that hill, and meanwhile, if you are caught in a scenario where your requests are not being sent, and the site is crashing and everything is turning upside down, you&#8217;ll have to inevitably start looking inside.</p><p>Inside your traces, spans and contexts.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!xq4c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!xq4c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png 424w, https://substackcdn.com/image/fetch/$s_!xq4c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png 848w, https://substackcdn.com/image/fetch/$s_!xq4c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png 1272w, https://substackcdn.com/image/fetch/$s_!xq4c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!xq4c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png" width="1456" height="1459" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1459,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:4324615,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/182164862?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!xq4c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png 424w, https://substackcdn.com/image/fetch/$s_!xq4c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png 848w, https://substackcdn.com/image/fetch/$s_!xq4c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png 1272w, https://substackcdn.com/image/fetch/$s_!xq4c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F60471610-c9e2-4fc3-8546-eb7d33a4d31c_3075x3081.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" 
width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><p>In this blog, we explore strategies to trim the bundle impact of OTel, focusing on tree-shaking [removing unused code] and lazy-loading [deferring loading until needed] and how to apply these in different frameworks.</p><p></p><h2>Impact of OpenTelemetry on Bundle Size and Performance</h2><p>Out of the box, adding OpenTelemetry&#8217;s web libraries can introduce quite a significant amount of JavaScript. For example, the official browser auto-instrumentation bundle was about <strong>300 KB uncompressed [~60 KB gzipped]</strong> after recent optimisations, which is in the same ballpark as many third-party RUM [Real User Monitoring] agents. While 60 KB may seem <em>okay-ish</em>, loading and executing this script during initial page load can <strong>delay rendering</strong>. 
A large script can increase <strong>blocking time</strong>, potentially pushing out LCP [Largest Contentful Paint &#8212; the render of the largest element] beyond the optimal 2.5s threshold.</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!s061!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!s061!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png 424w, https://substackcdn.com/image/fetch/$s_!s061!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png 848w, https://substackcdn.com/image/fetch/$s_!s061!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png 1272w, https://substackcdn.com/image/fetch/$s_!s061!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!s061!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png" width="883" height="181" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:181,&quot;width&quot;:883,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:24663,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/182164862?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!s061!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png 424w, https://substackcdn.com/image/fetch/$s_!s061!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png 848w, https://substackcdn.com/image/fetch/$s_!s061!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png 1272w, https://substackcdn.com/image/fetch/$s_!s061!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9a880d43-31e2-4052-b62b-b4cc32d7d78f_883x181.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption">Not opting for OTel due to heavy bundle-size</figcaption></figure></div><p>Core Web Vitals are very sensitive to any render-blocking resources. 
We generally avoid deferring critical content, but telemetry scripts are <em>not</em> user-facing content. In fact, web performance guidelines note you should <em>not</em> lazy-load an LCP image [since that delays visible content]; however, lazy-loading a telemetry script is a good practice precisely because it&#8217;s non-essential to the user&#8217;s immediate experience. The challenge is finding a balance: we want to collect telemetry [traces of page loads, API calls, user interactions, metrics like Web Vitals, etc.] for observability, <em>but</em> we must prevent the OTel code from slowing down the page. We will look at two proven techniques, tree-shaking and lazy-loading, to reduce bundle bloat.</p><p></p><h2>Tree-Shaking &#127795; OpenTelemetry Code</h2><p>Tree-shaking is a build optimisation that removes dead code, including modules or functions that your application doesn&#8217;t actually use. OpenTelemetry&#8217;s JavaScript SDK is modular, which means if you import only certain parts [say, the tracing API and one exporter], you <em>should</em> be able to exclude others [like metrics, logging, or unused instrumentations]. Ensuring that tree-shaking works with OTel involves a few considerations:</p><h3><strong>Use Modern ESM Imports</strong></h3><p>All OTel packages support <a href="https://www.w3schools.com/nodejs/nodejs_modules_esm.asp">ES Modules</a>. Import only the symbols you need, rather than entire libraries. For example, if you only need the web tracer and the OTLP exporter, you might do:</p><pre><code><code>import { WebTracerProvider } from '@opentelemetry/sdk-trace-web';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';
</code></code></pre><p>This pulls in only tracing-related code and the OTLP trace exporter, leaving out metrics and logging code.</p><h3><strong>Avoid Catch-all Imports or Meta-Packages</strong></h3><p>OpenTelemetry offers auto-instrumentation packages that conveniently bundle many instrumentations. For example, <code>@opentelemetry/auto-instrumentations-web</code> will include document load, fetch/XHR, user interaction, and more. If you use it, your bundle will include <em>all</em> <em>those instrumentations</em>. To keep things slim, <em>only import the instrumentations you actually want</em> individually, instead of a blanket import. This way, unused ones can be dropped.</p><p>In code, that means doing something like:</p><pre><code><code>import { DocumentLoadInstrumentation } from '@opentelemetry/instrumentation-document-load';
import { FetchInstrumentation } from '@opentelemetry/instrumentation-fetch';
// ... then use these in registerInstrumentations ...
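// A minimal sketch of that registration step. Assumptions: the
// @opentelemetry/instrumentation package is installed, and a tracer provider
// has already been registered elsewhere in your setup code.
import { registerInstrumentations } from '@opentelemetry/instrumentation';

registerInstrumentations({
  instrumentations: [
    new DocumentLoadInstrumentation(),
    new FetchInstrumentation(),
  ],
});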
</code></code></pre><p>If you don&#8217;t need, say, user interaction tracking or certain network instrumentation, not importing them will ensure they don&#8217;t appear in the bundle.</p><h3><strong>Mark OTel Packages as Side-Effect-Free</strong></h3><p>Tree-shaking works best when libraries declare that they have no side effects on import. Many OTel packages now include <code>sideEffects: false</code> in their <code>package.json</code>, which tells Webpack/Rollup that it can safely drop unused exports.</p><p>This was more of an issue in earlier versions. A user noted that manually adding <code>sideEffects: false</code> to OTel packages reduced bundle size by ~40 KB, and the OTel maintainers addressed this in later releases. You can view the <a href="https://github.com/open-telemetry/opentelemetry-js/issues/2855">GitHub discussion</a> here. Using OpenTelemetry JS v1.2+ or v2.x is recommended, as newer versions have improved in this area. In fact, the OTel JS SDK 2.0 [released in 2025] explicitly removed certain patterns [like extensive classes or namespaces] to improve tree-shakability and minification.</p><p>Upgrading to the latest version can yield a smaller bundle thanks to these optimisations!</p><h3><strong>Consistent Versioning to Avoid Duplicates</strong></h3><p>One subtle cause of bundle bloat that often goes unnoticed is version mismatch. If you depend on multiple OTel packages that internally bring in different versions of the core API, you might accidentally bundle two copies. Ensure all your OTel packages are on the same version so the bundler can deduplicate them. 
For instance, if everything is on version 1.5.0 except one package on 0.26.0, you may get two sets of code.</p><p>Aligning package versions will help prevent that scenario.</p><p>In summary, <em>tree-shake aggressively.</em> That means prune everything optional &#8212; disable features that aren&#8217;t useful anymore, drop instrumentations you don&#8217;t need, and let your bundler eliminate the dead code. By doing so, you minimise the impact on bundle size to a great extent.</p><p></p><h2>Lazy-Loading the OpenTelemetry SDK</h2><p>This is the next concept you can explore. Lazy-load the OTel code, so it isn&#8217;t even downloaded or executed until after the critical page content is loaded. This strategy has perhaps the biggest positive impact on LCP and initial load performance. The idea is to defer the initialisation of OpenTelemetry modules to a non-critical moment [for example, after the page&#8217;s main content is on screen or when the user interacts], rather than blocking the main thread early.</p><h3><strong>Dynamic </strong><code>import()</code><strong> in Single-Page Apps</strong></h3><p>In a React or other Single Page Application [SPA], you can use the dynamic <code>import()</code> function to load your telemetry setup code asynchronously.</p><p>For example, you might create a module <code>otel-init.js</code> that configures the OTel SDK, and <em>then</em> instead of importing it at the top of your app, you load it on demand. For instance:</p><pre><code><code>// In your main App component
useEffect(() =&gt; {
  import('./otel-init').then(module =&gt; {
    module.initTelemetry(); // call the initialization function exported here
  });
}, []);
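
// Optional variation, as a sketch only: wait until the browser is idle before
// fetching the telemetry chunk. `loadWhenIdle` is a hypothetical helper name,
// not part of OpenTelemetry or React.
function loadWhenIdle(load, timeoutMs) {
  if (typeof requestIdleCallback === 'function') {
    // Run `load` when the main thread is idle, or after timeoutMs at the latest.
    requestIdleCallback(load, { timeout: timeoutMs });
  } else {
    // Fallback where requestIdleCallback is unavailable: a plain timer.
    setTimeout(load, timeoutMs);
  }
}
// Inside the effect above, the import() call would then be wrapped like this:
// loadWhenIdle(function () { /* import('./otel-init') as shown above */ }, 2000);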
</code></code></pre><p>This ensures that the OTel code [everything inside <code>otel-init</code> and its imports] is pulled in only <em>after</em> the first render. The UI can render, LCP can happen, and only then does the telemetry code load in the background. From the user&#8217;s perspective, the page appears quickly; from the app&#8217;s perspective, OTel starts slightly later.</p><h3><strong>Code-Splitting with Bundler Config</strong></h3><p>If you&#8217;re using Webpack, you can explicitly split OTel into its own chunk. For example, in an Angular app using Webpack, you can configure a separate cache group for <code>@opentelemetry</code> modules.</p><p>This means your build will produce something like <code>main.js</code> and <code>opentelemetry.js</code>. However, to truly lazy-load that chunk, you should ensure it&#8217;s not required immediately. In practice, that again means using dynamic import or a similar mechanism to load that chunk at a later time. The Webpack config might look like:</p><pre><code><code>// webpack.config excerpt
optimization: {
  splitChunks: {
    chunks: 'all',
    cacheGroups: {
      opentelemetry: {
        test: /[\\/]node_modules[\\/]@opentelemetry[\\/]/,
        name: 'opentelemetry',
        priority: 10,
        reuseExistingChunk: true,
      },
    },
  },
}
</code></code></pre><p>There&#8217;s a small trade-off here. Delaying the loading of OTel modules would also inevitably result in the loss of some early telemetry data. For example, if you want to capture any errors or events during the first few seconds, a delayed start misses them. If those are crucial, you might decide to load a minimal part of OTel early [or use a buffered logging approach] and load the rest later. It&#8217;s a balancing act.</p><p>Both of the above are proven techniques for bringing down bundle size. Apart from these, there are some more optimisations for how we send telemetry data from the browser and framework-specific techniques, which I&#8217;ll cover in another edition. Till then, adieu!</p><p></p><blockquote><p><em>Feel free to check out our <strong><a href="https://signoz.io/resource-center/blog/">blogs</a> </strong>and <strong><a href="https://signoz.io/docs/introduction/">docs</a></strong> here. Our <strong><a href="https://github.com/SigNoz/signoz">GitHub</a> </strong>is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. You&#8217;re also welcome to join the conversation in our growing <strong><a href="https://arc.net/l/quote/rmuathlr">Slack</a></strong> community for the latest news!</em></p></blockquote><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk! 
</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p></p>]]></content:encoded></item><item><title><![CDATA[How We Achieved 30% Faster Log Queries by Overcoming ClickHouse's Native JSON Limits]]></title><description><![CDATA[What started as an investigation into filtering inconsistent dot-key notation in JSON logs ended up optimising our query performance by 30%.]]></description><link>https://newsletter.signoz.io/p/overcoming-clickhouses-json-constraints</link><guid isPermaLink="false">https://newsletter.signoz.io/p/overcoming-clickhouses-json-constraints</guid><dc:creator><![CDATA[Piyush]]></dc:creator><pubDate>Sat, 13 Dec 2025 13:02:53 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!uDTw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. 
We&#8217;ll make it worth your while.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p><em>This piece is written by <strong><a href="https://www.linkedin.com/in/piyushsingariya/">Piyush</a></strong>, Software Engineer at SigNoz, who was also one of the key contributors to this engineering overhaul. </em></p><p><em>Cheers.</em></p></blockquote><p></p><p></p><p>Customer log data is always messy.</p><p>Being (and building!) an <a href="https://signoz.io/">observability platform</a>, we get to see <em>all the beautiful, creative ways</em> it can be messy, every single day. And yet, our customers expect, quite fairly, I might add, perfect query results and peak performance.</p><blockquote><p>SigNoz is an open-source observability platform that can be your one-stop solution for logs, metrics and traces. Using ClickHouse as a single datastore and built to support OpenTelemetry natively, SigNoz can help you troubleshoot issues faster with powerful querying capabilities on your observability data.</p></blockquote><p>We recently overhauled how we store JSON logs in ClickHouse [our datastore] to improve query performance and enable filtering of nested dot-notation keys, which was previously not possible. 
What started as an investigation into filtering inconsistent dot-key notation in JSON logs ended up optimising our query performance by 30%.</p><p>In the process, we developed a two-tier JSON storage model that helped us overcome the limitations of ClickHouse&#8217;s native JSON data type while paving the way for superior query and aggregation performance for any key in customers&#8217; logs.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!uDTw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!uDTw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png 424w, https://substackcdn.com/image/fetch/$s_!uDTw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png 848w, https://substackcdn.com/image/fetch/$s_!uDTw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!uDTw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!uDTw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png" 
width="1456" height="1464" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1464,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:528331,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/181501599?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!uDTw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png 424w, https://substackcdn.com/image/fetch/$s_!uDTw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png 848w, https://substackcdn.com/image/fetch/$s_!uDTw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png 1272w, https://substackcdn.com/image/fetch/$s_!uDTw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F19e252bc-438d-4f1b-b25d-704a1fb683c3_1492x1500.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container 
restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2><strong>How We Used to Store and Query Log Data Earlier</strong></h2><p>Before this overhaul, we stored the raw log body as a simple <code>string</code> data type. 
While this was easy to ingest, it created some bottlenecks when developers tried to interact with the data.</p><h3>Slow Run-Time Parsing and the Impossible GROUP BY</h3><p>Storing the log body as a string meant the database had no way to instantly look up values inside the JSON whenever a filter is applied to log data.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_WDw!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_WDw!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png 424w, https://substackcdn.com/image/fetch/$s_!_WDw!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png 848w, https://substackcdn.com/image/fetch/$s_!_WDw!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png 1272w, https://substackcdn.com/image/fetch/$s_!_WDw!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_WDw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png" width="1456" height="670" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:670,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:342137,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/181501599?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!_WDw!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png 424w, https://substackcdn.com/image/fetch/$s_!_WDw!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png 848w, https://substackcdn.com/image/fetch/$s_!_WDw!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png 1272w, https://substackcdn.com/image/fetch/$s_!_WDw!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff5d67228-08b4-417a-9859-36326b4a1fc5_3464x1594.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Applying log filters in SigNoz</figcaption></figure></div><p>Whenever a user entered a filter query like the one above, the entire log body [stored as a string] is dynamically converted to JSON at runtime. This led to slow query performance, especially when the selected time range was long or there was too much data to scan.</p><p>Currently, we address slower query performance with the help of <a href="https://signoz.io/docs/logs-pipelines/introduction/">log pipelines</a>. With log pipelines, users can transform their logs to suit their querying and aggregation needs before they are stored in the database.</p><p>Though log pipelines are helpful, they are not straightforward. 
To achieve the necessary performance, users had to manually implement a <em>log pipeline</em> to extract key-value pairs from the JSON string and store the extracted fields as separate attributes.</p><p>This is not a seamless out-of-the-box experience for users sending JSON logs.</p><h3>The Ambiguity of Dot Notation</h3><p>The final breaking point that spurred our full investigation was the ambiguity created by dot notation. Our query builder could not differentiate between logically different JSON structures when developers used dots for querying:</p><p><strong>Scenario 1: Key with Dot in Name:</strong></p><pre><code><code>{
  "user": {
    "session.id": "abc-1234"
  }
}
</code></code></pre><p><strong>Scenario 2: Nested JSON Structure:</strong></p><pre><code><code>{
  "user": {
    "session": {
      "id": "xyz-5678"
    }
  }
}
</code></code></pre><p>Although both of these logs record the <em>same piece of information</em>, a query written with dot notation cannot tell the two structures apart. The query needed to find the data in the first example will not work on the data from the second example, and vice versa.</p><p>And we needed something that works on both.</p><p>This means that when a user performs a search, they might get incomplete results, not realising that some data is being missed simply because of a formatting difference. We can&#8217;t expect our users to write separate, complex queries to find both formats or even perform a union to get the necessary data.</p><p>This was a really big pain point for us, and the ultimate trigger.</p><h2><strong>Normalising JSON logs in the Collector</strong></h2><p>Our first and most direct approach was to solve the problem before the data reached the database. The idea was to intercept incoming logs in our OpenTelemetry collector and transform the JSON structure <em>in-flight</em>.</p><p>The proposed solution was to have the collector inspect the keys of every incoming JSON object.</p><p>If a key contained a dot, e.g., <code>{"a.b": "c"}</code>, our code would parse the key, create a nested JSON structure, e.g., <code>{"a": {"b": "c"}}</code>, and replace the original flattened key.</p><p>But this involved modifying the actual data the user was sending, and the performance issue was still unresolved. 
Given these drawbacks, we concluded that modifying the data shape within the OpenTelemetry collector was not a viable path forward.</p><p>Around the same time, ClickHouse announced a stable version of the JSON data type.</p><h2>Using ClickHouse&#8217;s native JSON data type</h2><p>With the introduction of a native <a href="https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse">JSON data type</a> in ClickHouse, we identified an opportunity to migrate from our string data type column to JSON. Adopting this also meant offloading all associated operations to ClickHouse, allowing us to leverage ClickHouse&#8217;s highly optimised, built-in functions for JSON traversal and data extraction.</p><p>But there was a limitation. ClickHouse&#8217;s native JSON type is built to handle dynamic paths of JSON keys. To do so, it needs those paths to be predictable. But log data from our customers is hardly predictable. It may contain any number of unique paths.</p><p>Before looking at how we overcame this limitation, let&#8217;s look more closely at ClickHouse&#8217;s JSON data type.</p><h2>Inside the Workings of ClickHouse&#8217;s JSON Type</h2><p>ClickHouse&#8217;s JSON data type allows you to store semi-structured JSON documents in a column while preserving efficient, columnar storage for individual JSON fields. 
Internally, JSON columns flatten nested JSON keys into subcolumns for query efficiency, as demonstrated below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!N0C0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!N0C0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png 424w, https://substackcdn.com/image/fetch/$s_!N0C0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png 848w, https://substackcdn.com/image/fetch/$s_!N0C0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png 1272w, https://substackcdn.com/image/fetch/$s_!N0C0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!N0C0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png" width="986" height="474" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:474,&quot;width&quot;:986,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:96131,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/181501599?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!N0C0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png 424w, https://substackcdn.com/image/fetch/$s_!N0C0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png 848w, https://substackcdn.com/image/fetch/$s_!N0C0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png 1272w, https://substackcdn.com/image/fetch/$s_!N0C0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F64cc448b-9cad-4a52-94b0-649a11826c3b_986x474.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Columnar storage in ClickHouse for JSON</figcaption></figure></div><p>When defining a JSON column, you can provide optional settings like <code>max_dynamic_paths</code> in the column definition, which controls how ClickHouse handles <strong>dynamic paths</strong> [incoming JSON fields whose schema or structure is unknown].</p><p>Understanding this is crucial to the solution we finally designed.</p><h3>Understanding <code>max_dynamic_paths</code></h3><p>The <code>max_dynamic_paths</code> setting limits the number of distinct JSON <em>paths</em> that ClickHouse will treat as separate subcolumns for any given chunk of data. This limit is defined at the table&#8217;s column level, but it is <strong>enforced per data part</strong>; that is, separately for each chunk of stored data [or &#8220;part&#8221;]. 
By default, this value falls back to 1024.</p><p>But for our customer logs, that limit does not hold up.</p><p>Sometimes the incoming data can have really high cardinality [sigh], which could lead to an <a href="https://clickhouse.com/blog/a-new-powerful-json-data-type-for-clickhouse#challenge-3-prevention-of-avalanche-of-column-data-files-on-disk">explosive avalanche</a> of paths exhausting the upper limit. Once the threshold is reached, the values of the next distinct key [and of any other new keys] are stored in a shared data structure instead.</p><p>The next question that naturally arises is, what is a <em>reasonable</em> value for this setting to get optimal performance? Let&#8217;s dive deeper.</p><p></p><h3><strong>What is a reasonable maximum for JSON&#8217;s </strong><code>max_dynamic_paths</code><strong>?</strong></h3><p>The <code>max_dynamic_paths</code> setting controls how many unique JSON paths can be promoted to dedicated subcolumns per data part. The <em>reasonable</em> maximum is far from straightforward; it depends heavily on your data&#8217;s shape and the storage backend.</p><p>In most high-cardinality systems, like observability or event analytics platforms, customer-generated data contains extremely diverse JSON keys. A single dataset might include fields like <code>order.id</code>, <code>order.user_id</code>, or <em>even arbitrary UUIDs</em> (yes, seriously!) nested deep in the JSON structure. In such cases, even if you raise <code>max_dynamic_paths</code> into the thousands, it gets consumed quickly because every unique key or UUID becomes a new path. No number ever feels like <em>enough</em> when users continuously send data with new identifiers baked into the keys.</p><p>But what if we set <code>max_dynamic_paths = 0</code> and created columns for dynamic paths on demand?</p><p></p><h2>Building a Two-Tier JSON Storage Model</h2><p>By setting <code>max_dynamic_paths = 0</code>, we stopped the creation of subcolumns for any JSON path. 
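</p><p>To make the path budget concrete, here is a toy Python model of a single data part. It is only a sketch of the behaviour described above, not ClickHouse code: the function name <code>route_paths</code>, the flat dotted-path rows and the sample keys are all assumptions made for illustration.</p>

```python
def route_paths(rows, max_dynamic_paths):
    """Toy model of one data part: decide where each JSON path is stored.

    rows is a list of flat dicts mapping a dotted path to a value.
    Returns (subcolumns, shared_data).
    """
    subcolumns = {}   # path -> list of values, one dedicated column per path
    shared_data = []  # (path, value) pairs for everything over the budget
    for row in rows:
        for path, value in row.items():
            if path in subcolumns:
                subcolumns[path].append(value)
            elif max_dynamic_paths > len(subcolumns):
                subcolumns[path] = [value]  # budget left: promote the path
            else:
                shared_data.append((path, value))  # budget spent: spill over
    return subcolumns, shared_data


# Hypothetical sample rows; the second one has an identifier baked into a key.
rows = [
    {"order.id": 1, "order.user_id": 7},
    {"order.id": 2, "session.3f9a": "x"},
]
print(route_paths(rows, max_dynamic_paths=2))
print(route_paths(rows, max_dynamic_paths=0))
```

<p>With a budget of 2, the first two paths get subcolumns and the identifier-style key spills into shared data. With a budget of 0, nothing is promoted at all.</p><p>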
This means all ingested JSON data is stored directly in the <em>shared data structure</em>, not as subcolumns.</p><p>This becomes our baseline. Now let&#8217;s talk about performance. Even at this baseline, querying is better than with what existed before [storing logs as a string]. With the introduction of multiple <a href="https://clickhouse.com/blog/json-data-type-gets-even-better#new-serializations-for-shared-data-in-v258">JSON serialisation formats</a> in ClickHouse, we faced yet another critical architectural decision: which format would perform best for our heavy workloads, especially for all the frequent <code>GROUP BY</code> queries? <br>Let&#8217;s examine that in greater detail.</p><h3>#1. Storing Data in Advanced Serialisation Format</h3><p>ClickHouse provides several serialisation formats for storing JSON data, including the Map type, bucketed maps, and the <em>advanced JSON format</em>. You can read more about these <a href="https://clickhouse.com/blog/json-data-type-gets-even-better#new-serializations-for-shared-data-in-v258">formats here</a>.</p><p>We benchmarked both the <code>map</code> and <code>advanced</code> shared data structures and found some big wins for <code>advanced</code>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!tpuJ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!tpuJ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png 424w, 
https://substackcdn.com/image/fetch/$s_!tpuJ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png 848w, https://substackcdn.com/image/fetch/$s_!tpuJ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png 1272w, https://substackcdn.com/image/fetch/$s_!tpuJ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!tpuJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png" width="710" height="341" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:341,&quot;width&quot;:710,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:114699,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/181501599?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!tpuJ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png 424w, 
https://substackcdn.com/image/fetch/$s_!tpuJ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png 848w, https://substackcdn.com/image/fetch/$s_!tpuJ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png 1272w, https://substackcdn.com/image/fetch/$s_!tpuJ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd55e9634-e751-43b1-a0c4-9fd6cd1bfd08_710x341.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line 
x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p>The biggest wins from opting for this format were in aggregation and filtering: operations like <code>GROUP BY</code> or <code>WHERE</code> clauses on specific JSON fields could be executed with high efficiency.</p><p>You can read more about advanced shared data structures <a href="https://clickhouse.com/blog/json-data-type-gets-even-better#advanced-shared-data">here</a>.</p><p>This became <strong>Tier 1</strong> of our architecture.</p><h3>#2. Promoting Frequently Queried Paths</h3><p>While Tier 1 provides an efficient baseline for querying any JSON attribute, it is not optimised for fields that are accessed frequently. The overhead of checking metadata and decoding values becomes significant at scale for certain <em>hot</em> fields.</p><p>To address the performance challenges of querying large JSON objects in log data, we implemented <strong>Tier 2</strong>, designed to minimise query latency by separating frequently accessed fields from the larger, less-queried JSON blob.</p><p>The core of this optimisation is the use of two distinct JSON columns for storing log attributes:</p><ol><li><p><strong>Primary JSON Blob:</strong> A standard JSON column that serves as the default repository for all incoming log attributes, as discussed in #1. This column accommodates the long tail of infrequently accessed fields.</p></li><li><p><strong>Secondary JSON Column:</strong> A second, specialised JSON column [promoted], dedicated to storing key-value pairs that are frequently used in query filters, aggregations, and dashboards. This column keeps <strong>ClickHouse&#8217;s dynamic paths enabled (<code>max_dynamic_paths</code> at its default of 1024)</strong>, whereas we set the limit to zero for the primary blob. 
For example, if a path/key called <code>body.status_code</code> is frequently queried, it is stored in our secondary, or promoted, column.</p></li></ol><p>This provides the expected performance with ClickHouse JSON columns, without compromising consistency in structure. But how does the system determine which fields are commonly queried? Let&#8217;s dissect that.</p><h3>#3. Selecting and Ingesting Promoted Fields</h3><p>Let&#8217;s think of it as a two-part process.</p><p>1/ If a user wants better query performance on a certain key or path, it is added to a separate table named <code>promoted_paths</code>; let&#8217;s call these <em>hot fields</em> for now. Every 10 seconds, the ingestion service refreshes a cached list of these <em>hot</em> fields. If a new field gains prominence in queries, it is added to this cached list.</p><p>2/ During data ingestion, the ingestion service inspects each incoming log. If the log&#8217;s JSON payload contains keys that match the cached list of promoted fields, those key-value pairs are extracted and moved into the secondary/promoted column. 
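</p><p>As a rough sketch of step 2/, here is how that routing could look in Python. The helper names <code>flatten</code> and <code>split_log</code>, the dotted-path representation and the sample log are illustrative assumptions, not the actual SigNoz ingestion code. Each key lands in exactly one of the two outputs.</p>

```python
def flatten(payload, prefix=""):
    """Flatten nested dicts into dotted paths: {"body": {"a": 1}} gives {"body.a": 1}."""
    flat = {}
    for key, value in payload.items():
        path = prefix + "." + key if prefix else key
        if isinstance(value, dict) and value:
            flat.update(flatten(value, path))  # recurse into nested objects
        else:
            flat[path] = value  # leaf value (or an empty object, kept as-is)
    return flat


def split_log(payload, promoted_paths):
    """Route each path of one log to the primary blob or the promoted column."""
    primary, promoted = {}, {}
    for path, value in flatten(payload).items():
        # A path goes to exactly one side, so nothing is stored twice.
        target = promoted if path in promoted_paths else primary
        target[path] = value
    return primary, promoted


# Hypothetical log and cached hot-field list, for illustration only.
log = {"body": {"status_code": 500, "message": "upstream timeout"}, "host": "api-1"}
cached_promoted_paths = {"body.status_code"}

primary, promoted = split_log(log, cached_promoted_paths)
print(primary)
print(promoted)
```

<p>Here <code>body.status_code</code> ends up only in the promoted output, while <code>body.message</code> and <code>host</code> stay in the primary one.</p><p>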
To prevent data duplication and reduce storage overhead, these keys are simultaneously removed from the primary JSON blob.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Evlq!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Evlq!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png 424w, https://substackcdn.com/image/fetch/$s_!Evlq!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png 848w, https://substackcdn.com/image/fetch/$s_!Evlq!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png 1272w, https://substackcdn.com/image/fetch/$s_!Evlq!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Evlq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png" width="1456" height="653" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:653,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:361105,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/181501599?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Evlq!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png 424w, https://substackcdn.com/image/fetch/$s_!Evlq!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png 848w, https://substackcdn.com/image/fetch/$s_!Evlq!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png 1272w, https://substackcdn.com/image/fetch/$s_!Evlq!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd3d9c35c-27ca-4dcd-b2c0-0df0ee2c0904_3015x1352.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Here&#8217;s the entire process of splitting promoted columns and primary columns during ingestion</figcaption></figure></div><p></p><h2>Comparing Results ~ 30% Faster, 100% Lighter</h2><p>We compared the performance of the two-tier JSON model with the older String column on filtering and <code>GROUP BY</code> queries.</p><p>On testing the query performance with a 9 TB dataset, we found that the JSON data type is 30% faster in execution time and scans around 99% less data, with slightly higher memory usage.</p><p>Here are the stats for the comparison we did with different combinations of filters on both storage models.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!U1DF!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!U1DF!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png 424w, https://substackcdn.com/image/fetch/$s_!U1DF!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png 848w, https://substackcdn.com/image/fetch/$s_!U1DF!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png 1272w, https://substackcdn.com/image/fetch/$s_!U1DF!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!U1DF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png" width="1456" height="485" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:485,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:256271,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/181501599?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!U1DF!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png 424w, https://substackcdn.com/image/fetch/$s_!U1DF!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png 848w, https://substackcdn.com/image/fetch/$s_!U1DF!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png 1272w, https://substackcdn.com/image/fetch/$s_!U1DF!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9334a6e3-2605-449f-ab11-58d3eb7a2c8e_1464x488.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>Conclusion</h2><p>This optimisation lets customers query seamlessly without worrying about the shape and form of their data. With the two-tier model, the challenges that plagued our old system were systematically eliminated. <em>Inconsistent JSON structures</em> are now gracefully handled, with hot fields promoted and the rest stored efficiently. 
The <em>slow string searches</em> that once took minutes are now sub-second queries on structured data.</p><p> If you want to try our new logging experience, you can reach out to<strong> <a href="mailto:cloud-support@signoz.io">cloud-support@signoz.io</a>.</strong></p><p>If you loved this engineering deep-dive, here are some similar ones:</p><ul><li><p><strong><a href="https://newsletter.signoz.io/p/100-github-releases-yet-its-day-one">100 Github Releases. Yet it&#8217;s day one.</a></strong></p></li><li><p><strong><a href="https://newsletter.signoz.io/p/how-we-made-our-queries-995-faster">How we made our Queries 99.5% faster</a></strong></p></li><li><p><strong><a href="https://newsletter.signoz.io/p/enabling-a-million-spans-in-trace-details-page">Engineering a Trace Details Page That Handles a Million Spans</a></strong></p></li></ul><p></p><blockquote><p><em>Feel free to check out our <strong><a href="https://signoz.io/resource-center/blog/">blogs</a> </strong>and <strong><a href="https://signoz.io/docs/introduction/">docs</a></strong> here. Our <strong><a href="https://github.com/SigNoz/signoz">GitHub</a> </strong>is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. You&#8217;re also welcome to join the conversation in our growing <strong><a href="https://arc.net/l/quote/rmuathlr">Slack</a></strong> community for the latest news!</em></p></blockquote><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk. 
Stay tuned for more deep technical content!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div><p><br></p>]]></content:encoded></item><item><title><![CDATA[Patterns for Deploying OTel Collector at Scale]]></title><description><![CDATA[As applications grow, the question quickly shifts from what OTel can do to how we can deploy it effectively at scale. In this post, we&#8217;ll explore some deployment patterns for the OTel Collector!]]></description><link>https://newsletter.signoz.io/p/patterns-for-deploying-otel-collector</link><guid isPermaLink="false">https://newsletter.signoz.io/p/patterns-for-deploying-otel-collector</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Sun, 30 Nov 2025 11:30:57 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!8nS4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128140; <em>Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is an honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! We at SigNoz are a bunch of observability fanatics obsessed with OpenTelemetry and open-source, and we reckon it&#8217;s important to share what we know. If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. 
We&#8217;ll make it worth your while.</em></p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p></p><p><em>On another note, feel free to check out our blogs and docs here. Our GitHub is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. You&#8217;re also welcome to join the conversation in our growing Slack community for the latest news!</em></p><p><em>Cheers.</em></p><div><hr></div><p>So, you&#8217;ve embraced OpenTelemetry, and it&#8217;s been great.</p><p><em>Pat, Pat.</em></p><p>That single, vendor-neutral pipeline for your traces, metrics, and logs felt like the future. But now, the <em>future is getting bigger</em>. That simple OTel Collector configuration that worked perfectly for a few services is starting to show its limits as you scale. The data volume is climbing, reliability is becoming a concern, and you&#8217;re wondering if that single collector instance is now a bottleneck waiting to happen.</p><p><em>You&#8217;re not alone</em>. As applications grow, the question quickly shifts from <em>what</em> OTel can do to <em>how</em> we can deploy it effectively at scale. In this post, we&#8217;ll explore some deployment patterns for the OpenTelemetry Collector, moving from a simple agent to a robust, multi-layered architecture. 
Let&#8217;s look at the three main deployment patterns for OTel collectors and break down how each trades off complexity, scalability, and isolation. Choosing the right one depends on your architecture and goals.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8nS4!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8nS4!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png 424w, https://substackcdn.com/image/fetch/$s_!8nS4!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png 848w, https://substackcdn.com/image/fetch/$s_!8nS4!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png 1272w, https://substackcdn.com/image/fetch/$s_!8nS4!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!8nS4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png" width="1240" height="829" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:829,&quot;width&quot;:1240,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:73917,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/180160231?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8nS4!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png 424w, https://substackcdn.com/image/fetch/$s_!8nS4!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png 848w, https://substackcdn.com/image/fetch/$s_!8nS4!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png 1272w, https://substackcdn.com/image/fetch/$s_!8nS4!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff9988c33-294b-4661-845c-ae27c1e7b0b4_1240x829.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><h2>#1. 
Load-Balanced/ Gateway Pattern</h2><p>Instead of relying on a single, large OTel Collector, which you can also think of as a single point of failure &#128516;, this pattern uses <em>a fleet of identical, stateless collectors sitting behind a load balancer.</em> The idea is to distribute the incoming telemetry data across this fleet, so if any single collector instance fails, the others can seamlessly take over its workload.</p><h3>Architecture with the Load Balancer</h3><p>The data flows through a few distinct layers, as shown in the figure below.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!8gAf!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!8gAf!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png 424w, https://substackcdn.com/image/fetch/$s_!8gAf!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png 848w, https://substackcdn.com/image/fetch/$s_!8gAf!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png 1272w, https://substackcdn.com/image/fetch/$s_!8gAf!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!8gAf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png" width="1456" height="788" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:788,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:209836,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/180160231?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!8gAf!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png 424w, https://substackcdn.com/image/fetch/$s_!8gAf!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png 848w, https://substackcdn.com/image/fetch/$s_!8gAf!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png 1272w, https://substackcdn.com/image/fetch/$s_!8gAf!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fef3d0ed3-8194-4bbd-9962-f3183afffb5c_2222x1202.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Architecture with Load Balancer</figcaption></figure></div><p><strong>Layer 1: Agents</strong></p><p>You still have OTel Collectors running as agents. These can run on individual hosts, as sidecars to your applications, or on every Kubernetes node via a DaemonSet. The agent&#8217;s job is simply to collect data locally, batch it, and forward it to a single endpoint: the load balancer.</p><p><strong>Layer 2: The Load Balancer</strong></p><p>This is the central entry point for all telemetry data from your agents. 
It can be a cloud load balancer [like an AWS ELB/NLB or a GCP Load Balancer], or a self-hosted one like Nginx or HAProxy.</p><p>Its only job is to receive the data and distribute it across the fleet of gateway collectors using a strategy such as round-robin or hash-based distribution.</p><p><strong>Layer 3: The Gateway Collector Fleet</strong></p><p>This is a group of two or more identical OTel Collector instances. They are the workhorses. Each collector in the fleet receives a fraction of the total data from the load balancer. They perform the heavy processing: advanced filtering, batching, retries, and exporting the data to one or more backends [e.g., SigNoz or Jaeger].</p><p></p><h3>Trade-offs &amp; Considerations</h3><p><strong>High Availability [HA]:</strong> If Collector 2 fails, the load balancer&#8217;s health checks detect this and its traffic is automatically redirected to Collector 1 and Collector 3. The pipeline remains up.</p><p><strong>Horizontal Scalability:</strong> If your data volume doubles, you don&#8217;t need to make your collectors twice as powerful [vertical scaling]. You can simply add more collectors to the fleet [horizontal scaling].</p><p><strong>Zero-Downtime Maintenance:</strong> You can perform rolling updates. Take one collector out of the load balancer&#8217;s pool, update it, and add it back. Repeat for the others without ever interrupting data flow.</p><p><strong>Complexity:</strong> This architecture introduces a new component, the load balancer, which must also be configured, managed, and monitored.</p><p><strong>Stateful Processors:</strong> This pattern is ideal for stateless processing. 
If you use OTel components that rely on seeing all data for a given entity [e.g., the tail sampling processor, which needs every span of a trace, or the spanmetrics connector, which should see all spans of a service], simply spraying data randomly can lead to incorrect results.</p><p>In such cases, you may need to configure your load balancer for &#8220;stickiness&#8221; or use a more advanced collector routing mechanism, such as the Collector&#8217;s load-balancing exporter, which can route by trace ID or service name, to ensure related data reaches the same instance.</p><p></p><h2>#2. Multi-cluster/ Central Control-Plane Pattern</h2><p>Using a simple deployment strategy across many Kubernetes clusters causes growing problems. It becomes hard to maintain consistent configurations and control your data with global rules.</p><p>Managing each cluster separately also creates security risks by storing credentials across multiple systems. At the same time, costs increase as each cluster sends data over expensive networks. The multi-cluster pattern fixes this by creating a central pipeline, making your data management secure, cost-effective, and easier to control.</p><h3><strong>The Multi-Stage Architecture</strong></h3><p>This pattern typically involves at least two layers of collectors, creating a <em>collect and forward</em> chain.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_uqj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_uqj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png 424w, 
https://substackcdn.com/image/fetch/$s_!_uqj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png 848w, https://substackcdn.com/image/fetch/$s_!_uqj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png 1272w, https://substackcdn.com/image/fetch/$s_!_uqj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_uqj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png" width="1456" height="938" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:938,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:250642,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/180160231?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" 
srcset="https://substackcdn.com/image/fetch/$s_!_uqj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png 424w, https://substackcdn.com/image/fetch/$s_!_uqj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png 848w, https://substackcdn.com/image/fetch/$s_!_uqj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png 1272w, https://substackcdn.com/image/fetch/$s_!_uqj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F9dbac41a-1423-4519-8ed2-8a08b5ec0a5d_1789x1153.png 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">The multi-stage architecture</figcaption></figure></div><p></p><p><strong>Layer 1: In-Cluster Collection [The Agent Layer]</strong></p><p>Inside <em>each</em> of your Kubernetes clusters, you run a local OTel deployment. This usually consists of a DaemonSet of collectors acting as <strong>agents</strong> [one per node] that collect local data. These agents then forward their data to a small in-cluster gateway [a Deployment within the same cluster].</p><p>The primary role of this layer is to collect all data within its own cluster, add cluster-specific metadata [e.g., <code>cluster.name: prod-us-east-1</code>], and forward it to the next stage.</p><p><strong>Layer 2: Regional Aggregation [The Gateway Layer]</strong></p><p>This layer is a central, highly available fleet of OTel Collectors that serves an entire region or logical environment [e.g., all US-East production clusters]. This regional gateway receives data from the in-cluster gateways of all the clusters it manages.</p><p>This is also where you can centralise your logic. The regional gateway handles:</p><ul><li><p>Authenticating with the final observability backends.</p></li><li><p>Enforcing global sampling rules.</p></li><li><p>Enriching data with region-level metadata.</p></li><li><p>Routing data to different backends based on type or team.</p><p></p></li></ul><h3>Trade-offs &amp; Considerations</h3><p><strong>Enhanced Security:</strong> Only the regional gateways need the secrets to connect to your final backends. 
The collectors inside your many clusters do not, which significantly reduces your security footprint.</p><p><strong>Centralised Management:</strong> You can manage your primary configuration [export destinations, sampling, etc.] in one place [the regional gateway] rather than in dozens. This makes updates and policy changes simple and consistent.</p><p><strong>Sizing:</strong> Each layer of the pipeline must be sized and scaled appropriately to handle the data volume from the layer below it.</p><p><strong>Network Paths:</strong> Ensure reliable, secure network connectivity between your clusters and the regional gateway.</p><p></p><h2>#3. Per-Signal Pattern</h2><p>This pattern creates separate, parallel pipelines for each telemetry signal type. Instead of a single, unified OTel Collector fleet that processes all signals together, you deploy specialised fleets: one for traces, one for metrics, and one for logs.</p><h3>Architecture with Agents &amp; Routing</h3><p>The OTel agents are configured to collect all signals as usual. At the first possible stage [either in the agent itself or in a simple first-layer gateway], the data is split. 
Since Collector pipelines are already declared per signal, pointing each pipeline at a different exporter is usually enough; the routing connector [the successor to the routing processor] helps when the split also depends on attributes.</p><ul><li><p>All traces are routed to the <em>Trace Gateway</em> fleet.</p></li><li><p>All metrics are routed to the <em>Metrics Gateway</em> fleet.</p></li><li><p>All logs are routed to the <em>Logs Gateway</em> fleet.</p></li></ul><p>Each gateway fleet is configured and optimised only for its specific signal type, with its own set of processors, and exports to its corresponding observability backend, such as SigNoz.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!o6SG!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!o6SG!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png 424w, https://substackcdn.com/image/fetch/$s_!o6SG!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png 848w, https://substackcdn.com/image/fetch/$s_!o6SG!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png 1272w, https://substackcdn.com/image/fetch/$s_!o6SG!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!o6SG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png" width="1456" height="825" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ebbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:825,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:1100975,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/180160231?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!o6SG!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png 424w, https://substackcdn.com/image/fetch/$s_!o6SG!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png 848w, https://substackcdn.com/image/fetch/$s_!o6SG!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png 1272w, https://substackcdn.com/image/fetch/$s_!o6SG!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Febbbcb9e-2697-49b5-84f3-ef95518f2907_1958x1109.png 1456w" 
sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption">Per-Signal Architecture</figcaption></figure></div><h3>Trade-offs &amp; Considerations</h3><p><strong>Independent Scalability:</strong> You can scale your logging fleet to handle huge volumes without over-provisioning your tracing or metrics pipelines.</p><p><strong>Resource Optimisation:</strong> You can use CPU-optimised instances for your log collectors and memory-optimised instances for trace collectors, depending on the load each fleet carries.</p><p><strong>Higher Operational Overhead:</strong> You are now managing three or more separate collector fleets, each with its own 
configuration, deployment pipeline, and monitoring. Might <em>get tiring</em>!</p><p><strong>Signal Correlation:</strong> It becomes more difficult to correlate signals at the collector level [e.g., using the spanmetrics connector to generate metrics from traces], as the data is already on separate paths.</p><p></p><h2>How To Choose the Right Deployment?</h2><p>The short answer is that there is <em>no hard-and-fast rule</em> for what is <em>right</em>. But we have put together a small guide that can help you understand some potential options you can explore.</p><p>If you have many clusters or regions that need unified telemetry, use the <em>multi-cluster</em> [control-plane] pattern. Designate one cluster as the central collector host, and configure each cluster&#8217;s agent/sidecar to export to it. This way, you get consistent processing [e.g., cross-cluster tail sampling] at the cost of cross-cluster network traffic.</p><p><em>OR</em></p><p>If different teams or customers must be isolated for privacy or regulatory compliance reasons [which are now getting stricter!], use a <em>multi-tenant</em> pipeline strategy. For example, tag data by team and have the collector route it to separate backends or processing paths. This limits the blast radius; one team/tenant&#8217;s misconfiguration won&#8217;t contaminate another&#8217;s data.</p><p><em>OR</em></p><p>When you need maximum ingestion throughput and uptime, deploy a load-balanced collector fleet. Put collectors behind a robust <a href="https://www.haproxy.com/glossary/what-is-layer-7-load-balancing">L7 load balancer</a> so you can autoscale instances on demand. 
This handles bursts by spreading the load so that no single Collector becomes a bottleneck.</p><p><em>OR</em></p><p>If your metric/trace/log volumes differ greatly, consider splitting pipelines by signal. As we mentioned above, run one collector fleet for metrics [allowing many scraper replicas] and another for traces [optimised for tail sampling]. This lets you scale each pipeline to its workload without interference.</p><p><em>OR</em></p><p>For small deployments or strict budget constraints, start with a single Collector or node-level agents/sidecars to minimise infrastructure costs. As load grows or performance demands rise, move to more complex patterns: for example, add a gateway layer or switch to a load-balanced, multi-instance setup. Conversely, if ultra-low latency and resilience are paramount, an agent-and-gateway hybrid [per-node agents forwarding to central gateways] offers local buffering and global control.</p><p></p><h2>Words of Wisdom from the Field</h2><p>Here are some snippets from a chat with Sreekanth Chekuri, who is a Senior Software Engineer at SigNoz and also a contributor to OpenTelemetry. 
We hope some of these pointers will help guide you when designing the architecture for deploying your OTel collectors!</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZE6e!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZE6e!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png 424w, https://substackcdn.com/image/fetch/$s_!ZE6e!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png 848w, https://substackcdn.com/image/fetch/$s_!ZE6e!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png 1272w, https://substackcdn.com/image/fetch/$s_!ZE6e!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZE6e!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png" width="1456" height="819" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:2037520,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/180160231?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZE6e!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png 424w, https://substackcdn.com/image/fetch/$s_!ZE6e!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png 848w, https://substackcdn.com/image/fetch/$s_!ZE6e!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png 1272w, https://substackcdn.com/image/fetch/$s_!ZE6e!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb7ee3d8d-4627-4d1d-bc4c-332d2294e735_1904x1071.png 1456w" sizes="100vw" loading="lazy"></picture></div></a></figure></div><p></p><h3>Maximising Performance and Throughput</h3><ul><li><p><strong>Embrace Batching for Efficiency:</strong> Sreekanth points out that a simple change like batching your data [e.g., up to 25k entries per batch] significantly improves throughput. This works because fewer, larger export requests mean fewer system calls and less per-request overhead, letting your pipeline work smarter.</p></li><li><p><strong>The Power of Resource Scrutiny:</strong> Remember that <em>resource requirements aren&#8217;t static</em>. If your collector is doing heavy data transformation like parsing complex logs or extracting attributes, it will naturally need more CPU and memory. 
Always size your Collector based on the processing load, not just the ingestion rate.</p></li></ul><p></p><h3>Strategic Collector Deployment</h3><ul><li><p><strong>Split by Signal for Precision:</strong> For optimal resource allocation, Sreekanth recommends running <em>separate collectors for different signals</em>. This allows you to allocate memory and CPU precisely where needed, so one heavy signal doesn&#8217;t starve the others.</p></li><li><p><strong>Handle Traces with Care:</strong> Be mindful that <strong>tail-based sampling for traces</strong> is memory-intensive, because whole traces are buffered until a sampling decision is made, and requires specialised handling. If you mix this heavy operation with standard log or metric processing, it can impact the reliability of your entire system. Splitting these signals isolates that risk.</p></li></ul><p></p><h3>Cautions</h3><ul><li><p><strong>Know Your Tools:</strong> While alternative data pipelines exist, he cautions against simply swapping out the OpenTelemetry Collector for tools like Vector. You risk losing the many powerful, built-in capabilities and standardised features that the OTel ecosystem provides.</p></li><li><p><strong>Watch Out for CPU Hogs:</strong> Some OTel processors, such as the transform processor, can be highly CPU-intensive. Use them judiciously, as they can significantly impact performance and scalability if overused in a high-throughput environment.</p></li></ul><p></p><div><hr></div><p><em>Thanks to <strong><a href="https://www.linkedin.com/in/jpkroehling/">Juraci</a></strong> for suggesting some edits to the initial version of this blog! 
Also, for his <strong><a href="https://github.com/jpkrohling/opentelemetry-collector-deployment-patterns">GitHub repository</a></strong>, which acted as a guidepost while I was learning about various patterns.</em></p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk. Stay tuned for more cool technical content!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[What is eBPF & What Does it Mean for Observability?]]></title><description><![CDATA[Decoding the buzz behind eBPF!]]></description><link>https://newsletter.signoz.io/p/what-is-ebpf-and-what-does-it-mean</link><guid isPermaLink="false">https://newsletter.signoz.io/p/what-is-ebpf-and-what-does-it-mean</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Sat, 22 Nov 2025 13:34:06 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!mj95!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p></p><div><hr></div><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! 
We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. We&#8217;ll make it worth your while.</em></p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p></p><p><em>On another note, feel free to check out our <strong><a href="https://signoz.io/resource-center/blog/">blogs</a> </strong>and <strong><a href="https://signoz.io/docs/introduction/">docs</a></strong> here. Our <strong><a href="https://github.com/SigNoz/signoz">GitHub</a> </strong>is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. You&#8217;re also welcome to join the conversation in our growing <strong><a href="https://arc.net/l/quote/rmuathlr">Slack</a></strong> community for the latest news!</em></p><p><em>Cheers.</em></p><div><hr></div><p>eBPF is kind of like <em>matcha -</em> it has been around for a long time, yet it&#8217;s only within the past couple of years that it emerged as one of the latest trends and buzzwords in the industry.</p><p>I can&#8217;t explain how <em>matcha</em> became the world&#8217;s most popular drink (maybe another time &#128521;), but I will take today&#8217;s blog as an opportunity to tell you how eBPF has become a big deal for <em>revolutionising observability at the kernel level</em>, among many other dope stuff. 
Let&#8217;s look at the history of eBPF, how it works, what problems it solves, and why you &#8211; yes, <em>you!</em> &#8211; should start taking advantage of it today.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mj95!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mj95!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png 424w, https://substackcdn.com/image/fetch/$s_!mj95!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png 848w, https://substackcdn.com/image/fetch/$s_!mj95!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png 1272w, https://substackcdn.com/image/fetch/$s_!mj95!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mj95!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png" width="708" height="499.5043536503684" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:3160,&quot;width&quot;:4479,&quot;resizeWidth&quot;:708,&quot;bytes&quot;:982065,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/179643045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!mj95!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png 424w, https://substackcdn.com/image/fetch/$s_!mj95!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png 848w, https://substackcdn.com/image/fetch/$s_!mj95!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png 1272w, https://substackcdn.com/image/fetch/$s_!mj95!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fffb9eb5d-24cd-4758-8a94-e238e78af881_4479x3160.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2>What is eBPF?</h2><p>eBPF - or the <em>extended</em> Berkeley Packet Filter, as it was formerly known - is the name of a kernel execution engine that runs a variety of new programs in a performant and safe sandbox in the kernel.</p><p>If the above definition flew right over your head, let me simplify it. It&#8217;s almost like putting JavaScript into the Linux kernel. Just as JavaScript runs programs safely inside a browser sandbox, eBPF runs them safely inside the kernel.</p><p>With eBPF, developers can write custom programs [typically in a restricted C syntax], load them into kernel space at runtime, and execute them there without modifying kernel source code or loading kernel modules.</p><p>Originally derived from the classic BPF used for packet filtering, eBPF greatly extends its scope beyond networking to any part of the system. 
Since eBPF has evolved <em>way</em> beyond packet filtering, it&#8217;s almost an understatement to refer to it as &#8220;extended&#8221;, and the acronym is not in active use anymore.</p><p>If you are interested in the evolution of eBPF, ideas and thoughts in the early days, take a look at the documentary below. This is also a great example of all the work that went behind the scenes to get code merged in a large codebase like Linux.</p><div id="youtube2-Wb_vD3XZYOA" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Wb_vD3XZYOA&quot;,&quot;startTime&quot;:&quot;3s&quot;,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Wb_vD3XZYOA?start=3s&amp;rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><p></p><h2>How does eBPF work?</h2><p>By now, we have established that eBPF is a crazy technology. What happens BTS of how eBPF programs function is even more mind-blowing. Let me take a moment to explain it in-depth.</p><h3>Step 1: Write an eBPF Program</h3><p>Everything starts with writing the logic you want the kernel to execute. This is typically done in a restricted, C-like language. It&#8217;s not full C. For example, you can&#8217;t have unbounded loops or call just any function you want. The goal is to create a small, efficient piece of code that is guaranteed to run quickly and safely. Instead of calling standard libraries, eBPF programs use a special set of <em>helper functions</em> provided by the kernel to interact with the system, such as getting the current process ID or looking at network packet data.</p><h3>Step 2: Compilation to Bytecode</h3><p>Once the C code is written, it&#8217;s compiled into eBPF bytecode using a toolchain like <strong>Clang/LLVM</strong>. 
This bytecode is a universal, platform-independent instruction set that the Linux kernel can understand. This is similar to how Java code is compiled into bytecode to run on the Java Virtual Machine (JVM). In this case, the <em>virtual machine</em> is a secure one that lives inside the Linux kernel itself. The output is typically an ELF file containing the bytecode and definitions for any maps the program will use.</p><h3><strong>Step 3: Load the Program and Create Maps</strong></h3><p>This step is handled by a <strong>user-space application</strong>. This is a normal program you write in a language like Go, Rust, or Python that acts as the controller for your eBPF code. This application performs two key tasks:</p><ul><li><p>It reads the eBPF bytecode from the file created in Step 2.</p></li><li><p>It uses a special system call (bpf()) to load that bytecode into the kernel.</p></li></ul><p>At this stage, the user-space application also creates any <strong>eBPF maps</strong> the program needs. These maps are the crucial bridge for communication. They are key-value data structures that can be accessed by both the eBPF program in the kernel and the user-space application.</p><h3>Step 4: <strong>Verification and JIT Compilation</strong></h3><p>This is the most critical step for ensuring safety and performance. As soon as the kernel receives the eBPF bytecode, it passes it to the <strong>Verifier</strong>. The verifier performs a static analysis of the code to prove that <em>it is safe to run</em>. It checks for infinite loops, out-of-bounds memory access, and illegal instructions. If the program fails verification, it is immediately rejected.</p><p>If the program passes verification, the kernel then uses a <em>Just-In-Time (JIT) compiler</em> to translate the eBPF bytecode into native machine code for the host CPU. 
This means the code doesn&#8217;t have to be interpreted, allowing it to run at nearly the same speed as natively compiled kernel code.</p><h3><strong>Step 5: Attach and Execute</strong></h3><p>After being loaded and verified, the eBPF program is in the kernel but is not yet active. The user-space application must explicitly attach it to a specific event hook. This could be:</p><ul><li><p>A network interface, to inspect incoming/outgoing packets [XDP or TC hooks].</p></li><li><p>A system call entry/exit point [a tracepoint].</p></li><li><p>The entry or exit of a function in the kernel or a user-space application [kprobe or uprobe].</p></li></ul><p>Once attached, the kernel will automatically trigger the eBPF program every time that event occurs [Yes, eBPF is event-driven!]. The program runs, performs its task [like updating a counter in an eBPF map], and exits all within the kernel context, making it incredibly fast. Meanwhile, the user-space application can periodically read from the eBPF map to collect the data and present it to the user.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!TuJ1!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!TuJ1!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png 424w, https://substackcdn.com/image/fetch/$s_!TuJ1!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png 848w, 
https://substackcdn.com/image/fetch/$s_!TuJ1!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png 1272w, https://substackcdn.com/image/fetch/$s_!TuJ1!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!TuJ1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png" width="1299" height="1064" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1064,&quot;width&quot;:1299,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:263283,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/179643045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!TuJ1!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png 424w, 
https://substackcdn.com/image/fetch/$s_!TuJ1!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png 848w, https://substackcdn.com/image/fetch/$s_!TuJ1!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png 1272w, https://substackcdn.com/image/fetch/$s_!TuJ1!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd57d54d9-196d-4f8c-a46c-2249d5a6da27_1299x1064.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Behind the scenes of how eBPF programs run</figcaption></figure></div><h2>eBPF for Observability</h2><p>Let&#8217;s understand how eBPF can be used for observability by checking how it provides visibility into each of the three pillars: metrics, logs, and traces.</p><h3>Metrics</h3><p>eBPF can be used to collect highly granular performance metrics that are hard or impossible to see from user space. For example, you can attach an eBPF program to kernel functions to precisely measure TCP retransmits, disk I/O latency, or time spent scheduling processes.</p><h3>Logs</h3><p>While not a replacement for traditional application logs, eBPF can generate highly contextual <em>event logs</em>. For example, you can create a log every time a process opens a sensitive file, writes to a specific socket, or executes a new program, complete with the process ID and user ID. This provides a powerful audit trail for security and debugging.</p><h3>Traces</h3><p>eBPF can automatically trace requests between services without any code changes. By observing the send() and recv() system calls made by applications, eBPF-powered tools can stitch together a distributed trace, even across different programming languages. It can even trace encrypted traffic [like HTTPS] by hooking into the application&#8217;s TLS library calls, where the data can be read <em>before</em> it is encrypted.</p><h2>Tracing File Opens with eBPF</h2><p>Let&#8217;s put the above theory into practice. Here&#8217;s a small example of how we can log every file open on a system with eBPF. We are controlling the eBPF program via Python. 
Since we are using the BCC [BPF Compiler Collection] framework, a popular toolkit with Python bindings for writing and loading eBPF programs, make sure it is installed first.</p><p>Here&#8217;s the script. Its eBPF program traces the <code>openat()</code> syscall and logs the process ID, process name, and file path each time a file is opened.</p><pre><code><code>from bcc import BPF

# eBPF program that hooks into the openat syscall
bpf_code = """
#include &lt;uapi/linux/ptrace.h&gt;
#include &lt;linux/sched.h&gt;

// Shape of each event sent from the kernel to user space
struct data_t {
    u32 pid;
    char comm[TASK_COMM_LEN];
    char fname[256];
};

BPF_PERF_OUTPUT(events);

// The syscall__ prefix tells BCC to unpack the syscall arguments for us
int syscall__openat(struct pt_regs *ctx, int dfd, const char __user *filename, int flags) {
    struct data_t data = {};

    // Capture process ID and name
    data.pid = bpf_get_current_pid_tgid() &gt;&gt; 32;
    bpf_get_current_comm(&amp;data.comm, sizeof(data.comm));

    // Copy the NUL-terminated file name from user memory
    bpf_probe_read_user_str(&amp;data.fname, sizeof(data.fname), filename);

    // Send the data to user-space
    events.perf_submit(ctx, &amp;data, sizeof(data));
    return 0;
}
"""

# Load the eBPF program
b = BPF(text=bpf_code)

# Attach eBPF program to the openat syscall; get_syscall_fnname resolves
# the kernel's symbol for it (for example __x64_sys_openat on x86-64)
b.attach_kprobe(event=b.get_syscall_fnname("openat"), fn_name="syscall__openat")

# Function to print the output
def print_event(cpu, data, size):
    event = b["events"].event(data)
    print(f"PID: {event.pid}, Process: {event.comm.decode('utf-8', 'replace')}, File: {event.fname.decode('utf-8', 'replace')}")

# Open a perf buffer to receive events from kernel space
b["events"].open_perf_buffer(print_event)

# Continuously listen for events and print them until Ctrl-C
try:
    while True:
        b.perf_buffer_poll()
except KeyboardInterrupt:
    pass
</code></code></pre><p>Execute the script with root privileges, since loading eBPF programs into the kernel requires them.</p><pre><code><code>sudo python3 &lt;name_of_file&gt;
</code></code></pre><p>Let&#8217;s break down the code into its two main parts.</p><p></p><h3>The eBPF Program [The C Code]</h3><p>This is the logic that runs securely inside the kernel.</p><ul><li><p><code>struct data_t</code>: We first define a C struct. This is the <em>shape</em> of the data we want to send from the kernel to our Python program. In our example, it holds the process ID, the command name, and the filename.</p></li><li><p><code>BPF_PERF_OUTPUT(events)</code>: This is a BCC macro that creates a high-performance communication channel called events. It allows us to efficiently send data from the kernel to user space without slowing the system down.</p></li><li><p>The probe function [the one we later pass as <code>fn_name</code>]: This is our main eBPF function. It gets the current process ID [pid] and command name [comm] using eBPF helper functions [bpf_get_current_pid_tgid() and bpf_get_current_comm()].</p></li><li><p>The most important part is the call that reads the filename, a helper from the <code>bpf_probe_read_user</code> family. The filename exists in the memory of the application making the system call, not in the kernel. This helper safely copies the filename string from the application&#8217;s memory into our <code>data.fname</code> field.</p></li><li><p>Finally, <code>events.perf_submit()</code> pushes our completed data structure into the events perf buffer, making it available to our Python script.</p></li></ul><h3>The User-Space Controller [The Python Code]</h3><p>This Python script loads and manages the eBPF program.</p><ul><li><p><code>b = BPF(text=bpf_code)</code>: This line is where the BCC magic happens. It takes our C code as a string, compiles it into eBPF bytecode, and loads it into the kernel. The kernel&#8217;s Verifier checks the bytecode to ensure it&#8217;s safe before allowing it to be loaded.</p></li><li><p><code>b.attach_kprobe(...)</code>: This is the crucial step where we <em>attach</em> our C probe function to a kernel event. 
We use a kprobe [kernel probe] to hook into the kernel function that handles the openat system call. Now, every time any process on the system calls openat, our eBPF code will run first.</p></li><li><p><code>b["events"].open_perf_buffer(print_event)</code>: This tells our script to start listening to the events channel we created in the C code. For every piece of data that comes through, it will call our Python function print_event.</p></li><li><p><code>while True: b.perf_buffer_poll()</code>: This is the main event loop. The script sits here, efficiently waiting for data to arrive from the kernel. When data is available, it triggers the print_event callback to print the formatted output to your screen.</p></li></ul><p>Once you run the script with root privileges, you will see output like this:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!DEaP!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!DEaP!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png 424w, https://substackcdn.com/image/fetch/$s_!DEaP!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png 848w, https://substackcdn.com/image/fetch/$s_!DEaP!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png 1272w, 
https://substackcdn.com/image/fetch/$s_!DEaP!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!DEaP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png" width="1422" height="174" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:174,&quot;width&quot;:1422,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:46255,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/179643045?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!DEaP!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png 424w, https://substackcdn.com/image/fetch/$s_!DEaP!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png 848w, 
https://substackcdn.com/image/fetch/$s_!DEaP!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png 1272w, https://substackcdn.com/image/fetch/$s_!DEaP!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd6378ead-23ab-45c9-bd5c-529da0d0a152_1422x174.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p>Although this is a very basic example, it gives a good insight into how eBPF programs work from code to monitoring calls. eBPF is no longer a niche technology, but something that is being widely adopted by orgs at various levels, revolutionising the tech industry &#8212; one <em>matcha</em> at a time. &#127861;</p><p></p><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk. Subscribe to read more awesome tech stuff!</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[100 GitHub Releases. Yet it's day one 😊]]></title><description><![CDATA[Here&#8217;s to the next 100. 
We&#8217;re just getting started!]]></description><link>https://newsletter.signoz.io/p/100-github-releases-yet-its-day-one</link><guid isPermaLink="false">https://newsletter.signoz.io/p/100-github-releases-yet-its-day-one</guid><dc:creator><![CDATA[Anushka Karmakar]]></dc:creator><pubDate>Sun, 16 Nov 2025 11:31:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!7J0t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<blockquote><p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz!</em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! We at <strong><a href="https://signoz.io/">SigNoz</a></strong> are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. We&#8217;ll make it worth your while.</em></p><p></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p></p><p><em>This piece is written by <a href="https://www.linkedin.com/in/anushkakarmakar/">Anushka</a>, PMM at SigNoz, on account of completion of 100 releases at SigNoz. </em></p><p><em>Also, feel free to check out our <strong><a href="https://signoz.io/resource-center/blog/">blogs</a> </strong>and <strong><a href="https://signoz.io/docs/introduction/">docs</a></strong> here. 
Our <strong><a href="https://github.com/SigNoz/signoz">GitHub</a> </strong>is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. You&#8217;re also welcome to join the conversation in our growing <strong><a href="https://arc.net/l/quote/rmuathlr">Slack</a></strong> community for the latest news!</em></p><p><em>Cheers.</em></p></blockquote><p></p><p>We just shipped our <strong><a href="https://github.com/SigNoz/signoz/releases/tag/v0.100.0">100th GitHub release</a></strong>.</p><p>You would think a milestone like this would feel like an arrival, a moment to look back and say, &#8220;Yay, we made it.&#8221;</p><p>But when I sat down with the team to understand how they felt, everyone said some version of the same thing - &#8220;We are just getting started.&#8221;</p><p>To understand what it actually takes and feels like to ship 100 releases, I talked to a few of my teammates from different junctures of the product story.</p><p>This is their story. Welcome to our journey of 100 releases.</p><blockquote><p><em>A quick side note on methodology: My highly scientific process for selecting interviewees involved grabbing anyone who wasn&#8217;t in a meeting at the eleventh hour. While their stories are amazing, they are just a few of the many that make up our 100-release journey. 
A huge thank you to the entire team!</em></p></blockquote><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!7J0t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!7J0t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png 424w, https://substackcdn.com/image/fetch/$s_!7J0t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png 848w, https://substackcdn.com/image/fetch/$s_!7J0t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png 1272w, https://substackcdn.com/image/fetch/$s_!7J0t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!7J0t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png" width="1200" height="739" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:739,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:119410,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/178980032?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!7J0t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png 424w, https://substackcdn.com/image/fetch/$s_!7J0t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png 848w, https://substackcdn.com/image/fetch/$s_!7J0t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png 1272w, https://substackcdn.com/image/fetch/$s_!7J0t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2490a1e0-3a75-40c8-baf9-5bf7a041a478_1200x739.png 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" 
height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><p></p><h2><strong>The first hire</strong></h2><p><strong><a href="https://www.linkedin.com/in/ankit-anand-686a53a1/">Ashu</a></strong> was the third person at SigNoz, joining the two founders in May 2021 as a Growth Manager. His job was to let more developers know that SigNoz existed and that it was OpenTelemetry native. When he joined, SigNoz had around 600-700 GitHub stars.</p><p>The problem was that nobody was paying attention yet. Our Slack community had crickets. But Ashu believed in the founders and their genuine care to improve the experience of fellow developers. This was his push to keep going.</p><p>He kept writing content on dev.to, publishing on Hacker News, and explaining how to implement OpenTelemetry in Java applications when OpenTelemetry itself was brand new.</p><p>This soon became a habit. 
It eventually built up and gave us numbers to chase.</p><p>We&#8217;d trend on GitHub occasionally, articles would go viral, and when people visited the repo, they&#8217;d see the value and star it. The momentum kept attracting people. Soon, contributors started trickling in. Later, customers became contributors too.</p><p>Today, SigNoz has <strong><a href="https://github.com/SigNoz/signoz/">24,200+ stars</a></strong> on GitHub (as of 5th Nov, 2025).</p><h2><strong>The first backend engineer</strong></h2><p>While Ashu was getting the word out, <strong><a href="https://www.linkedin.com/in/makeavish/">Vishal</a></strong> joined us in November 2021 as our first backend engineer, starting on the traces module. For us, OpenTelemetry was never a mere technology choice; it was in our DNA from day one. We learned from the OpenTelemetry community, raised issues upstream, and helped debug.</p><p>Community was always core to what we were building.</p><p>As the product gained traction, users started asking for a SaaS version. They didn&#8217;t want the hassle of setting up and maintaining the open-source infrastructure. The team decided to launch SigNoz Cloud during a workation in Goa in December 2023, with the idea of just doing a one-week soft launch to test for issues and then rolling it back. In the first week, we got four sign-ups. We never rolled it back.</p><p>Vishal remembers the chaos of that launch vividly. The team was at the beach when Nitya got a message that the sign-up flow was broken, and he ran back (literally) to the room to fix it. That workation launch, which they thought was just a pilot, became the real thing.</p><p>That&#8217;s when Vishal&#8217;s role shifted. He went from being a backend engineer to a product manager, spending his days on calls with customers, debugging issues, and doing manual setups. The work wasn&#8217;t glamorous. 
There were sleepless nights and pressure from large companies to prioritize their features.</p><p>But that messy, tedious work had to get done. Customers like <strong><a href="https://signoz.io/case-study/kiwi/">Kiwi</a></strong> helped shape the product through their feedback and pull requests, pushing us to build for actual scale, not just theoretical scale.</p><h2><strong>Bringing logs to the product</strong></h2><p>When <strong><a href="https://github.com/nityanandagohain">Nitya</a></strong> joined in April 2022, we had around 40-50 weekly active users. His first task was to bring logs into the product. We had traces and metrics, but logs were the missing piece.</p><p>Just a month after he joined, during a team workation in May 2022, a conversation over lunch set the tone for how we build. Pranay, CEO of SigNoz, asked Nitya what he was working on.</p><p>&#8220;Logs. Building out the schema,&#8221; Nitya said.</p><p>&#8220;How much data are you testing on?&#8221;</p><p>&#8220;One million.&#8221;</p><p>&#8220;No, no. Test it on one billion.&#8221;</p><p>Nitya spent the next week running schemas against a billion log lines, again and again. That ambition became our standard. When we released the <strong><a href="https://signoz.io/blog/logs-performance-benchmark/">first logs benchmark</a></strong>, it caught fire on Hacker News. The traffic exploded, and with it, more customers and more data.</p><p>But as customers grew, so did the complexity. Nitya manually migrated hundreds of customers to new schemas over six months. We couldn&#8217;t afford to lose their data or break their workflows.</p><p>What worked at 50 users didn&#8217;t work at 500. Every new customer taught us something, and every incident made us more careful.</p><h2><strong>The first frontend lead</strong></h2><p>We had a functional product, but it was far from pretty. 
That&#8217;s where <strong><a href="https://github.com/YounixM">Yunus</a></strong> came in, joining in August 2023 as our first frontend engineer.</p><p>Frontend engineers typically don&#8217;t gravitate toward dev tools, which are often built for SREs and backend folks. But Yunus wanted to build a culture where engineers think beyond their specific skillset.</p><p>His philosophy was simple: &#8220;You are not a frontend engineer or a backend engineer. You are a software engineer.&#8221; He wanted everyone to understand the &#8216;why&#8217; before getting into the &#8216;how.&#8217;</p><p>This thinking was crucial.</p><p>When Yunus joined, we were moving fast, but we weren&#8217;t always thinking in systems. He focused on building processes to make our frontend more stable and predictable because people make mistakes, but good systems can prevent those mistakes from breaking things.</p><h2><strong>Joined at the 34th release</strong></h2><p><strong><a href="https://github.com/vikrantgupta25">Vikrant</a></strong> joined as a frontend engineer in January 2024. The frontend wasn&#8217;t stable; fixing one thing often broke another. He felt disconnected from the full picture, but an opportunity came up that changed everything. The provisioning flow for new sign-ups was constantly breaking, and Vikrant had already expressed interest in learning backend.</p><p>So he made the shift. He had a one-month crash course, taking over ownership from <strong><a href="https://github.com/therealpandey">Pandey</a></strong>. It was a ticking bomb. Either he figured it out, or the sign-up flow stayed broken.</p><p>And damn, he did figure it out.</p><p>Later, the community asked for something ambitious - <strong><a href="https://signoz.io/blog/traces-without-limits/">loading traces with millions of spans</a></strong>. We didn&#8217;t want to build a makeshift solution. We wanted to solve it permanently. 
After two months of intense work, we could load a million spans on a single screen.</p><p>Then came the push for provisioning v1.0. Our deadline was the Tuesday platform retrospective at 6:30 PM. As the clock ticked, the call got pushed to 7:00, then 7:15, then 7:30. We finally deployed to production, tested it, and joined the call right after. The entire team stood (well, practically sat in front of their laptops) together until v1.0 was stable.</p><p>The next challenge was to improve our SQL database schemas. For three months, we had to break down our entire infrastructure and rebuild it. Every Wednesday at noon, Vikrant, Pandey, and Nitya would send each other memes about the known pattern: a release would go out, and by 12:30 PM, a bug would be reported. Every single time.</p><p>But as Vikrant puts it, &#8220;If you&#8217;re tired, do it tired.&#8221; And when your team has your back, it does get easier.</p><h2><strong>Building processes that scale</strong></h2><p><strong><a href="https://github.com/therealpandey">Pandey</a></strong> joined in February 2024, when SigNoz was just a bunch of people executing. There were no pods, no real structure. His first task was to stabilize data ingestion. Every other day, a customer complained they couldn&#8217;t ingest data. It was a race against time.</p><p>After stabilizing ingestion, he turned to a bigger question: what does it take to run a high-performing team?</p><p>He laid the foundation for the platform pod, starting with just him and Vikrant. They implemented sprints, retrospectives, and reporting - a process that ran for eight months with just the two of them before being adopted company-wide.</p><p>Introducing structure wasn&#8217;t easy. It led to internal friction and disagreements on how things should be done.</p><p>But as Pandey notes, the culture being set today is what new people will embody. 
That&#8217;s how the baton gets passed.</p><h2><strong>Eight months in</strong></h2><p><strong><a href="https://github.com/piyushsingariya">Piyush</a></strong> joined in March 2025, and eight months later, he says it already &#8220;feels like forever&#8221; in the best way. At previous companies, the push was to ship features fast, any way possible. Here, he found the time and space to do it the right way the first time.</p><p>Working with Nitya on logs, he had to learn a new way of collaborating remotely. It took a few weeks to align on the thinking behind testing certain things, but over time, the context built up.</p><p>Now, Piyush is in the position Nitya was in when he first joined. He is responsible for making logs better and working on complex features like cloud integrations. He&#8217;s also exploring JSON logs, which he believes will boom fast and make many data pipelines redundant.</p><p>And well, his story connects everything. Someone is still at their Day 1, even on our 100th release.</p><h2><strong>Day 1, again</strong></h2><p>Every Wednesday, I look forward to the release. And it&#8217;s not just because I write the changelog. It&#8217;s become my favorite part of the week. As a marketer, I couldn&#8217;t ask for a better way to stay connected to what we&#8217;re actually building.</p><p>Yet here I am, writing a nostalgic story instead of a technical, feature-tracking blog. Because at the end of the day, there are humans building these sophisticated features, and their stories are worth hearing, at least sometimes, if not often.</p><p>The Day 1 feeling isn&#8217;t restricted to engineering. A few days ago, we launched our first-ever mascot. We ran our first integrated campaign. Every one of these feels like a beginning.</p><p>It&#8217;s Ashu pushing through the crickets. Vishal doing the messy, unglamorous work. Nitya testing for a billion when a million seemed like enough. 
Yunus building systems, Vikrant doing it tired, and Pandey introducing structure when velocity felt more important.</p><p>These stories are limited to a few, but they echo the team&#8217;s sentiment at large.</p><p>Here&#8217;s to the next 100. We&#8217;re just getting started!</p><div><hr></div><p>This spirit extends beyond our internal team. Our community has been with us every step of the way, which is why this past July, we were thrilled to launch the <strong><a href="https://signoz.io/blog/community-advocate-program/">SigNoz Community Advocate Program</a></strong>. It&#8217;s our way of recognizing the passionate developers who help others succeed with observability.</p><p>Shout-out to our inaugural advocates for their incredible contributions: <strong><a href="https://github.com/mgilham">Mathew Gilham</a></strong>, <strong><a href="https://github.com/MattiDeGrauwe">Matti De Grauwe</a></strong>, <strong><a href="https://github.com/KieranP">Kieran Pilkington</a></strong>, <strong><a href="https://github.com/gfelot">Gil Felot</a></strong> and <strong><a href="https://github.com/nlamirault">Nicolas Lamirault</a></strong>.</p><p>And in a moment of perfect serendipity, just as I was about to create the PR for this post, we welcomed our 500th paid customer.</p><div><hr></div><div class="subscription-widget-wrap-editor" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe&quot;,&quot;language&quot;:&quot;en&quot;}" data-component-name="SubscribeWidgetToDOM"><div class="subscription-widget show-subscribe"><div class="preamble"><p class="cta-caption">Thanks for reading Observability Real Talk! 
Subscribe for free to receive new posts and support my work.</p></div><form class="subscription-widget-subscribe"><input type="email" class="email-input" name="email" placeholder="Type your email&#8230;" tabindex="-1"><input type="submit" class="button primary" value="Subscribe"><div class="fake-input-wrapper"><div class="fake-input"></div><div class="fake-button"></div></div></form></div></div>]]></content:encoded></item><item><title><![CDATA[ELI5 Auth Model for OpenTelemetry Collector]]></title><description><![CDATA[In modern systems, where even a small mishap can wreak havoc and you might wake up to a $$$ bill the next day, we should do whatever is within our capacity to secure our systems.]]></description><link>https://newsletter.signoz.io/p/eli5-auth-model-for-opentelemetry</link><guid isPermaLink="false">https://newsletter.signoz.io/p/eli5-auth-model-for-opentelemetry</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Sun, 26 Oct 2025 12:02:48 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!BUjn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>&#128140;<em> Hey there, it&#8217;s Elizabeth from SigNoz! </em></p><p><em>This newsletter is </em>a<em>n honest attempt to talk about all things - observability, OpenTelemetry, open-source and the engineering in between! We at&nbsp;<strong><a href="https://signoz.io/">SigNoz</a></strong>&nbsp;are a bunch of observability fanatics obsessed with OpenTelemetry and open-source</em>, <em>and we reckon it&#8217;s important to share what we know.</em> <em>If this passes your vibe-check, we&#8217;d be pleased if you&#8217;d subscribe. 
We&#8217;ll make it worth your while.</em></p><p class="button-wrapper" data-attrs="{&quot;url&quot;:&quot;https://newsletter.signoz.io/subscribe?&quot;,&quot;text&quot;:&quot;Subscribe now&quot;,&quot;action&quot;:null,&quot;class&quot;:null}" data-component-name="ButtonCreateButton"><a class="button primary" href="https://newsletter.signoz.io/subscribe?"><span>Subscribe now</span></a></p><p><em>On another note, feel free to check out our <strong><a href="https://signoz.io/resource-center/blog/">blogs</a> </strong>and <strong><a href="https://signoz.io/docs/introduction/">docs</a></strong> here. Our <strong><a href="https://github.com/SigNoz/signoz">GitHub</a> </strong>is over here, and while you are at it, we&#8217;d appreciate it if you sent a star </em>&#11088;<em> our way. You&#8217;re also welcome to join the conversation in our growing <strong><a href="https://arc.net/l/quote/rmuathlr">&nbsp;Slack</a></strong>&nbsp;community for the latest news!</em></p><p><em>Cheers.</em></p><div><hr></div><p>In any type of software that involves the movement of data&nbsp;<em>or&nbsp;</em>information, there is a pressing need to make the passage of data secure. One way of achieving this is by <em>authentication</em>. You must have experience authenticating API calls or other data streams. </p><p>Gemini defines authentication as <em>the process of verifying that a user, device, or system is who or what it claims to be, typically by using credentials like a username and password<strong>. </strong></em>When I was first learning about authenticating systems, I related it to the term&nbsp;<em>authenticity, which is&nbsp;</em>closely related to<em>&nbsp;trustworthiness, </em>that is, can the source of incoming data or request be&nbsp;<em>trusted</em>&nbsp;enough to accept it<em>?  </em>You can stick with a definition or build an idea based on what works best for you. 
:)</p><p>In modern systems, where even a small mishap can wreak havoc and you might wake up to a $$$ bill the next day, we should do whatever is within our capacity to secure our systems.</p><p>That&#8217;s why this week, I want to talk about something crucial but often overlooked: <em>Authentication for your OpenTelemetry Collectors</em>. These collectors are the busy data hubs of your observability pipeline, handling huge amounts of information every moment. Securing them is non-negotiable, and also a perfect use case for strong authentication.</p><p></p><h2>Authentication in OpenTelemetry</h2><p>First, OpenTelemetry on its own doesn&#8217;t define an authentication protocol or an auth model. OpenTelemetry&#8217;s primary aim was to define a standard data model (for example, for <a href="https://opentelemetry.io/docs/specs/otel/metrics/data-model/">metrics</a> and <a href="https://opentelemetry.io/docs/specs/otel/logs/data-model/">logs</a>) and a transport protocol (<a href="https://opentelemetry.io/docs/specs/otel/protocol/">OTLP</a>). It leaves us the flexibility to work with any authentication scheme, based on our collector pipeline and the backend we are using.</p><p>In a Collector pipeline, data has one point of entry, the <em>receivers</em>, and one point of exit, the <em>exporters</em>. 
Authentication is critical at both of these points.</p><p></p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!BUjn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!BUjn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png 424w, https://substackcdn.com/image/fetch/$s_!BUjn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png 848w, https://substackcdn.com/image/fetch/$s_!BUjn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png 1272w, https://substackcdn.com/image/fetch/$s_!BUjn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!BUjn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png" width="1328" height="641" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:641,&quot;width&quot;:1328,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:720648,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://newsletter.signoz.io/i/176322568?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!BUjn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png 424w, https://substackcdn.com/image/fetch/$s_!BUjn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png 848w, https://substackcdn.com/image/fetch/$s_!BUjn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png 1272w, https://substackcdn.com/image/fetch/$s_!BUjn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff95201bc-c741-46b9-92b1-083a3b3907f2_1328x641.png 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" 
viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption">Authenticating incoming and outgoing traffic</figcaption></figure></div><p></p><h3>Authenticating Incoming Traffic</h3><p>As we saw before, the receiver is the point of entry for data traffic, hence it&#8217;s crucial to examine if the data is coming from a <em>reliable</em> source. We achieve this by auth extensions. You can read more about<strong> <a href="https://opentelemetry.io/docs/collector/configuration/#extensions">extensions</a></strong> here. </p><p>In this scenario, we will configure our Collector to only accept requests that include a valid secret token in their Authorization: Bearer &lt;token&gt; header. This is a three-step process in your Collector&#8217;s config.yaml file.</p><p></p><h4><strong>Step 1: Define the Authenticator in extensions</strong></h4><p>First, we define our authentication method. 
We&#8217;ll use the bearertokenauth extension, which ships with the Collector contrib distribution, and provide it with a list of valid tokens.</p><pre><code>extensions:
  bearertokenauth:
    # A list of valid secret tokens the Collector will accept.
    # Any client request must present one of these tokens to be authenticated.
    tokens:
      - "${CLIENT_A_TOKEN}"
      - "${CLIENT_B_TOKEN}"
</code></pre><p>Defining the authenticator under extensions doesn&#8217;t <em>enforce</em> it on its own. It is enforced only once it&#8217;s applied to a receiver and enabled in the service block, as shown in the next two steps.</p><div><hr></div><p><strong>&#9888;&#65039; Important Security Note!</strong></p><p>Never hardcode secrets directly in your configuration file. The ${...} syntax tells the Collector to load the token from an environment variable. You should inject these variables securely using a tool like Kubernetes Secrets or Docker Secrets.</p><div><hr></div><h4><strong>Step 2: Apply the Authenticator to a Receiver</strong></h4><p>Next, we tell our OTLP receiver that it must use the authenticator we just defined. We do this by adding an auth setting to each protocol in the receiver&#8217;s configuration.</p><pre><code>receivers:
  otlp:
    protocols:
      grpc:
        endpoint: "0.0.0.0:4317"
        auth:
          authenticator: bearertokenauth   # use the bearertokenauth extension
      http:
        endpoint: "0.0.0.0:4318"
        auth:
          authenticator: bearertokenauth   # same auth on HTTP</code></pre><p></p><h4><strong>Step 3: Enable the Extension in the Service Block</strong></h4><p>Finally, the extension must be activated for the Collector by listing it in the service section. The batch processor and otlp exporter referenced in the pipeline must also be defined in the same file.</p><pre><code>service:
  extensions: [bearertokenauth]  # This activates the bearertokenauth extension
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]  
</code></pre><p>With this configuration in place, your Collector&#8217;s incoming traffic is now authenticated. Any request arriving at the OTLP receiver without a valid token will be rejected, ensuring only your trusted applications can send data into your observability pipeline.</p><p>There are other authenticators as well, such as basicauth and oidc, depending on your particular use case. Now, let&#8217;s see how we deal with outgoing traffic.</p><p></p><h3>Securing Outgoing Traffic</h3><p>Exporters are the exit point for data leaving the collector. The next destination for your data is most likely an observability backend like SigNoz, and the collector often needs to authenticate itself to prove it has permission to send that data. There are two ways to do this.</p><p>The easiest way is to add a headers section directly to your exporter&#8217;s configuration in your config.yaml. This tells the exporter to attach the specified headers (containing your secret key) to every outgoing request. The code is shown below:</p><pre><code>exporters:
  otlp:
    endpoint: "ingest.us.signoz.cloud:443"
    headers:
      # This header authenticates the Collector with the SigNoz backend
      signoz-ingestion-key: "${SIGNOZ_API_KEY}" # read from an environment variable
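
  # --- Illustrative addition, not part of the original sample ---
  # A minimal sketch of the same static-header approach aimed at another
  # Collector, for instance an agent forwarding to the token-protected
  # receiver configured earlier. The exporter name otlp/gateway, the hostname,
  # and the reuse of CLIENT_A_TOKEN are assumptions for illustration only.
  # The gateway is assumed to serve TLS, so the token is not sent in plaintext.
  otlp/gateway:
    endpoint: "gateway-collector.internal.example:4317"
    headers:
      Authorization: "Bearer ${CLIENT_A_TOKEN}"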
</code></pre><p>For more complex authentication, you can follow the same sequence of steps as we did for receivers above. That is, Step 1: Define the Authenticator in Extensions, and Step 2: Apply the Authenticator to an Exporter. At the end, we enable the extension in the service section, just as we did for receivers. Here&#8217;s the entire code sample.</p><pre><code>extensions:
  sigv4auth: ## a specialized authenticator for users of AWS.
    region: "us-east-1"
    service: "aoss"

exporters:
  otlp:
    endpoint: "ingest.us.signoz.cloud:443"
    headers:
      signoz-ingestion-key: "${SIGNOZ_API_KEY}"

  otlphttp/aws:
    endpoint: "https://my-opensearch-domain.us-east-1.aoss.amazonaws.com"
    auth:
      authenticator: sigv4auth


service:
  extensions: [sigv4auth]
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp, otlphttp/aws]  # include the exporter that uses sigv4auth
</code></pre><p><br>In summary, for most backends that use a simple API key, the static headers setting is all you need. For more complex scenarios involving cloud provider IAM roles or OAuth2, use the Collector&#8217;s auth extensions.</p><p></p><h3>What&#8217;s next?</h3><p>Now that we&#8217;ve laid a foundation for securing data flowing into and out of your OpenTelemetry Collectors, you can get hands-on and experiment with different authentication methods to get a well-rounded idea. To read more on OpenTelemetry Collectors and their various parts, this is a good <strong><a href="https://signoz.io/blog/opentelemetry-operator-complete-guide/">read</a></strong>.</p><p>Next week, I&#8217;ll be back with another deep-dive, and until then, adieu! &#128075;</p><div><hr></div><p></p>]]></content:encoded></item><item><title><![CDATA[LLM Observability in the Wild - Why OpenTelemetry should be the Standard ]]></title><description><![CDATA[Building, debugging, and improving AI agents in production gets messy fast. So, what's the solution? 
Read on!]]></description><link>https://newsletter.signoz.io/p/llm-observability-in-the-wild-why</link><guid isPermaLink="false">https://newsletter.signoz.io/p/llm-observability-in-the-wild-why</guid><dc:creator><![CDATA[Pranay]]></dc:creator><pubDate>Sun, 12 Oct 2025 13:02:14 GMT</pubDate><enclosure url="https://substackcdn.com/image/youtube/w_728,c_limit/DPL35sYPGPU" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A few days ago I hosted a live conversation with Pranav, co-founder of Chatwoot, about issues his team was running into with LLM observability.</p><p>The short version: building, debugging, and improving AI agents in production gets messy fast. There are multiple competing standards and default libraries for LLM observability, and many of them, like OpenInference, claim to be based on OpenTelemetry but don&#8217;t strictly adhere to its conventions. This introduces problems for users who are trying to get better observability across their stack.</p><p>Here&#8217;s a write-up of what we covered and what I think it means for anyone shipping LLM features into real products. Feel free to watch the complete video:</p><div id="youtube2-DPL35sYPGPU" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;DPL35sYPGPU&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/DPL35sYPGPU?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>The Problem Emerges in Prod</strong></h2><p>Pranav and I go way back to our YC days in 2021, and it&#8217;s always interesting to see how our paths have evolved. 
Chatwoot has built something really compelling - an open-source customer support platform that unifies conversations across every channel you can imagine: live chat, email, WhatsApp, social media, you name it. All in a single dashboard.</p><p>But here&#8217;s where it gets interesting. They&#8217;ve built an AI agent called &#8220;Captain&#8221; that can work across all these channels. You build the logic once, and it can handle support queries whether they come through email, live chat, or WhatsApp. Pretty neat, right?</p><p>The problem started showing up in production in the most unexpected ways. Sometimes their AI would randomly respond in Spanish when it absolutely shouldn&#8217;t. Other times, responses just weren&#8217;t quite right, and they had no visibility into <em>why</em>.</p><h2><strong>The Quest for LLM Observability</strong></h2><p>This is where Pranav&#8217;s journey into LLM observability began, and it mirrors what I&#8217;ve been seeing across many companies building LLM applications. You need to understand:</p><ul><li><p>What documents were retrieved for a RAG query?</p></li><li><p>Which tool calls were made?</p></li><li><p>What was the exact input and output at each step?</p></li><li><p>Why did the AI make certain decisions?</p></li></ul><p>Without this visibility, you&#8217;re essentially flying blind in production.</p><h2><strong>The Standards Problem</strong></h2><p>Here&#8217;s where things get really interesting, and frankly, frustrating. Pranav explored several solutions:</p><p><strong>OpenAI&#8217;s native tracing</strong> looked promising with rich, detailed traces showing guardrails, agent flows, and tool calls. But it&#8217;s tightly coupled to OpenAI&#8217;s agent framework. It also exposes each trace only as an atomic unit. 
If you want to filter spans based on attributes or just examine specific spans directly, you can&#8217;t do that.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!olOR!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!olOR!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp 424w, https://substackcdn.com/image/fetch/$s_!olOR!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp 848w, https://substackcdn.com/image/fetch/$s_!olOR!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp 1272w, https://substackcdn.com/image/fetch/$s_!olOR!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!olOR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp" width="1456" height="787" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:787,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;OpenAI agent workflow traces&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="OpenAI agent workflow traces" title="OpenAI agent workflow traces" srcset="https://substackcdn.com/image/fetch/$s_!olOR!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp 424w, https://substackcdn.com/image/fetch/$s_!olOR!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp 848w, https://substackcdn.com/image/fetch/$s_!olOR!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp 1272w, https://substackcdn.com/image/fetch/$s_!olOR!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F271e89c5-0500-4574-9818-3777c2c56e2d_3084x1666.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" 
stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>OpenAI agent workflow traces</em></figcaption></figure></div><p><strong>New Relic</strong> was easy to integrate since they already use it, and it supports OpenTelemetry. But the UI required clicking through 5-6 layers just to see relevant information. Not ideal when you&#8217;re trying to debug production issues.</p><p><strong>Phoenix</strong> caught their attention because it follows the OpenInference standard, which provides much richer, AI-specific span types. You can easily filter for just LLM calls, tool calls, or agent spans. 
The traces are beautiful and informative.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H1Fj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!H1Fj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp 424w, https://substackcdn.com/image/fetch/$s_!H1Fj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp 848w, https://substackcdn.com/image/fetch/$s_!H1Fj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp 1272w, https://substackcdn.com/image/fetch/$s_!H1Fj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H1Fj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp" width="1456" height="789" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:789,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Phoenix doesn't recognize OpenTelemetry span kinds&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Phoenix doesn't recognize OpenTelemetry span kinds" title="Phoenix doesn't recognize OpenTelemetry span kinds" srcset="https://substackcdn.com/image/fetch/$s_!H1Fj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp 424w, https://substackcdn.com/image/fetch/$s_!H1Fj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp 848w, https://substackcdn.com/image/fetch/$s_!H1Fj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp 1272w, https://substackcdn.com/image/fetch/$s_!H1Fj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8c9eeb0-da16-48ee-a69e-9b450d54548b_3100x1680.webp 1456w" sizes="100vw" loading="lazy"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 
20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a><figcaption class="image-caption"><em>Phoenix doesn&#8217;t recognize OpenTelemetry span kinds</em></figcaption></figure></div><p>But here&#8217;s the kicker: Chatwoot is primarily a Ruby on Rails shop, and guess what? No Ruby SDK for OpenInference. Moreover, Phoenix doesn&#8217;t completely adhere to OTel semantic conventions, so if you send it telemetry data directly via OpenTelemetry, it doesn&#8217;t recognize the span types.</p><p>As shown in the example above, Phoenix shows spans sent with OpenTelemetry span kinds as <code>unknown</code>.</p><h2><strong>The OpenTelemetry vs OpenInference Divide</strong></h2><p>This is where the conversation got really technical and revealed a fundamental industry problem. There are essentially two standards emerging:</p><p><strong>OpenTelemetry</strong> is the industry standard. 
It has libraries for every language, it&#8217;s production-ready, and it&#8217;s widely adopted. But it was built for traditional applications, not AI workflows. It only supports basic span types: internal, server, client, producer, consumer. That&#8217;s it.</p><p><strong>OpenInference</strong> was created specifically for AI applications. It has rich span types like LLM, tool, chain, embedding, agent, etc. You can easily query for &#8220;show me all the LLM calls&#8221; or &#8220;what were all the tool executions.&#8221; But it&#8217;s newer, has limited language support, and isn&#8217;t as widely adopted.</p><p>The tragic part? OpenInference claims to be &#8220;OpenTelemetry compatible,&#8221; but as Pranav discovered, that compatibility is shallow. You can send OpenTelemetry format data to Phoenix, but it doesn&#8217;t recognize the AI-specific semantics and just shows everything as &#8220;unknown&#8221; spans.</p><h2><strong>The Ruby Problem Makes It Worse</strong></h2><p>For teams using languages like Ruby that don&#8217;t have direct OpenInference SDK support, this becomes even more challenging. Pranav had to choose between:</p><ol><li><p>Building an SDK from scratch for Ruby</p></li><li><p>Using OpenTelemetry and losing AI-specific insights</p></li><li><p>Switching to a different language stack just for AI observability (way tougher)</p></li></ol><p>None of these are great options.</p><h2><strong>Why we (still) bias to OpenTelemetry</strong></h2><p>At SigNoz we&#8217;re all-in on OpenTelemetry. One reason: OTel&#8217;s consistency enables out-of-the-box experiences across your <em>whole</em> stack. Example: we can auto-surface <strong><a href="https://signoz.io/docs/external-api-monitoring/overview/">external API</a></strong> usage and performance based on span kinds and attributes. When parts of the app send telemetry via non-OTel conventions, those views degrade.</p><p>Chatwoot lands similarly: their entire product already emits OTel. 
Pulling in a second telemetry standard just for LLMs fragments the picture and complicates how they go about observability. It also silos their observability into different products, which makes it difficult to solve issues when they occur.</p><h2><strong>Takeaways for builders</strong></h2><ul><li><p><strong>Pick one telemetry backbone</strong> - If most of your app is OTel, prefer staying OTel-native for LLMs too, even if it means adding richer attributes until GenAI conventions catch up.</p></li><li><p><strong>LLM-specific libraries</strong> - Even if you have to use LLM-specific libraries like OpenInference, keep your usage as close to OpenTelemetry as possible, so that you know which non-OTel attributes you are using and which of them may break things.</p></li><li><p><strong>Follow the OTel GenAI working group</strong> - There is active work happening in the OTel <strong><a href="https://opentelemetry.io/blog/2024/otel-generative-ai/">Gen AI working group</a></strong>. Follow the work happening there and share your use cases so that the standards OpenTelemetry builds cater to the most common ones.</p></li></ul><p>As the LLM space is still evolving rapidly, we as a community need to share our voices so that the standards are robust.</p><div><hr></div><h2><strong>What we&#8217;re doing at SigNoz</strong></h2><p>We&#8217;re continuing to invest in OpenTelemetry-native LLM observability so teams don&#8217;t have to choose between stability and clarity. Concretely, that means:</p><ul><li><p>Clear dashboards and traces when LLM calls are modeled using OTel spans/attributes. You can find examples and dashboards in our <strong><a href="https://signoz.io/docs/llm-observability/">LLM observability</a></strong> docs. Though we have also used LLM-specific libraries like OpenInference in our docs (as they are still the easiest way for people to get started), we have kept the dashboards as close to OTel standards as possible. 
We also plan to actively update this as OTel GenAI semantic conventions become more mature.</p></li><li><p>Guidance and examples for popular frameworks (LangChain, LlamaIndex, etc.) on emitting OTel-friendly telemetry.</p></li><li><p>Features built on OpenTelemetry semantic conventions, so that you get a great out-of-the-box experience in SigNoz, with thoughtful defaults that keep your services, DBs, queues, and LLM agents in one coherent picture.</p></li></ul><p>If you&#8217;re wrestling with these trade-offs, we&#8217;d love to hear what&#8217;s breaking for you and what &#8220;rich semantics&#8221; you actually use day-to-day.</p><div><hr></div><h2><strong>What next?</strong></h2><p>Huge thanks to Pranav for going deep, especially from the Ruby perspective. If you&#8217;re shipping AI features and care about operability, add your voice: push for richer GenAI semantics in OpenTelemetry, and share real traces (sanitized) that show what you need to see.</p><p></p>]]></content:encoded></item><item><title><![CDATA[Query Builder v5 - Two Years of Technical Debt, 80 Closed Issues, and a Fundamental Rethinking]]></title><description><![CDATA[Read on to understand how we revamped our query builder!]]></description><link>https://newsletter.signoz.io/p/query-builder-v5-two-years-of-technical</link><guid isPermaLink="false">https://newsletter.signoz.io/p/query-builder-v5-two-years-of-technical</guid><dc:creator><![CDATA[SigNoz]]></dc:creator><pubDate>Sun, 21 Sep 2025 13:40:09 GMT</pubDate><content:encoded><![CDATA[<p>In 2022, we had three different query interfaces. Logs had a custom search syntax with no autocomplete. Traces only had predefined filters - no query builder at all. Metrics had a raw PromQL input box where you'd paste queries from somewhere else and hope they worked.</p><p>Each system spoke a different language. An engineer debugging a production issue had to context-switch not just between data types, but between entirely different ways of thinking about queries.</p><p>When we built v3 in 2022, we thought we were solving this. We created a unified query builder - basically a UI wrapper around SQL. Count, group by, filter, limit. It worked well enough to get us from 2022 to 2024.</p><p>Turns out we were building with the wrong assumptions.</p><h2><strong>The v3/v4 Design Flaw That Took Two Years to Understand</strong></h2><p>We designed v3 around traces and metrics. In these data types, you rarely need complex boolean logic. Simple AND between conditions usually covers it.</p><p>But logs are different. 
When you're searching logs during an incident, you need expressions like:</p><pre><code><code>(node_name contains 'management' OR pod_name contains 'test')
AND NOT (status_code &gt;= 500)
</code></code></pre><p>v3 couldn't do this. No OR support. No complex boolean expressions. No parentheses for precedence.</p><p>This was a major limitation that blocked common use cases. Users were forced to learn ClickHouse SQL, write raw queries, and maintain them as our schemas evolved. We'd built a query builder that couldn't handle the queries users actually needed.</p><h2><strong>The Support Calls That Changed Our Philosophy</strong></h2><p>After four years of support calls, we noticed a pattern that surprised us.</p><p>Senior engineers - people with 5-10 years of experience - couldn't find features that seemed obvious to us. Take chronological ordering in logs. We had the feature, buried three clicks deep in v3 and v4. Users didn't just struggle to use it; they assumed we didn't support it at all.</p><p>During these calls, we'd watch them search for features, see their frustration, and realize: if you built it and know exactly where it is, everything seems obvious. But if senior engineers can't discover your features, those features don't exist.</p><p>For v5, we changed our approach. We decided to stop making decisions for users.</p><p>In v3/v4, we tried to be clever. We'd make assumptions about what users wanted, hide complexity to "simplify" the experience. These assumptions were often wrong and led to behavior that broke trust.</p><p>For v5, we set a new rule: if we must make a decision, it should be the least surprising one possible. And wherever possible, don't make the decision at all - let users control their experience.</p><h2><strong>The Architectural Reality: You Can't Ship a Query Builder in Isolation</strong></h2><p>When we started building v5, we quickly discovered that the query builder isn't just one component. It's how users interact with data across the entire product.</p><p>Think about the typical workflow: You write a query in the explorer to investigate an issue. 
Then you either:</p><ul><li><p>Save it as a dashboard panel to monitor the pattern</p></li><li><p>Create an alert to catch it next time</p></li><li><p>Switch between logs, traces, and metrics to correlate data</p></li></ul><p>This interconnection meant we couldn't ship v5 for just the explorer. A query written in the new format had to work everywhere. This forced us to rebuild:</p><ul><li><p>All three explorers (logs, traces, metrics)</p></li><li><p>Dashboard panel creation (including value panels that only exist in dashboards)</p></li><li><p>Alert creation flows</p></li><li><p>The underlying query API that powers all of these</p></li></ul><p>What started as "let's add OR support to the query builder" became a complete architectural overhaul.</p><h2><strong>The Technical Implementation</strong></h2><h3><strong>Full-Text Search That Works Like Google</strong></h3><p>The most common use case during an incident is that a user sends you an error message. In v3, you'd need to construct a query with the correct syntax. In v5, you just paste and search:</p><pre><code><code>"connection timeout in payment service"
</code></code></pre><p>Behind the scenes, we parse this into the appropriate query structure. But the user doesn't need to know that. They're debugging a problem, not learning a query language.</p><h3><strong>Complex Boolean Logic with Proper Precedence</strong></h3><p>The feature that was impossible in v3/v4 and forced users to write ClickHouse queries:</p><pre><code><code>(service_name = 'api' AND status_code &gt;= 500)
OR
(service_name = 'worker' AND error_message contains 'timeout')
</code></code></pre><p>This seems basic, but implementing it required rethinking our entire query structure. We needed to support arbitrary nesting, maintain precedence rules, and still provide autocomplete and suggestions at every level.</p><h3><strong>Cross-Source Query Portability</strong></h3><p>Queries are portable across data types. It&#8217;s one of the most powerful features that users don&#8217;t notice initially.</p><p>Write a query filtering for <code>service_name = 'api'</code> in logs. Copy it. Paste it in traces explorer. It works.</p><p>This seems simple, but the implementation is complex. Logs, traces, and metrics have:</p><ul><li><p>Different underlying table schemas</p></li><li><p>Different column names for similar concepts</p></li><li><p>Different valid operations</p></li></ul><p>We built an abstraction layer that translates queries between these contexts automatically. Users think in terms of their data, not our storage schema.</p><h3><strong>Performance at Scale: Instant Suggestions</strong></h3><p>When you're typing a query, you need suggestions immediately. But we're dealing with:</p><ul><li><p>Millions of unique field values</p></li><li><p>Multiple data sources</p></li><li><p>Complex hierarchical data structures</p></li></ul><p>We implemented:</p><ul><li><p>Smart caching that predicts what fields you'll query next</p></li><li><p>Progressive loading that shows the most relevant suggestions first</p></li><li><p>Query optimization that happens before we send anything to ClickHouse</p></li></ul><p>The result? An autocomplete that feels instant, even at scale.</p><h2><strong>The UX Debt We Finally Paid</strong></h2><p>Because we were touching every part of the query experience, we could finally address years of accumulated UX issues.</p><p><strong>Chronological ordering in logs:</strong> Moved from a hidden dropdown to a prominent toggle. 
Same capability, much better discoverability.</p><p><strong>Time aggregation controls:</strong> Previously buried in advanced settings, now directly visible. Users can switch from 1-minute to 5-second granularity with one click.</p><p><strong>Interval selection:</strong> Direct control over data granularity from 5 seconds to 1 hour. Why does this matter? During an incident, 30-second aggregation might smooth out the spike that's causing your problem. 5-second aggregation shows you exactly when things went wrong.</p><p>These weren't query builder features, but fixing them was essential to delivering a coherent experience. When engineers are debugging production issues at 2 AM, they shouldn't hunt for basic controls.</p><h2><strong>The Validation: Users Replacing ClickHouse Queries</strong></h2><p>We shipped v5 with a single changelog entry. No marketing campaign. No push to adopt it.</p><p>Within three weeks, the feedback started coming in. The one that stood out: a user telling us they'd replaced all their ClickHouse queries with Query Builder queries.</p><p>We didn't ask them to do this. They discovered that the query builder could now handle their complex cases, and they preferred it over raw SQL.</p><p>Why? Because with Query Builder:</p><ul><li><p>They don't need to learn ClickHouse SQL syntax</p></li><li><p>They don't need to update queries when we change schemas</p></li><li><p>They get autocomplete and validation</p></li><li><p>They can copy queries between different data types</p></li><li><p>They can share queries with team members who don't know SQL</p></li></ul><p>When users actively choose your abstraction over direct database access, you know you've built the right thing.</p><h2><strong>What We Couldn't Ship Yet: The Future of Cross-Signal Correlation</strong></h2><h3><strong>Subqueries: Correlating Across Signal Types</strong></h3><p>Imagine investigating an incident where you see 500 errors. Your hypothesis: high CPU usage caused the failures. 
Today, you check traces for errors, then separately check metrics for CPU usage, then try to mentally correlate the timings.</p><p>With subqueries (currently in development), you'll write:</p><pre><code><code>Show traces where:
status_code &gt;= 500
AND subquery(metrics: CPU_usage &gt; 80% for same service)

</code></code></pre><p>This requires real-time joining of traces and metrics data. The architecture is designed, the UI patterns are established. Implementation is next.</p><h3><strong>Cross-Source Joins: Unified Debugging Experience</strong></h3><p>Currently, logs and traces live in separate worlds. You can see that a trace has an error, and you can see related logs, but you can't query them together.</p><p>With joins (in design phase), you'll write:</p><pre><code><code>Show logs where:
JOIN traces ON trace_id
WHERE traces.duration &gt; 500ms

</code></code></pre><p>This unlocks debugging workflows that are impossible today. Find all logs related to slow traces. Show logs where the parent span had an error. Correlate log patterns with trace characteristics.</p><h2><strong>The Engineering Lesson: Technical Elegance Without Discoverability Is Worthless</strong></h2><p>After four years working on this product, countless support calls, and watching experienced engineers struggle with features I thought were obvious, the lesson is clear:</p><p>Your technical solution can be elegant. Your features can be powerful. But if users can't find and use them, they might as well not exist.</p><p>We could have the most sophisticated query engine in the world. But if an engineer investigating a production incident can't immediately figure out how to use it, we've failed.</p><p>Query Builder v5 isn't just about adding OR support or fixing bugs. It's about recognizing that during an incident, engineers shouldn't have to think about query syntax. They should think about their problem.</p><h2><strong>Where We Go From Here</strong></h2><p>We closed 80 issues with v5. We have 50+ more in the backlog.</p><p>But we're not planning a v6 mega-release. We designed v5's architecture to be extensible. The abstractions are correct. The patterns are established. Now we can ship incremental improvements without breaking changes.</p><p>Subqueries, joins, and the remaining enhancements will roll out as they're ready. No more two-year gaps between major improvements.</p><p>The query builder is no longer just a UI component. It's how engineers interact with their observability data. And for the first time, it's powerful enough that users are choosing it over writing raw SQL.</p><p>That's not just a technical achievement. That's validation that we finally understood the problem we were trying to solve.</p><p>Query Builder v5 is live in the latest release. 
<strong><a href="https://signoz.io/docs/userguide/query-builder-v5/">Check the documentation</a></strong> for detailed examples and capabilities.</p>]]></content:encoded></item><item><title><![CDATA[LangChain Observability: How to Monitor LLM Apps with OpenTelemetry (With Demo App)]]></title><description><![CDATA[In this practical guide, we will walk you through setting up observability for your Langchain application with OpenTelemetry.]]></description><link>https://newsletter.signoz.io/p/langchain-observability-how-to-monitor</link><guid isPermaLink="false">https://newsletter.signoz.io/p/langchain-observability-how-to-monitor</guid><dc:creator><![CDATA[SigNoz]]></dc:creator><pubDate>Sun, 07 Sep 2025 14:02:09 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!15qZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>LangChain has become one of the most popular frameworks for building LLM-powered applications, making it easier to create agents that can reason, plan, and take actions. But like any production-grade AI app, LangChain agents can run into performance bottlenecks, hallucinations, or tool call failures. And without proper LangChain observability, it&#8217;s hard to know where things break down.</p><p>In this practical guide, we will walk you through setting up observability for your LangChain application with OpenTelemetry, the open-source standard for generating telemetry data. We'll instrument a demo trip planner agent and show you how to visualize traces, token usage, and tool performance in SigNoz.</p><p>The trip planner agent helps users plan their travel itinerary by combining LLM reasoning with external services like flight ticket search, weather APIs, hotel booking engines, and nearby activity recommendations. 
By instrumenting it with OpenTelemetry, you can trace every step of the planning process, measure latency at each stage, and quickly debug issues that impact the user experience.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!RzQN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!RzQN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp 424w, https://substackcdn.com/image/fetch/$s_!RzQN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp 848w, https://substackcdn.com/image/fetch/$s_!RzQN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp 1272w, https://substackcdn.com/image/fetch/$s_!RzQN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!RzQN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp" width="1200" height="630" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!RzQN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp 424w, https://substackcdn.com/image/fetch/$s_!RzQN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp 848w, https://substackcdn.com/image/fetch/$s_!RzQN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp 1272w, https://substackcdn.com/image/fetch/$s_!RzQN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fba4fbfca-6d47-4219-a649-2282091d0702_1200x630.webp 1456w" sizes="100vw" fetchpriority="high"></picture></div></a></figure></div><h2><strong>Why LangChain Observability Matters</strong></h2><p>LangChain agents are essentially reasoning loops: the LLM takes user input, decides which tools to call, processes their results, and iterates until it arrives at an answer. In a trip planner agent app, this might look like:</p><ul><li><p>Calling a flights API to check availability.</p></li><li><p>Fetching hotel recommendations from a booking API.</p></li><li><p>Looking up weather forecasts to suggest the best travel window.</p></li><li><p>Stitching everything together into a coherent itinerary.</p></li></ul><p>This chain of reasoning is amazing when it works but if one tool call fails, takes too long, or returns garbage, the whole experience collapses. 
Without observability, you won&#8217;t know whether the problem was:</p><ul><li><p>A slow external API call.</p></li><li><p>An LLM misunderstanding the tool response.</p></li><li><p>The reasoning loop going in circles.</p></li></ul><p>Instrumenting the agent with OpenTelemetry makes each of these failure modes visible instead of leaving them hidden inside the reasoning loop.</p><h2><strong>How OpenTelemetry and SigNoz Can Help</strong></h2><p><strong>What is OpenTelemetry?</strong></p><p><strong><a href="https://signoz.io/blog/what-is-opentelemetry/">OpenTelemetry</a></strong> (OTel) is an open-source observability framework that provides a unified way to collect telemetry data (traces, metrics, and logs) from across your application stack. It&#8217;s a CNCF project with support for multiple programming languages and a wide range of integrations. The beauty of OTel is that you instrument your code once and can then send that data to any OpenTelemetry-compatible backend you choose, without vendor lock-in.</p><p>For LangChain-based agents, this means you can capture detailed performance and error data for each stage of the reasoning process: LLM calls, tool invocations (like flights, hotels, weather, and activity search), and the orchestration logic that stitches them together. Instead of treating your agent as a black box, you get fine-grained visibility into exactly how requests flow through your system.</p><p><strong>What is SigNoz?</strong></p><p><strong><a href="https://signoz.io/">SigNoz</a></strong> is an all-in-one observability platform built on top of OpenTelemetry. It provides a rich UI to visualize traces, monitor performance metrics, and set alerts, all in real time. 
With SigNoz, you can drill into slow external API calls, trace a single trip planning request end-to-end, or quickly identify where your LangChain agent might be looping or failing.</p><p>By pairing OpenTelemetry&#8217;s standardized data collection with SigNoz&#8217;s powerful analysis tools, you get a complete observability stack tailored for modern, distributed, and AI-driven applications.</p><p>To demonstrate how OpenTelemetry and SigNoz work together in practice, we&#8217;ll walk through a demo trip planner agent built on LangChain. The agent uses flight search, hotel booking, weather APIs, and nearby activity lookup to craft travel itineraries, and with observability enabled, you can see every step of the process in action.</p><h2><strong>Building the Example App: A LangChain Trip Planner Agent</strong></h2><p>To make this guide more concrete, we&#8217;ve built a trip planner agent powered by LangChain, OpenTelemetry, and SigNoz. 
The idea is simple: the user specifies a start location, destination, and check-in/check-out dates, and the agent generates a personalized travel itinerary.</p><p>The itinerary includes:</p><ul><li><p><strong>Flight details</strong> for departure and return.</p></li><li><p><strong>Hotel booking options</strong> covering the entire stay.</p></li><li><p><strong>Weather forecasts</strong> for the chosen dates.</p></li><li><p><strong>Nearby activities</strong> to explore at the destination.</p></li></ul><p>Under the hood, the app uses LangChain&#8217;s agent framework to orchestrate multiple tool calls: one for flight tickets, one for hotels, one for weather, and one for activities. The LLM reasons over the responses from these tools and stitches them together into a coherent itinerary.</p><p>With OpenTelemetry instrumentation baked in, every tool invocation and LLM call is traced and sent to SigNoz, providing a complete picture of the app&#8217;s performance and behavior: whether a flight API is slow, a hotel lookup fails, or the agent loops unnecessarily.</p><p>To make it more interactive, the trip planner also includes a chatbot feature. 
Users can ask follow-up questions like <em>&#8220;Can you find vegetarian-friendly restaurants near my hotel?&#8221;</em> or <em>&#8220;What&#8217;s the best day trip outside the city?&#8221;</em> These conversations are also traced, helping developers see how the agent performs during exploratory dialogue.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!EFLN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!EFLN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp 424w, https://substackcdn.com/image/fetch/$s_!EFLN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp 848w, https://substackcdn.com/image/fetch/$s_!EFLN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp 1272w, https://substackcdn.com/image/fetch/$s_!EFLN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!EFLN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp" width="1334" height="1166" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1166,&quot;width&quot;:1334,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Agent App Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Agent App Image" title="Agent App Image" srcset="https://substackcdn.com/image/fetch/$s_!EFLN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp 424w, https://substackcdn.com/image/fetch/$s_!EFLN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp 848w, https://substackcdn.com/image/fetch/$s_!EFLN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp 1272w, https://substackcdn.com/image/fetch/$s_!EFLN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb6b6216f-da05-4cc0-80d7-4f016a6bc60f_1334x1166.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>LangChain App Starting Page</em></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!6XGQ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!6XGQ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp 424w, https://substackcdn.com/image/fetch/$s_!6XGQ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp 
848w, https://substackcdn.com/image/fetch/$s_!6XGQ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp 1272w, https://substackcdn.com/image/fetch/$s_!6XGQ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!6XGQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp" width="1456" height="1259" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1259,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Agent App Chat&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Agent App Chat" title="Agent App Chat" srcset="https://substackcdn.com/image/fetch/$s_!6XGQ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp 424w, https://substackcdn.com/image/fetch/$s_!6XGQ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp 848w, 
https://substackcdn.com/image/fetch/$s_!6XGQ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp 1272w, https://substackcdn.com/image/fetch/$s_!6XGQ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd8e014d3-8fba-4a63-a0e1-c7947c1cd2a6_1866x1614.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>LangChain App Interactions</em></figcaption></figure></div><h2><strong>Try the Trip Planner Agent 
Yourself</strong></h2><p>Want to explore the LangChain Trip Planner in action? Clone the <strong><a href="https://github.com/SigNoz/langchain-monitoring-demo/tree/main">repo</a></strong>, install dependencies, and follow the setup steps in the README to start sending traces from your local app to SigNoz.</p><pre><code><code>git clone https://github.com/SigNoz/langchain-monitoring-demo.git
</code></code></pre><p>After cloning the repo, you can run the agent locally and start exploring and creating travel plans. The <strong><a href="https://github.com/SigNoz/langchain-monitoring-demo/blob/main/README.md">README</a></strong> provides step&#8209;by&#8209;step guidance for setting up the demo. If you&#8217;d rather instrument your own LangChain app, continue to the next section for detailed instructions on integrating OpenTelemetry and SigNoz.</p><h2><strong>Instrument Your LangChain Application</strong></h2><h3><strong>Prerequisites</strong></h3><ul><li><p>A Python application using <strong>Python 3.8+</strong></p></li><li><p>LangChain integrated into your app</p></li><li><p>Basic understanding of AI agents and tool-calling workflows</p></li><li><p>A <strong><a href="https://signoz.io/teams/">SigNoz Cloud account</a></strong> with an active ingestion key</p></li><li><p><code>pip</code> installed for managing Python packages</p></li><li><p>Internet access to send telemetry data to SigNoz Cloud</p></li><li><p><em>(Optional but recommended)</em> A Python virtual environment to isolate dependencies</p></li></ul><p>To capture detailed telemetry from LangChain without modifying your core application logic, we will use <strong><a href="https://arize.com/docs/ax/learn/tracing-concepts/what-is-openinference">OpenInference</a></strong>, a community-driven standard designed to make observability in AI applications easier. It provides pre-built instrumentation for popular frameworks like LangChain, and it&#8217;s built on top of the trusted OpenTelemetry ecosystem. 
This allows you to trace your LangChain application with minimal configuration.</p><p>For detailed setup instructions, see the <strong><a href="https://pypi.org/project/openinference-instrumentation-langchain/">openinference-instrumentation-langchain package page on PyPI</a></strong>.</p><p><strong>Step 1:</strong> Install the OpenInference and OpenTelemetry packages</p><pre><code><code>pip install openinference-instrumentation-langchain \
opentelemetry-exporter-otlp \
opentelemetry-sdk
</code></code></pre><p><strong>Step 2:</strong> Import the necessary modules in your Python application</p><pre><code><code>from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor
from opentelemetry.sdk.resources import Resource
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from openinference.instrumentation.langchain import LangChainInstrumentor
</code></code></pre><p><strong>Step 3:</strong> Set up the OpenTelemetry Tracer Provider to send traces directly to SigNoz Cloud</p><pre><code><code>resource = Resource.create({"service.name": "&lt;service_name&gt;"})
provider = TracerProvider(resource=resource)
span_exporter = OTLPSpanExporter(
    endpoint="https://ingest.&lt;region&gt;.signoz.cloud:443/v1/traces",
    headers={"signoz-ingestion-key": "&lt;your-ingestion-key&gt;"},
)
provider.add_span_processor(BatchSpanProcessor(span_exporter))
</code></code></pre><ul><li><p>Replace <code>&lt;service_name&gt;</code> with the name of your service</p></li><li><p>Set <code>&lt;region&gt;</code> to match your SigNoz Cloud <strong><a href="https://signoz.io/docs/ingestion/signoz-cloud/overview/#endpoint">region</a></strong></p></li><li><p>Replace <code>&lt;your-ingestion-key&gt;</code> with your SigNoz <strong><a href="https://signoz.io/docs/ingestion/signoz-cloud/keys/">ingestion key</a></strong></p></li></ul><p><strong>Step 4:</strong> Instrument LangChain using OpenInference</p><p>Use the <code>LangChainInstrumentor</code> from OpenInference to automatically trace LangChain operations. Pass it the <code>provider</code> from Step 3 so that spans are sent through the exporter you configured:</p><pre><code><code># The provider from Step 3 was never registered as the global tracer provider,
# so hand it to the instrumentor explicitly.
LangChainInstrumentor().instrument(tracer_provider=provider)
</code></code></pre><blockquote><p><em><strong>&#128204; Important: Place this code at the start of your application logic, before any LangChain functions are called or used, to ensure telemetry is correctly captured.</strong></em></p></blockquote><p>Your LangChain calls should now automatically emit traces, along with their spans and attributes.</p><p>Finally, you should be able to view this data in SigNoz Cloud under the Traces tab:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!wFuj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!wFuj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp 424w, https://substackcdn.com/image/fetch/$s_!wFuj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp 848w, https://substackcdn.com/image/fetch/$s_!wFuj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp 1272w, https://substackcdn.com/image/fetch/$s_!wFuj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!wFuj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp" width="1456" height="136" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:136,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Traces View&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Traces View" title="Traces View" srcset="https://substackcdn.com/image/fetch/$s_!wFuj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp 424w, https://substackcdn.com/image/fetch/$s_!wFuj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp 848w, https://substackcdn.com/image/fetch/$s_!wFuj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp 1272w, https://substackcdn.com/image/fetch/$s_!wFuj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ab168e0-3939-4e27-bdf2-335d0c12958c_2224x208.webp 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a><figcaption class="image-caption"><em>Traces of your LangChain 
Application</em></figcaption></figure></div><p>When you click on a trace ID in SigNoz, you'll see a detailed view of the trace, including all associated spans, along with their events and attributes:</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iZLL!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iZLL!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp 424w, https://substackcdn.com/image/fetch/$s_!iZLL!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp 848w, https://substackcdn.com/image/fetch/$s_!iZLL!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp 1272w, https://substackcdn.com/image/fetch/$s_!iZLL!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iZLL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp" width="1456" height="767" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:767,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Detailed Traces View&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Detailed Traces View" title="Detailed Traces View" srcset="https://substackcdn.com/image/fetch/$s_!iZLL!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp 424w, https://substackcdn.com/image/fetch/$s_!iZLL!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp 848w, https://substackcdn.com/image/fetch/$s_!iZLL!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp 1272w, https://substackcdn.com/image/fetch/$s_!iZLL!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F983fdef4-778a-42a7-a1fa-fd2ab98136f7_2886x1520.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Detailed traces view of your LangChain Application</em></figcaption></figure></div><h2><strong>Making Sense of Your Telemetry Data</strong></h2><p>Once telemetry is enabled in our LangChain trip planner agent, we start seeing detailed traces for each major step in the reasoning workflow. With LangGraph integration, these traces are neatly structured, showing how the agent loop orchestrates model calls and tool invocations. Here are the four kinds of spans you&#8217;ll encounter:</p><p><strong>LangGraph (root span)</strong></p><p>The overarching span represents the full request lifecycle of the trip planner agent. From the moment a user asks for a travel itinerary, every downstream operation (LLM reasoning, tool calls, and response generation) is captured inside this parent span.</p><p>This view makes it clear how long the entire request took. 
On the right panel, you can explore input values like the initial user query, making it easy to trace back how the request was interpreted at the start.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Bmtl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Bmtl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp 424w, https://substackcdn.com/image/fetch/$s_!Bmtl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp 848w, https://substackcdn.com/image/fetch/$s_!Bmtl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp 1272w, https://substackcdn.com/image/fetch/$s_!Bmtl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Bmtl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp" width="1456" height="761" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:761,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Root Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Root Span" title="Root Span" srcset="https://substackcdn.com/image/fetch/$s_!Bmtl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp 424w, https://substackcdn.com/image/fetch/$s_!Bmtl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp 848w, https://substackcdn.com/image/fetch/$s_!Bmtl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp 1272w, https://substackcdn.com/image/fetch/$s_!Bmtl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fab241458-be91-40ff-b469-116707ea48fe_2914x1524.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>LangGraph Root Span</em></figcaption></figure></div><p><strong>Agent span</strong></p><p>Nested inside the LangGraph span is the agent span, which captures the LLM&#8217;s reasoning steps. This includes the decision-making process: when to call a tool, how to interpret the results, and whether the loop should continue or terminate.</p><p>Here, you can see the <code>call_model &#8594; RunnableSequence &#8594; ChatOpenAI</code> flow. Each step shows its latency, and the trace reveals exactly which prompts and tool inputs the agent generated. 
This makes it much easier to debug cases where the model loops too long or misuses a tool.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!oReK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!oReK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp 424w, https://substackcdn.com/image/fetch/$s_!oReK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp 848w, https://substackcdn.com/image/fetch/$s_!oReK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp 1272w, https://substackcdn.com/image/fetch/$s_!oReK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!oReK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp" width="1456" height="761" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:761,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Pre-Tool Agent Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Pre-Tool Agent Span" title="Pre-Tool Agent Span" srcset="https://substackcdn.com/image/fetch/$s_!oReK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp 424w, https://substackcdn.com/image/fetch/$s_!oReK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp 848w, https://substackcdn.com/image/fetch/$s_!oReK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp 1272w, https://substackcdn.com/image/fetch/$s_!oReK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd64d325a-5857-4ae5-8658-5b252a0c18c1_2914x1524.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Pre-Tool Call Agent Span</em></figcaption></figure></div><p><strong>Tool call spans</strong></p><p>Next, you&#8217;ll see spans for each tool invocation: flights, hotels, weather, and activities. 
These are especially valuable for diagnosing external API performance.</p><p>For example:</p><ul><li><p><code>get_flight_tickets</code> &#8594; duration ~13ms</p></li><li><p><code>get_hotel_bookings</code> &#8594; duration ~25ms</p></li><li><p><code>get_weather</code> &#8594; duration ~16ms</p></li><li><p><code>get_activities</code> &#8594; duration ~11ms</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!WFDK!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!WFDK!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp 424w, https://substackcdn.com/image/fetch/$s_!WFDK!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp 848w, https://substackcdn.com/image/fetch/$s_!WFDK!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp 1272w, https://substackcdn.com/image/fetch/$s_!WFDK!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!WFDK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp" width="1456" height="761" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:761,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Tool Calls Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Tool Calls Span" title="Tool Calls Span" srcset="https://substackcdn.com/image/fetch/$s_!WFDK!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp 424w, https://substackcdn.com/image/fetch/$s_!WFDK!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp 848w, https://substackcdn.com/image/fetch/$s_!WFDK!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp 1272w, https://substackcdn.com/image/fetch/$s_!WFDK!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F37051c95-49ac-4b56-ace6-d20e6746d50e_2914x1524.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Tool Calls Span</em></figcaption></figure></div><p><strong>Closing agent span</strong></p><p>After the tool calls, the workflow enters a closing agent span, where the LLM takes all tool outputs (flights, hotels, weather, activities) and composes the final travel itinerary.</p><p>This is where the agent stitches together structured API responses into a user-friendly itinerary. 
By inspecting this span, you can:</p><ul><li><p>Review the exact prompt the LLM used to summarize tool outputs.</p></li><li><p>Measure how much time the final response generation takes.</p></li><li><p>Verify the final message content before it&#8217;s returned to the user.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pr7q!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pr7q!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp 424w, https://substackcdn.com/image/fetch/$s_!Pr7q!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp 848w, https://substackcdn.com/image/fetch/$s_!Pr7q!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp 1272w, https://substackcdn.com/image/fetch/$s_!Pr7q!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pr7q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp" width="1456" height="761" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:761,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Post-Tool Agent Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Post-Tool Agent Span" title="Post-Tool Agent Span" srcset="https://substackcdn.com/image/fetch/$s_!Pr7q!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp 424w, https://substackcdn.com/image/fetch/$s_!Pr7q!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp 848w, https://substackcdn.com/image/fetch/$s_!Pr7q!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp 1272w, https://substackcdn.com/image/fetch/$s_!Pr7q!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4143dd54-42c5-4abe-9cc6-c9b5a0417bd8_2914x1524.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Post-Tool Call Agent Span</em></figcaption></figure></div><p>With all this data, you can answer critical performance questions about your trip planner agent:</p><ul><li><p><strong>Where is the time going?</strong> Is most of the latency in the agent&#8217;s reasoning, external API calls, or final response assembly?</p></li><li><p><strong>Which tools are slowest?</strong> For instance, if <code>get_hotel_bookings</code> consistently takes longer than the other tools, you might need caching or a faster API provider.</p></li><li><p><strong>Is the agent reasoning efficiently?</strong> If the initial or closing agent spans dominate total latency, you may need to optimize prompts or reduce unnecessary loops.</p></li></ul><p>Instead of guessing why an itinerary takes 20+ seconds to generate, SigNoz gives you a connected, end-to-end view of each request, turning your LangChain workflow from a black box into a fully observable 
system.</p><h2><strong>Visualizing Data in SigNoz with Dashboards</strong></h2><p>Once your LangChain trip planner agent is instrumented with OpenTelemetry, SigNoz gives you the ability to create rich dashboards to explore the emitted telemetry data. Built-in filters and span attributes make it easy to drill down into agent reasoning latency, tool performance, or model usage. This gives you a real-time pulse on how your application is performing end-to-end.</p><p>Here are some of the most insightful panels we built using the traces from our instrumented trip planner workflow:</p><p><strong>p95 Duration for Agent </strong><code>call_model</code></p><p>This panel shows the 95th percentile latency for the LLM calls made by the agent. Since LLM generation is often the longest-running step, tracking p95 duration helps you identify tail-end response times and tune prompts or model choices to improve user experience.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!_Vnh!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!_Vnh!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp 424w, https://substackcdn.com/image/fetch/$s_!_Vnh!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp 848w, 
https://substackcdn.com/image/fetch/$s_!_Vnh!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp 1272w, https://substackcdn.com/image/fetch/$s_!_Vnh!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!_Vnh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp" width="1274" height="714" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:714,&quot;width&quot;:1274,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;call_model duration&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="call_model duration" title="call_model duration" srcset="https://substackcdn.com/image/fetch/$s_!_Vnh!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp 424w, https://substackcdn.com/image/fetch/$s_!_Vnh!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp 848w, 
https://substackcdn.com/image/fetch/$s_!_Vnh!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp 1272w, https://substackcdn.com/image/fetch/$s_!_Vnh!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0c4b3cda-5eaa-4898-988f-826f52d3a009_1274x714.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>call_model Duration Panel</em></figcaption></figure></div><p><strong>Tool Call Distribution</strong></p><p>This 
panel visualizes how often different tools&#8212;flights, hotels, weather, and activities&#8212;are invoked across all trip planning sessions. It gives you a clear sense of workload distribution: for example, hotel searches may dominate requests while activity lookups are used less frequently. Understanding this helps with capacity planning and prioritizing optimizations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!OLiV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!OLiV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp 424w, https://substackcdn.com/image/fetch/$s_!OLiV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp 848w, https://substackcdn.com/image/fetch/$s_!OLiV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp 1272w, https://substackcdn.com/image/fetch/$s_!OLiV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!OLiV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp" width="996" height="1020" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1020,&quot;width&quot;:996,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Tool Distribution&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Tool Distribution" title="Tool Distribution" srcset="https://substackcdn.com/image/fetch/$s_!OLiV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp 424w, https://substackcdn.com/image/fetch/$s_!OLiV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp 848w, https://substackcdn.com/image/fetch/$s_!OLiV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp 1272w, https://substackcdn.com/image/fetch/$s_!OLiV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F0f4c5918-01cb-4cc4-883d-46a1f76b05dc_996x1020.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Tool Call Distribution Panel</em></figcaption></figure></div><p><strong>Input and Output Token Usage</strong></p><p>This panel tracks the total number of input and output tokens processed by the LLM over time. Input tokens include the user query and tool outputs passed to the model, while output tokens are the generated itineraries or chatbot replies. 
Monitoring this helps you manage costs, optimize prompt length, and detect patterns in response verbosity.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!sZdl!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!sZdl!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp 424w, https://substackcdn.com/image/fetch/$s_!sZdl!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp 848w, https://substackcdn.com/image/fetch/$s_!sZdl!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp 1272w, https://substackcdn.com/image/fetch/$s_!sZdl!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!sZdl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp" width="694" height="714" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:714,&quot;width&quot;:694,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Token Usage&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Token Usage" title="Token Usage" srcset="https://substackcdn.com/image/fetch/$s_!sZdl!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp 424w, https://substackcdn.com/image/fetch/$s_!sZdl!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp 848w, https://substackcdn.com/image/fetch/$s_!sZdl!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp 1272w, https://substackcdn.com/image/fetch/$s_!sZdl!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F71eddcb8-e34c-46e6-8833-2090cff4aa84_694x714.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>I/O Token Usage Panel</em></figcaption></figure></div><p><strong>p95 Duration of Each Tool Call</strong></p><p>This panel breaks down the latency of each tool: <code>get_flight_tickets</code>, <code>get_hotel_bookings</code>, <code>get_weather</code>, and <code>get_activities</code>. 
By tracking the 95th percentile duration, you can quickly spot which external API is the slowest under peak load and decide whether caching, retries, or provider changes are needed.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!15qZ!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!15qZ!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp 424w, https://substackcdn.com/image/fetch/$s_!15qZ!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp 848w, https://substackcdn.com/image/fetch/$s_!15qZ!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp 1272w, https://substackcdn.com/image/fetch/$s_!15qZ!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!15qZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp" width="1456" height="992" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:992,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Tool Duration&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Tool Duration" title="Tool Duration" srcset="https://substackcdn.com/image/fetch/$s_!15qZ!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp 424w, https://substackcdn.com/image/fetch/$s_!15qZ!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp 848w, https://substackcdn.com/image/fetch/$s_!15qZ!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp 1272w, https://substackcdn.com/image/fetch/$s_!15qZ!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa803c413-d8fe-4906-82cc-dba559f3a479_1834x1250.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Tool Call Durations Panel</em></figcaption></figure></div><p><strong>LLM Model Distribution</strong></p><p>If your app is configured to use multiple LLMs, this panel shows the distribution of model usage. It&#8217;s useful for analyzing trade-offs between speed, quality, and cost. 
For example, you might run most queries on a smaller, cheaper model but switch to a larger model for complex multi-step itineraries.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!epAn!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!epAn!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp 424w, https://substackcdn.com/image/fetch/$s_!epAn!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp 848w, https://substackcdn.com/image/fetch/$s_!epAn!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp 1272w, https://substackcdn.com/image/fetch/$s_!epAn!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!epAn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp" width="1190" height="1090" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1090,&quot;width&quot;:1190,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Model Distribution&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Model Distribution" title="Model Distribution" srcset="https://substackcdn.com/image/fetch/$s_!epAn!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp 424w, https://substackcdn.com/image/fetch/$s_!epAn!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp 848w, https://substackcdn.com/image/fetch/$s_!epAn!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp 1272w, https://substackcdn.com/image/fetch/$s_!epAn!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff33ca639-2616-4518-8d2f-c715d7d09c86_1190x1090.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>LLM Model Distribution Panel</em></figcaption></figure></div><p>With these dashboards in place, you can move beyond anecdotal debugging and gain data-driven insights into your LangChain agent. Whether it&#8217;s latency hotspots, tool reliability, or token usage trends, SigNoz provides the observability foundation you need to scale AI-driven trip planning with confidence.</p><h2><strong>Wrapping it Up</strong></h2><p>Building LangChain agents like a trip planner is exciting. There&#8217;s something magical about watching an AI plan your flights, hotels, activities, and even answer follow-up questions in natural language. But that magic only lasts if the app stays fast, reliable, and trustworthy. 
To make that happen, you need a clear view of what&#8217;s going on under the hood.</p><p>By pairing OpenTelemetry&#8217;s vendor-neutral instrumentation with SigNoz&#8217;s powerful observability platform, you can follow every step of your LangChain workflow from agent reasoning to tool calls and final response generation. With this visibility, debugging becomes faster, performance tuning becomes data-driven, and your users get consistently great experiences.</p><p>In AI-powered apps, guesswork is the enemy. Observability is how you replace it with clarity, and that&#8217;s how you build LangChain systems you can trust.</p><h2><strong>Coming Next: Using SigNoz to monitor a LangChain agent that queries SigNoz MCP</strong></h2><p>In the <strong><a href="https://signoz.io/blog/monitoring-langchain-agent-querying-signoz-mcp-server/">next part</a></strong> of this series, we&#8217;ll go deeper into observability by looking at a LangChain agent that integrates with an MCP (Model Context Protocol) server. 
This opens up richer interactions, but also more moving parts where observability becomes even more critical.</p><p><strong><a href="https://signoz.io/blog/monitoring-langchain-agent-querying-signoz-mcp-server/">Full-Circle Observability: Using SigNoz to monitor a LangChain agent that queries SigNoz MCP</a></strong></p>]]></content:encoded></item><item><title><![CDATA[Full-Circle Observability: Using SigNoz to monitor a LangChain agent that queries SigNoz MCP]]></title><description><![CDATA[Let's explore how to instrument a LangChain trip planner agent with OpenTelemetry and send telemetry data to SigNoz.]]></description><link>https://newsletter.signoz.io/p/full-circle-observability-using-signoz</link><guid isPermaLink="false">https://newsletter.signoz.io/p/full-circle-observability-using-signoz</guid><dc:creator><![CDATA[SigNoz]]></dc:creator><pubDate>Sun, 31 Aug 2025 14:23:13 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!84Qv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In <strong><a href="https://signoz.io/blog/langchain-observability-with-opentelemetry/">Part 1</a></strong> of this series, we explored how to instrument a LangChain trip planner agent with OpenTelemetry and send telemetry data to SigNoz. 
By tracing each step of the planning process (LLM reasoning, tool calls for flights, hotels, weather, and activities, and the final itinerary response), we saw how observability turns a black-box agent workflow into a transparent, debuggable system.</p><p>That foundation gave us insights into latency hotspots, tool failures, and agent reasoning loops, all of which are critical for ensuring a reliable user experience in production AI apps.</p><p>In this second part, we&#8217;ll take observability a step further by introducing MCP (Model Context Protocol) servers into the mix. Specifically, we&#8217;ll look at a LangChain agent integrated with a SigNoz MCP server, which allows the agent to directly query logs, metrics, and traces from a connected SigNoz instance.</p><p>This means that instead of just sending observability data to SigNoz, the agent itself can consume and reason over observability data in real time.</p><p>We&#8217;ll walk through how to set up a SigNoz MCP agent with LangChain, instrument it with OpenTelemetry, and explore the kinds of insights it can surface when observability data becomes part of the agent&#8217;s context.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZKTd!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZKTd!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp 424w, 
https://substackcdn.com/image/fetch/$s_!ZKTd!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp 848w, https://substackcdn.com/image/fetch/$s_!ZKTd!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp 1272w, https://substackcdn.com/image/fetch/$s_!ZKTd!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZKTd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp" width="1200" height="630" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:630,&quot;width&quot;:1200,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!ZKTd!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp 424w, 
https://substackcdn.com/image/fetch/$s_!ZKTd!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp 848w, https://substackcdn.com/image/fetch/$s_!ZKTd!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp 1272w, https://substackcdn.com/image/fetch/$s_!ZKTd!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4b81002d-5f15-46e8-8b51-56eb3d87cff0_1200x630.webp 1456w" sizes="100vw" fetchpriority="high"></picture><div class="image-link-expand"><div class="pencraft pc-display-flex pc-gap-8 pc-reset"><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container restack-image"><svg role="img" width="20" height="20" viewBox="0 0 20 20" fill="none" stroke-width="1.5" stroke="var(--color-fg-primary)" stroke-linecap="round" stroke-linejoin="round" xmlns="http://www.w3.org/2000/svg"><g><title></title><path d="M2.53001 7.81595C3.49179 4.73911 6.43281 2.5 9.91173 2.5C13.1684 2.5 15.9537 4.46214 17.0852 7.23684L17.6179 8.67647M17.6179 8.67647L18.5002 4.26471M17.6179 8.67647L13.6473 6.91176M17.4995 12.1841C16.5378 15.2609 13.5967 17.5 10.1178 17.5C6.86118 17.5 4.07589 15.5379 2.94432 12.7632L2.41165 11.3235M2.41165 11.3235L1.5293 15.7353M2.41165 11.3235L6.38224 13.0882"></path></g></svg></button><button tabindex="0" type="button" class="pencraft pc-reset pencraft icon-container view-image"><svg xmlns="http://www.w3.org/2000/svg" width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" stroke-width="2" stroke-linecap="round" stroke-linejoin="round" class="lucide lucide-maximize2 lucide-maximize-2"><polyline points="15 3 21 3 21 9"></polyline><polyline points="9 21 3 21 3 15"></polyline><line x1="21" x2="14" y1="3" 
y2="10"></line><line x1="3" x2="10" y1="21" y2="14"></line></svg></button></div></div></div></a></figure></div><h2><strong>Building the Example App: A LangChain SigNoz MCP Agent</strong></h2><p>In this part, we&#8217;ll demonstrate a LangChain agent that integrates with a SigNoz MCP (Model Context Protocol) server. The goal of this app is to make observability data (logs, metrics, and traces) queryable through natural language.</p><p>Users can interact with the agent just like a chatbot, asking operational and performance-related questions such as:</p><ul><li><p><em>&#8220;What are all the active services in the last 5 hours?&#8221;</em></p></li><li><p><em>&#8220;Which service has the highest error rate this week?&#8221;</em></p></li><li><p><em>&#8220;Show me the logs generated in the last 1 hour.&#8221;</em></p></li></ul><p>Behind the scenes, the LangChain agent communicates with the SigNoz MCP server, which exposes endpoints for querying observability data. The agent decides which endpoint to call (logs, metrics, or traces), retrieves the relevant data, and then uses the LLM to generate a clear, human-readable summary for the user.</p><p>All of this activity is itself instrumented with OpenTelemetry. 
Each agent reasoning step, MCP server call, and final response generation is captured as spans and sent to SigNoz.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!vv3d!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!vv3d!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp 424w, https://substackcdn.com/image/fetch/$s_!vv3d!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp 848w, https://substackcdn.com/image/fetch/$s_!vv3d!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp 1272w, https://substackcdn.com/image/fetch/$s_!vv3d!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!vv3d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp" width="1456" height="867" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/a5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:867,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;MCP Agent App Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="MCP Agent App Image" title="MCP Agent App Image" srcset="https://substackcdn.com/image/fetch/$s_!vv3d!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp 424w, https://substackcdn.com/image/fetch/$s_!vv3d!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp 848w, https://substackcdn.com/image/fetch/$s_!vv3d!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp 1272w, https://substackcdn.com/image/fetch/$s_!vv3d!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fa5c0722e-d6ca-4688-9949-eca377244f30_2470x1470.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>LangChain MCP App Chat</em></figcaption></figure></div><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Tht0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Tht0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp 424w, https://substackcdn.com/image/fetch/$s_!Tht0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp 848w, 
https://substackcdn.com/image/fetch/$s_!Tht0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp 1272w, https://substackcdn.com/image/fetch/$s_!Tht0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Tht0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp" width="1456" height="867" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:867,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;MCP Agent App Image&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="MCP Agent App Image" title="MCP Agent App Image" srcset="https://substackcdn.com/image/fetch/$s_!Tht0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp 424w, https://substackcdn.com/image/fetch/$s_!Tht0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp 848w, 
https://substackcdn.com/image/fetch/$s_!Tht0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp 1272w, https://substackcdn.com/image/fetch/$s_!Tht0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F94074bee-4d70-4678-9ce4-f9bdd5110afb_2470x1470.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>LangChain MCP App Chat</em></figcaption></figure></div><h2><strong>Try the SigNoz MCP Agent 
Yourself</strong></h2><p>Want to explore the LangChain SigNoz MCP Agent in action? Clone the <strong><a href="https://github.com/SigNoz/signoz-mcp-demo">repo</a></strong>, install dependencies, and follow the setup steps in the <strong><a href="https://github.com/SigNoz/signoz-mcp-demo/blob/main/README.md">README</a></strong> to connect the agent with your own SigNoz instance.</p><pre><code><code>git clone https://github.com/SigNoz/signoz-mcp-demo.git
</code></code></pre><p>After cloning the repo, you can run the agent locally and start asking natural-language questions about your observability data (logs, metrics, and traces) from SigNoz.</p><p>The README provides step-by-step guidance for configuring the MCP server connection and running the demo.</p><h2><strong>Making Sense of Your Telemetry Data</strong></h2><p>Once telemetry is enabled for the SigNoz MCP agent, traces clearly show how a user request flows through LangGraph, the agent&#8217;s reasoning, the MCP tool invocation, and the final response assembly. In a typical run, you&#8217;ll see this shape:</p><p><code>query_endpoint</code><strong> (root span)</strong></p><p>This top-level span represents the entire MCP query lifecycle, from the user&#8217;s natural-language prompt to the final summarized answer. It&#8217;s your single place to track end-to-end latency for an observability question, and it contains the LangGraph run described in our previous blog.</p><p>Use the attributes shown on the right to confirm request metadata and inspect the input/output payloads that kicked off the flow.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!LtLH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!LtLH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp 424w, https://substackcdn.com/image/fetch/$s_!LtLH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp 848w, 
https://substackcdn.com/image/fetch/$s_!LtLH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp 1272w, https://substackcdn.com/image/fetch/$s_!LtLH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!LtLH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp" width="1456" height="756" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/ff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:756,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Root Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Root Span" title="Root Span" srcset="https://substackcdn.com/image/fetch/$s_!LtLH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp 424w, https://substackcdn.com/image/fetch/$s_!LtLH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp 848w, 
https://substackcdn.com/image/fetch/$s_!LtLH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp 1272w, https://substackcdn.com/image/fetch/$s_!LtLH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fff8d543f-09db-4c7b-84f1-f516e24bce70_2900x1506.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>LangChain MCP App Root Span</em></figcaption></figure></div><p><strong>Initial Agent span (planning &amp; tool 
selection)</strong></p><p>Nested under the root is the first agent span. Here, the LLM interprets the user&#8217;s question and decides which MCP capability to call (logs, metrics, or traces). In this example, the chain shows:</p><p><code>call_model &#8594; RunnableSequence &#8594; ChatOpenAI &#8594; should_continue</code></p><p>This span&#8217;s duration is a good proxy for prompt complexity and reasoning cost before any external call happens.</p><p>What to look for:</p><ul><li><p>Long initial agent spans can indicate heavy prompts or unnecessary loops.</p></li><li><p>Inputs/outputs show the exact messages the model received and returned, which is great for debugging misinterpretations.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0yTS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0yTS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 424w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 848w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!0yTS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0yTS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp" width="1456" height="762" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Pre-Tool Agent Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Pre-Tool Agent Span" title="Pre-Tool Agent Span" srcset="https://substackcdn.com/image/fetch/$s_!0yTS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 424w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 848w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!0yTS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Pre-Tool Call Agent Span</em></figcaption></figure></div><p><strong>MCP tool span (data retrieval from SigNoz)</strong></p><p>Next comes the MCP tool call, where an operation such as <code>fetch_services</code> hits the SigNoz MCP server to retrieve services, metrics, logs, or traces. 
This is the place to diagnose backend/query latency and payload size issues.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ncvj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ncvj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp 424w, https://substackcdn.com/image/fetch/$s_!ncvj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp 848w, https://substackcdn.com/image/fetch/$s_!ncvj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp 1272w, https://substackcdn.com/image/fetch/$s_!ncvj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ncvj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp" width="1456" height="756" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:756,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Tool Calls Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Tool Calls Span" title="Tool Calls Span" srcset="https://substackcdn.com/image/fetch/$s_!ncvj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp 424w, https://substackcdn.com/image/fetch/$s_!ncvj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp 848w, https://substackcdn.com/image/fetch/$s_!ncvj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp 1272w, https://substackcdn.com/image/fetch/$s_!ncvj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc2d728e9-3480-4c4c-9fbb-3c47542f6314_2900x1506.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Tool Calls Span</em></figcaption></figure></div><p><strong>Closing Agent span (reasoning &amp; final answer)</strong></p><p>After the tool response, the closing agent span composes the final answer: it parses MCP results, filters/sorts/aggregates as needed, and generates a clean natural-language summary.</p><p>What to look for:</p><ul><li><p>Long closing spans usually mean large MCP payloads being summarized (token pressure) or extra follow-up reasoning.</p></li><li><p>Inspect the prompt the agent used for summarization to ensure it&#8217;s concise and grounded in the retrieved data.</p></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0yTS!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp" 
data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0yTS!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 424w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 848w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 1272w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!0yTS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp" width="1456" height="762" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:762,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Post-Tool Agent Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Post-Tool Agent Span" title="Post-Tool Agent 
Span" srcset="https://substackcdn.com/image/fetch/$s_!0yTS!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 424w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 848w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 1272w, https://substackcdn.com/image/fetch/$s_!0yTS!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F75a781da-f8b9-456f-ae77-17b2cd082dec_2900x1518.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Post-Tool Call Agent Span</em></figcaption></figure></div><h3><strong>Handling Errors with Full Context</strong></h3><p>Errors are inevitable in AI agents: API limits, bad tool responses, or timeouts. Without observability, it&#8217;s hard to know <em>what failed</em> and <em>where</em>.</p><p>With SigNoz, errors are tied to specific spans in the trace, so you can see:</p><ul><li><p><strong>Which component failed</strong> (agent reasoning, tool call, or response synthesis).</p></li><li><p><strong>What the error was</strong> (rate limit, timeout, schema mismatch, etc.).</p></li><li><p><strong>When in the request it happened</strong>.</p></li></ul><p>In this example, a RateLimitError from OpenAI is clearly flagged in the closing agent span. 
The trace shows the error message, stack trace, and context all in one place.</p><p>Instead of guessing, you know exactly what broke, where, and why, making debugging much faster and safer in production.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!rOp0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!rOp0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp 424w, https://substackcdn.com/image/fetch/$s_!rOp0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp 848w, https://substackcdn.com/image/fetch/$s_!rOp0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp 1272w, https://substackcdn.com/image/fetch/$s_!rOp0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!rOp0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp" width="1456" height="768" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/fdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:768,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Error Span&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Error Span" title="Error Span" srcset="https://substackcdn.com/image/fetch/$s_!rOp0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp 424w, https://substackcdn.com/image/fetch/$s_!rOp0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp 848w, https://substackcdn.com/image/fetch/$s_!rOp0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp 1272w, https://substackcdn.com/image/fetch/$s_!rOp0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ffdab28b0-ddc5-4962-b8e7-6e6ba056c96e_2900x1530.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Token Limit Error Span</em></figcaption></figure></div>
<p><strong>What you can answer with these traces:</strong></p><ul><li><p><strong>Where is the latency?</strong></p><p>Is time spent in planning (initial agent), in the MCP query (tool span), or in summarization (closing agent)?</p></li><li><p><strong>Are queries efficient?</strong></p><p>Tool spans reveal slow MCP endpoints and overly broad filters. 
Tighten time windows or add constraints.</p></li><li><p><strong>Is the model working too hard?</strong></p><p>Long agent spans (before or after tools) suggest prompt bloat, unnecessary loops, or passing too much raw data back to the LLM.</p></li><li><p><strong>Is the workflow stable?</strong></p><p>Use span status codes and events to spot intermittent errors (schema mismatches, token limits, provider hiccups).</p></li></ul><p>With this structure, SigNoz turns the MCP-powered workflow from a black box into a fully traceable conversation: <strong>user prompt &#8594; agent planning &#8594; MCP tool call &#8594; agent summary</strong>. That visibility makes debugging faster, optimization data-driven, and your observability assistant consistently reliable.</p><h2><strong>Visualizing Data in SigNoz with Dashboards</strong></h2><p>Once your LangChain SigNoz MCP agent is instrumented with OpenTelemetry, SigNoz gives you the ability to create rich dashboards to explore emitted telemetry data. Built-in filters and span attributes make it easy to drill down into agent reasoning latency, MCP query performance, error patterns, and model usage. This provides a real-time pulse on how your observability agent itself is performing end-to-end.</p><p>Here are some insightful panels we built using the traces from our instrumented MCP workflow:</p><p><strong>p95 Duration for Agent </strong><code>call_model</code></p><p>This panel shows the 95th percentile latency for LLM calls made by the agent. 
Since generation often dominates total response time, monitoring p95 latency highlights worst-case scenarios and helps you optimize prompts, reduce context size, or adjust model selection.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!mIbC!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!mIbC!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp 424w, https://substackcdn.com/image/fetch/$s_!mIbC!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp 848w, https://substackcdn.com/image/fetch/$s_!mIbC!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp 1272w, https://substackcdn.com/image/fetch/$s_!mIbC!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!mIbC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp" width="1198" height="758" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:758,&quot;width&quot;:1198,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;call_model duration&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="call_model duration" title="call_model duration" srcset="https://substackcdn.com/image/fetch/$s_!mIbC!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp 424w, https://substackcdn.com/image/fetch/$s_!mIbC!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp 848w, https://substackcdn.com/image/fetch/$s_!mIbC!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp 1272w, https://substackcdn.com/image/fetch/$s_!mIbC!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F6ac201a9-3303-404c-955a-d5e832a0955d_1198x758.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>call_model Duration Panel</em></figcaption></figure></div>
<p><strong>MCP Tool Call Distribution</strong></p><p>This panel visualizes how often the agent queries different MCP endpoints: logs, metrics, or traces. 
It gives you a sense of workload distribution, showing whether users are primarily asking about latency, error logs, or trace investigations.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!84Qv!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!84Qv!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp 424w, https://substackcdn.com/image/fetch/$s_!84Qv!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp 848w, https://substackcdn.com/image/fetch/$s_!84Qv!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp 1272w, https://substackcdn.com/image/fetch/$s_!84Qv!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!84Qv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp" width="1000" height="930" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/b5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:930,&quot;width&quot;:1000,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Tool Distribution&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Tool Distribution" title="Tool Distribution" srcset="https://substackcdn.com/image/fetch/$s_!84Qv!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp 424w, https://substackcdn.com/image/fetch/$s_!84Qv!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp 848w, https://substackcdn.com/image/fetch/$s_!84Qv!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp 1272w, https://substackcdn.com/image/fetch/$s_!84Qv!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb5130ade-fd28-45df-990e-321b87ee83c5_1000x930.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Tool Call Distribution Panel</em></figcaption></figure></div>
<p><strong>Input and Output Token Usage</strong></p><p>This panel tracks the total number of input and output tokens processed by the LLM over time. Input tokens include user queries and MCP responses passed into the model, while output tokens are the agent&#8217;s natural language answers. 
Monitoring this helps manage cost and detect patterns in verbosity or context expansion.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Wv2t!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Wv2t!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp 424w, https://substackcdn.com/image/fetch/$s_!Wv2t!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp 848w, https://substackcdn.com/image/fetch/$s_!Wv2t!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp 1272w, https://substackcdn.com/image/fetch/$s_!Wv2t!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Wv2t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp" width="704" height="1102" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/def37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1102,&quot;width&quot;:704,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Token Usage&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Token Usage" title="Token Usage" srcset="https://substackcdn.com/image/fetch/$s_!Wv2t!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp 424w, https://substackcdn.com/image/fetch/$s_!Wv2t!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp 848w, https://substackcdn.com/image/fetch/$s_!Wv2t!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp 1272w, https://substackcdn.com/image/fetch/$s_!Wv2t!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fdef37156-1e88-4890-9d86-587e8b16d63b_704x1102.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>I/O Total Token Usage</em></figcaption></figure></div>
<p><strong>Model Call Error Rate Over Time</strong></p><p>This panel tracks the error rate of model calls, visualized as a line chart. Spikes here may indicate upstream issues such as invalid MCP responses, token limits being exceeded, or transient API errors. By correlating these errors with traffic patterns, you can quickly pinpoint reliability issues in production.</p><p>With these dashboards in place, you can move beyond ad-hoc debugging and gain data-driven insights into your MCP agent. 
Whether it&#8217;s latency hotspots, slow SigNoz queries, token usage spikes, or rising error rates, SigNoz provides the observability foundation you need to ensure your AI-driven observability assistant stays reliable and responsive.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!1FAA!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!1FAA!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp 424w, https://substackcdn.com/image/fetch/$s_!1FAA!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp 848w, https://substackcdn.com/image/fetch/$s_!1FAA!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp 1272w, https://substackcdn.com/image/fetch/$s_!1FAA!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!1FAA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp" width="1174" height="716" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:716,&quot;width&quot;:1174,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Error Rate&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Error Rate" title="Error Rate" srcset="https://substackcdn.com/image/fetch/$s_!1FAA!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp 424w, https://substackcdn.com/image/fetch/$s_!1FAA!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp 848w, https://substackcdn.com/image/fetch/$s_!1FAA!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp 1272w, https://substackcdn.com/image/fetch/$s_!1FAA!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3c1d9127-1902-45a2-b6e4-89e75298d7c2_1174x716.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Total Error Rate Panel</em></figcaption></figure></div>
<h2><strong>Wrapping it Up</strong></h2><p>LangChain agents integrated with MCP servers open the door to powerful new workflows, but that power comes with more moving parts: LLM calls, tool interactions, server communications, and error handling. Without the right observability, it&#8217;s easy for problems to hide in the noise.</p><p>By pairing OpenTelemetry with SigNoz, you get full visibility into the agent lifecycle: where time is spent, which tools are bottlenecks, and what errors are occurring. 
Whether it&#8217;s a slow external API, a looping agent, or a rate limit error, you can see exactly what happened and where.</p><p>With this clarity, debugging becomes faster, scaling becomes smoother, and users get more reliable experiences even as your agents grow more complex.</p>]]></content:encoded></item><item><title><![CDATA[Why Observability Isn’t Just for SREs (and How Devs Can Get Started)]]></title><description><![CDATA[This blog is an attempt to help anyone who feels lost find their way into observability, and a wake-up call for devs to think about observability more actively today than ever before.]]></description><link>https://newsletter.signoz.io/p/why-observability-isnt-just-for-sres</link><guid isPermaLink="false">https://newsletter.signoz.io/p/why-observability-isnt-just-for-sres</guid><dc:creator><![CDATA[SigNoz]]></dc:creator><pubDate>Sun, 17 Aug 2025 14:15:20 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!qbYV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Almost every other day, when I scroll past r/devops or r/sre, I see a <strong><a href="https://signoz.io/blog/why-observability-isnt-just-for-sres/www.reddit.com/r/sre/comments/1b54tpp/software_engineer_sre_devops/">post like this</a></strong> asking how a dev can get started with devops, observability, etc.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!H4Mg!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" 
srcset="https://substackcdn.com/image/fetch/$s_!H4Mg!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp 424w, https://substackcdn.com/image/fetch/$s_!H4Mg!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp 848w, https://substackcdn.com/image/fetch/$s_!H4Mg!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp 1272w, https://substackcdn.com/image/fetch/$s_!H4Mg!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!H4Mg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp" width="1456" height="649" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:649,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Sample Reddit thread on how to get started with OTel&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Sample Reddit thread on how to get started with OTel" title="Sample Reddit thread on how to get started with OTel" 
srcset="https://substackcdn.com/image/fetch/$s_!H4Mg!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp 424w, https://substackcdn.com/image/fetch/$s_!H4Mg!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp 848w, https://substackcdn.com/image/fetch/$s_!H4Mg!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp 1272w, https://substackcdn.com/image/fetch/$s_!H4Mg!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F148ffbf4-eaa6-48ae-9b57-78df11917bae_1566x698.webp 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption"><em>Sample Reddit thread on how to get started with OTel. Source: <strong><a href="https://signoz.io/blog/why-observability-isnt-just-for-sres/www.reddit.com/r/sre/comments/1b54tpp/software_engineer_sre_devops/">Reddit</a></strong></em></figcaption></figure></div>
<p>This blog is an attempt to help anyone who feels lost find their way into observability, and a wake-up call for devs to think about observability more actively today than ever before.</p><p>A dev&#8217;s observability playbook.</p><h2><strong>Why should you, a developer, care?</strong></h2><p>As devs, we often obsess over making our code neater, maintaining systems better, and reducing technical debt. We think of a couple of edge cases and handle them well. We write some tests, debug a bit, drink some hot brew, then call it a day. However, in 2025, I am unsure if this will make the cut.</p><p>Here&#8217;s a short elevator pitch on why you, <em>as a developer,</em> should care about observability today.</p><h3><strong>Product Engineers With Extreme Ownership</strong></h3><p>Gone are the days when a PM would hand you a requirements document and design, and <em>then</em> <strong>you would just code</strong> and <em>then</em> leave the testing to a QA and <em>then</em> whatever happens next to the SREs. 
The role of a dev has expanded <em>beyond</em> this.</p><p>Here&#8217;s what a day in the life of a product/software engineer looks like today,</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!O7_R!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!O7_R!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp 424w, https://substackcdn.com/image/fetch/$s_!O7_R!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp 848w, https://substackcdn.com/image/fetch/$s_!O7_R!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp 1272w, https://substackcdn.com/image/fetch/$s_!O7_R!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!O7_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp" width="844" height="1194" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/c480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1194,&quot;width&quot;:844,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Day in the life of an engineer at SigNoz&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Day in the life of an engineer at SigNoz" title="Day in the life of an engineer at SigNoz" srcset="https://substackcdn.com/image/fetch/$s_!O7_R!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp 424w, https://substackcdn.com/image/fetch/$s_!O7_R!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp 848w, https://substackcdn.com/image/fetch/$s_!O7_R!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp 1272w, https://substackcdn.com/image/fetch/$s_!O7_R!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fc480913a-5f48-470e-b156-d59f10ec008f_844x1194.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Day in the life of an engineer at SigNoz. Source: <strong><a href="https://signoz.io/blog/srikanth-signoz/">SigNoz Blog</a></strong></em></figcaption></figure></div><p>You are <em>kind of</em> expected to know everything, at least a little bit. Companies increasingly value <em>product engineers</em> who own the <em>full lifecycle of a feature</em> from design to coding to deployment and monitoring. It means <em>you</em>, the developer, need to know when your application misbehaves in the wild and be ready to fix it, which is exactly what observability enables.</p><h3><strong>Systems are Scaling Faster (and getting more complex)</strong></h3><p>Modern software architectures have exploded in scale and complexity. 
We&#8217;re building distributed microservices, deploying to clouds and Kubernetes, handling global user traffic and <em>shipping faster</em> than ever before. When hundreds of containers or functions communicate with each other, failures often cascade in unpredictable ways. We need a holistic view of these complex systems to make sense of them, which is exactly what observability provides.</p><h3><strong>Testing every Edge Case isn&#8217;t Feasible</strong></h3><p>Even something as simple as an input box can involve a multitude of edge cases.</p><ul><li><p>What if the input is too short or too long?</p></li><li><p>What if there&#8217;s a special character in the input?</p></li><li><p>How should whitespace be handled?</p></li><li><p>How do you guard against SQL injection?</p></li></ul><p>These are just a few off the top of my head. Brainstorming and testing every potential edge case becomes increasingly cumbersome as systems grow more complex.</p><p>Observability acts as your safety net, catching issues that slip through testing and helping you understand real-world system behaviour.</p><h3><strong>Users don&#8217;t like bugs, but they HATE slow resolution more</strong></h3><p>As an end user of a lot of products, I get very impatient when something stops working, so I can <em>imagine</em> how users feel when a product doesn&#8217;t work as expected. Bugs and outages are never welcome, but what really frustrates users is when issues drag on without a fix.</p><p>We live in a <em>fast-paced</em> world where no one waits for anything, and users have zero tolerance for downtime or latency. <strong>System performance is mission-critical</strong>. 
Every hour of downtime, or of a delayed fix, can be costly.</p><p>Observability is what makes rapid resolution possible: it helps you spot issues quickly and pinpoint the root cause without wasting time, which translates directly into customer retention.</p><h2><strong>Observability: Beyond APM and Infra Monitoring</strong></h2><p>So what&#8217;s the point of <em>observability</em>, anyway?</p><p>Is it just a fancy word for monitoring?</p><p>Not really.</p><p>Traditional Application Performance Monitoring [APM] and infrastructure monitoring are about tracking known metrics [CPU, memory, request latency, etc.] and alerting on predefined thresholds. Observability goes <strong>beyond</strong> that by enabling you to infer the internal state of the system from its outputs.</p><p>It&#8217;s often described in terms of three pillars of telemetry data, <strong>logs, metrics, and traces</strong>, though there&#8217;s more to it. Together, they give you a 360&#176; view of what&#8217;s happening inside your applications.</p><ul><li><p><strong>Logs</strong> are the record of events [think of them as your app&#8217;s diary of what it&#8217;s doing].</p></li><li><p><strong>Metrics</strong> are numeric measurements [e.g. memory at 75%, 500 requests/minute] that track trends and health.</p></li><li><p><strong>Traces</strong> follow the path of a single request or transaction through multiple services [useful in microservices to see how a request <em>flows</em> and where it slows down].</p></li></ul><p>Observability tools unify these signals to help you answer <em>new questions</em> about your system&#8217;s behaviour, not just the ones you defined in advance.</p><p>For example, a classic monitor might tell you that the <em>error rate exceeded 5%</em> and <em>something&#8217;s wrong,</em> whereas an observability approach lets you dig in and ask <em>why</em> it&#8217;s wrong: which users or inputs caused this? 
What else was happening on the system at that time?</p><p>It&#8217;s a more exploratory, investigative mindset.</p><p>Crucially, observability isn&#8217;t limited to application performance the way APM is. Today it has expanded to cover the <strong>health of the entire system</strong>, including <strong>infrastructure and third-party services</strong>. APM might catch known issues [say, a slow database query you anticipated], but observability will help surface the <em>weird, unexpected issues</em> that nobody was explicitly looking for.</p><h2><strong>Hello World, OpenTelemetry.</strong></h2><p>By now, hopefully, it&#8217;s clear why you should care about actively observing your systems. The next <em>obvious</em> question is how you can achieve it as a developer.</p><p>Your observability tooling will often be influenced by what your org has already adopted. Many teams inherit an existing monitoring stack, maybe Prometheus for metrics [along with dashboards powered by PromQL], or a log system queried with LogQL [such as Grafana Loki]. These tools may already be wired into alerting pipelines, dashboards, and operational runbooks. In such cases, it&#8217;s wise to continue using what&#8217;s already working well.</p><p>The good news is that OpenTelemetry <em>plays nicely</em> with many of these tools, so you can gradually adopt it without disrupting what&#8217;s in place.</p><p>That said, if you <strong>are</strong> starting with a clean slate, I&#8217;d strongly recommend OpenTelemetry [OTel]. It brings plenty of advantages to the table; you can read more about the benefits of a vendor-agnostic, open-source observability framework in OTel&#8217;s <strong><a href="https://opentelemetry.io/docs/what-is-opentelemetry/">official docs</a></strong>.</p><h3><strong>OpenTelemetry in &lt; 200 words</strong></h3><p>At its core, OTel introduces the idea of <em>signals,</em> primarily traces, metrics, and logs, that describe what your application is doing. 
Developers use the <strong>OTel API</strong> to create and emit these signals, while the <strong>OTel SDK</strong> handles the heavy lifting of batching, processing, and exporting the data to your chosen backend.</p><p>Instrumentation [the process of making your code emit these signals] can be done automatically or manually. Usually, it&#8217;s a humble <em>mix of both</em>.</p><p>Once instrumented, your application will emit trace spans, metrics, and logs in OTel&#8217;s standard formats. You can configure exporters to send that data to various backends, whether that means printing to the console during development or shipping it to an observability vendor. The key point is that OTel decouples instrumentation from the backend. You instrument your code once, then choose where to send the data. This gives you the flexibility to start small with any vendor and, as you scale, switch to one better suited to your needs.</p><p>To understand OTel in more depth, I highly suggest giving <strong><a href="https://signoz.io/blog/what-is-opentelemetry/">this</a></strong> a read.</p><h3><strong>Copy, Paste &amp; Run Example</strong></h3><p>Let me take you through a small exercise that can quickly show you the power of OpenTelemetry.</p><p>Say you have an application; it could be a side project or a microservice you own [just start a new branch &#129335;&#127995;&#8205;&#9792;&#65039;]. Since Python is so widely used, the next few instructions are for a Python application, but a quick Google search or LLM prompt will help you adapt them to the language of your choice.</p><ol><li><p>Create and activate a virtual environment</p></li><li><p>Run the following commands in the given sequence, replacing <code>server_automatic.py</code> with your application&#8217;s entry point:</p></li></ol><pre><code><code>pip install opentelemetry-distro

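# Dependencies of the sample app; skip if your own app already has its requirements installed
pip install flask requests

# Detects installed libraries and installs the matching OTel instrumentation packages
opentelemetry-bootstrap -a install

# Assumption: log export via auto-instrumentation is opt-in and needs this variable set
export OTEL_PYTHON_LOGGING_AUTO_INSTRUMENTATION_ENABLED=true

# Runs the app under auto-instrumentation and prints all three signals to the console
opentelemetry-instrument --traces_exporter console --metrics_exporter console --logs_exporter console python server_automatic.py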

</code></code></pre><p>You have just completed a very basic instrumentation of your Python application, and you should see traces, metrics and logs printed to your console!</p><p><strong>&#9888;&#65039; Reminder</strong></p><p>This is a very basic example of instrumentation. OpenTelemetry is far more powerful, but this starter example quickly gives you hands-on experience of what instrumenting with OpenTelemetry feels like and what kind of telemetry data you can collect.</p><h2><strong>Beating the OpenTelemetry Learning Curve</strong></h2><p>Like almost any other skill in life, OpenTelemetry has quite a learning curve. In fact, there is a whole <strong><a href="https://www.reddit.com/r/devops/comments/nxrbqa/opentelemetry_is_great_but_why_is_it_so_bloody/">Reddit thread</a></strong> titled <strong>&#8220;OpenTelemetry is great, but why is it so bloody complicated?&#8221;</strong>.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Pk0c!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Pk0c!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp 424w, https://substackcdn.com/image/fetch/$s_!Pk0c!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp 848w, 
https://substackcdn.com/image/fetch/$s_!Pk0c!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp 1272w, https://substackcdn.com/image/fetch/$s_!Pk0c!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Pk0c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp" width="1456" height="1369" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:1369,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Reddit thread on why OTel can be complicated&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Reddit thread on why OTel can be complicated" title="Reddit thread on why OTel can be complicated" srcset="https://substackcdn.com/image/fetch/$s_!Pk0c!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp 424w, https://substackcdn.com/image/fetch/$s_!Pk0c!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp 848w, 
https://substackcdn.com/image/fetch/$s_!Pk0c!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp 1272w, https://substackcdn.com/image/fetch/$s_!Pk0c!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F04b215ca-f6ad-4a3b-b52b-1d1164358bfa_1514x1424.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Reddit thread on why OTel can be complicated. 
Source: <strong><a href="https://www.reddit.com/r/devops/comments/nxrbqa/opentelemetry_is_great_but_why_is_it_so_bloody/">Reddit</a></strong></em></figcaption></figure></div><p>Depending on how deeply you want to observe your application, the complexity can vary. For instance, getting started with an example shown above is very easy, but the moment you dive deeper into traces, logs, metrics, spans, etc., it can become overwhelming. I&#8217;d like to introduce you to the Dunning-Kruger curve of confidence vs. competence.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!qbYV!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!qbYV!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp 424w, https://substackcdn.com/image/fetch/$s_!qbYV!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp 848w, https://substackcdn.com/image/fetch/$s_!qbYV!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp 1272w, https://substackcdn.com/image/fetch/$s_!qbYV!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!qbYV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp" width="1131" height="679" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/d9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:679,&quot;width&quot;:1131,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Dunning-Kruger curve of confidence vs. competence&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Dunning-Kruger curve of confidence vs. competence" title="Dunning-Kruger curve of confidence vs. 
competence" srcset="https://substackcdn.com/image/fetch/$s_!qbYV!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp 424w, https://substackcdn.com/image/fetch/$s_!qbYV!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp 848w, https://substackcdn.com/image/fetch/$s_!qbYV!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp 1272w, https://substackcdn.com/image/fetch/$s_!qbYV!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd9dfd95a-c6cd-4cf4-829d-3a375a66f777_1131x679.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Dunning-Kruger curve of confidence vs. competence. Source: <strong><a href="https://www.linkedin.com/pulse/have-you-experienced-dunning-kruger-effect-when-hiring-jason-culloo/">Jason Culloo on LinkedIn</a></strong></em></figcaption></figure></div><p>Don&#8217;t be discouraged. This steep ramp-up is common, and with a step-by-step approach, you will get comfortable with the concepts over time.</p><p>I highly suggest following our <strong><a href="https://signoz.io/resource-center/opentelemetry/">Blog series on OpenTelemetry</a></strong>, which has articles on almost every topic related to OpenTelemetry.</p><h2><strong>Next steps</strong></h2><p>So where do you go from here?</p><h3><strong>Review your side projects</strong></h3><p>First, review your current projects or side projects and assess your observability gaps. 
Are you <em>logging</em> enough information? Do you have <em>metrics</em> for key behaviours? If not, that&#8217;s a great place to begin. Start instrumenting gradually using the tools and tips above. Consider implementing OpenTelemetry in one service and showcasing a trace to your team; it might inspire wider adoption once they see the value.</p><h3><strong>Make Observability a Habit</strong></h3><p>Next, consider making <strong>observability a habit in your development workflow</strong>. For instance, when reviewing code or designing new components, include observability questions in the process [e.g., &#8220;How will we know if this fails in production?&#8221;]. Over time, you and your team will naturally build more observable systems. This <em>proactive</em> approach eventually pays off by reducing nasty surprises and shortening debug sessions when issues do occur. Keep learning and stay updated. The observability landscape is evolving [with improvements in OpenTelemetry, new analysis tools, etc.], and being knowledgeable will set you apart. Follow a couple of observability blogs or community forums [the r/observability subreddit, devops blogs, CNCF talks] to see what challenges others are solving.</p><h3><strong>Embrace the Ownership Mindset</strong></h3><p>Finally, <strong>embrace the mindset</strong>: as a developer, caring about observability means caring about your software <em>beyond</em> just writing code. It&#8217;s about owning the reliability and performance of what you build. In 2025 and beyond, the ability to quickly understand and fix issues in complex systems is <em>gold</em>. By investing in observability and tools like OpenTelemetry, you&#8217;re essentially future-proofing your career and your projects. So grab that playbook, get your hands dirty with some telemetry, and start turning those <em>unknown unknowns</em> into <em>well-understood knowns</em>. 
Your users [and your on-call self] will thank you!</p>]]></content:encoded></item><item><title><![CDATA[I built an MCP Server for Observability. This is my Unhyped Take]]></title><description><![CDATA[This is my honest, raw take on why MCP servers for observability may not be worth the hype, if it&#8217;s hyped at all, subject to moot, of course.]]></description><link>https://newsletter.signoz.io/p/i-built-an-mcp-server-for-observability</link><guid isPermaLink="false">https://newsletter.signoz.io/p/i-built-an-mcp-server-for-observability</guid><dc:creator><![CDATA[SigNoz]]></dc:creator><pubDate>Sat, 09 Aug 2025 14:39:17 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!ZlQH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e53a95-bf40-4526-b4ef-d378a75434fd_800x864.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Recently, I read a <strong><a href="https://www.honeycomb.io/blog/its-the-end-of-observability-as-we-know-it-and-i-feel-fine">blog</a></strong> titled &#8220;<strong>It&#8217;s The End Of Observability As We Know It (And I Feel Fine)&#8221;,</strong> which discussed MCP servers in observability and how these systems would potentially be the &#8220;end of observability&#8221;.<br>As someone who has spun up an MCP server for an observability backend [SigNoz] and as someone who has been in the space for a while, I certainly do not think so. 
This is my honest, raw take on why MCP servers for observability may not be worth the hype, if it&#8217;s hyped at all, subject to moot, of course.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!ZlQH!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e53a95-bf40-4526-b4ef-d378a75434fd_800x864.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!ZlQH!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e53a95-bf40-4526-b4ef-d378a75434fd_800x864.webp 424w, https://substackcdn.com/image/fetch/$s_!ZlQH!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e53a95-bf40-4526-b4ef-d378a75434fd_800x864.webp 848w, https://substackcdn.com/image/fetch/$s_!ZlQH!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e53a95-bf40-4526-b4ef-d378a75434fd_800x864.webp 1272w, https://substackcdn.com/image/fetch/$s_!ZlQH!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e53a95-bf40-4526-b4ef-d378a75434fd_800x864.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!ZlQH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e53a95-bf40-4526-b4ef-d378a75434fd_800x864.webp" width="800" height="864" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/72e53a95-bf40-4526-b4ef-d378a75434fd_800x864.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:864,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Is it all a marketing stunt after all?&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:true,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Is it all a marketing stunt after all?" title="Is it all a marketing stunt after all?" srcset="https://substackcdn.com/image/fetch/$s_!ZlQH!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e53a95-bf40-4526-b4ef-d378a75434fd_800x864.webp 424w, https://substackcdn.com/image/fetch/$s_!ZlQH!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e53a95-bf40-4526-b4ef-d378a75434fd_800x864.webp 848w, https://substackcdn.com/image/fetch/$s_!ZlQH!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e53a95-bf40-4526-b4ef-d378a75434fd_800x864.webp 1272w, https://substackcdn.com/image/fetch/$s_!ZlQH!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F72e53a95-bf40-4526-b4ef-d378a75434fd_800x864.webp 1456w" sizes="100vw" fetchpriority="high"></picture></div></a><figcaption class="image-caption"><em>Is it all a marketing stunt after all?</em></figcaption></figure></div><p><strong>&#128221; Note</strong></p><p>This is not a rebuttal, but a healthy debate which is vital to the sustenance of intelligence on Earth and critical to evaluating engineering paradigms.</p><h2><strong>Now, what is this MCP thing?</strong></h2><p>Enter Model Context Protocol, commonly referred to as MCP.</p><p>For those who are not familiar, MCP, originally developed by Anthropic, is an open standard that defines how AI agents or LLMs [think Claude etc] can connect to external tools and data sources in a uniform way.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" 
href="https://substackcdn.com/image/fetch/$s_!pRSj!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad608ab-eeac-4b27-b116-3201e77be751_1920x1080.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!pRSj!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad608ab-eeac-4b27-b116-3201e77be751_1920x1080.webp 424w, https://substackcdn.com/image/fetch/$s_!pRSj!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad608ab-eeac-4b27-b116-3201e77be751_1920x1080.webp 848w, https://substackcdn.com/image/fetch/$s_!pRSj!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad608ab-eeac-4b27-b116-3201e77be751_1920x1080.webp 1272w, https://substackcdn.com/image/fetch/$s_!pRSj!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad608ab-eeac-4b27-b116-3201e77be751_1920x1080.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!pRSj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad608ab-eeac-4b27-b116-3201e77be751_1920x1080.webp" width="1456" height="819" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4ad608ab-eeac-4b27-b116-3201e77be751_1920x1080.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:819,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;MCP is analogous to 
USB-C&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:false,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="MCP is analogous to USB-C" title="MCP is analogous to USB-C" srcset="https://substackcdn.com/image/fetch/$s_!pRSj!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad608ab-eeac-4b27-b116-3201e77be751_1920x1080.webp 424w, https://substackcdn.com/image/fetch/$s_!pRSj!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad608ab-eeac-4b27-b116-3201e77be751_1920x1080.webp 848w, https://substackcdn.com/image/fetch/$s_!pRSj!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad608ab-eeac-4b27-b116-3201e77be751_1920x1080.webp 1272w, https://substackcdn.com/image/fetch/$s_!pRSj!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ad608ab-eeac-4b27-b116-3201e77be751_1920x1080.webp 1456w" sizes="100vw"></picture></div></a><figcaption class="image-caption"><em>MCP is analogous to USB-C. Image credit: <a href="https://norahsakal.com/blog/mcp-vs-api-model-context-protocol-explained/">Norah Sakal&#8217;s blog</a></em></figcaption></figure></div>
<p>One of the most powerful aspects of MCP is that you can create an MCP server once and use it with any compatible agent of your choice. It decouples the server [which exposes the tools] from the AI model itself. This makes MCP analogous to USB-C; you build a standard port once, and any device [agent] that supports it can plug in and start using it.</p><p>At its core, I think MCP is not revolutionary in itself but rather evolutionary. It&#8217;s a universal standard, and it makes things easier and keeps the engines running <em>forward</em>.</p><h2><strong>What does it mean to have an MCP server for your observability backend?</strong></h2><p>Suppose you see a spike in memory usage in your <em>cart_service</em> dashboard. Your next steps would be to look for a memory leak in logs or traces and reach a possible conclusion. 
With MCP in the frame, you can ask your AI agent why the spike occurred, and then it uses the MCP tools to make API calls and reach a <em>couple of hypotheses.</em></p><p>This is the simplest way I can put across the &#8220;role&#8221; an MCP server plays in an observability stack.</p><p>Here&#8217;s a quick demo of a very basic MCP server I spun up for SigNoz with Cursor as my AI agent.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!bM_E!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ad5f249-4454-4b0e-85d4-8486e9a51c1e_800x492.gif" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!bM_E!,w_424,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ad5f249-4454-4b0e-85d4-8486e9a51c1e_800x492.gif 424w, https://substackcdn.com/image/fetch/$s_!bM_E!,w_848,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ad5f249-4454-4b0e-85d4-8486e9a51c1e_800x492.gif 848w, https://substackcdn.com/image/fetch/$s_!bM_E!,w_1272,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ad5f249-4454-4b0e-85d4-8486e9a51c1e_800x492.gif 1272w, https://substackcdn.com/image/fetch/$s_!bM_E!,w_1456,c_limit,f_webp,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ad5f249-4454-4b0e-85d4-8486e9a51c1e_800x492.gif 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!bM_E!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ad5f249-4454-4b0e-85d4-8486e9a51c1e_800x492.gif" width="800" height="492" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/1ad5f249-4454-4b0e-85d4-8486e9a51c1e_800x492.gif&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:492,&quot;width&quot;:800,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Demo of a very basic MCP server for SigNoz&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Demo of a very basic MCP server for SigNoz" title="Demo of a very basic MCP server for SigNoz" srcset="https://substackcdn.com/image/fetch/$s_!bM_E!,w_424,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ad5f249-4454-4b0e-85d4-8486e9a51c1e_800x492.gif 424w, https://substackcdn.com/image/fetch/$s_!bM_E!,w_848,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ad5f249-4454-4b0e-85d4-8486e9a51c1e_800x492.gif 848w, https://substackcdn.com/image/fetch/$s_!bM_E!,w_1272,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ad5f249-4454-4b0e-85d4-8486e9a51c1e_800x492.gif 1272w, https://substackcdn.com/image/fetch/$s_!bM_E!,w_1456,c_limit,f_auto,q_auto:good,fl_lossy/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F1ad5f249-4454-4b0e-85d4-8486e9a51c1e_800x492.gif 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Demo of a very basic MCP server for SigNoz</em></figcaption></figure></div>
<h2><strong>MCP servers for observability may not be the holy grail they are hyped to be</strong></h2><p>Let me get to the crux of this blog. I would like to draw an analogy (or a parallel?) between the <strong><a href="https://en.wikipedia.org/wiki/P_versus_NP_problem">P vs NP problem</a></strong> from my automata theory lectures and the role of MCP servers in the observability space. 
Here&#8217;s my two cents:</p><ul><li><p><strong>(P)</strong> refers to problems that can be solved quickly by a computer.</p></li><li><p><strong>(NP)</strong> refers to problems where, if you are <em>given</em> a potential solution, you can <em>verify</em> it quickly.</p></li></ul><p>Root Cause Analysis falls into a set of problems that are not easy to solve from scratch (loosely, not P), but once a solution, or say a hypothesis, is generated, it becomes easier to verify [although there are exceptions]. That is, RCA can be loosely labelled as an (NP) problem. MCP servers, in my opinion, do a decent job of identifying plausible hypotheses for RCA in most cases, with room for exceptions.</p><p>But <em><strong>sometimes</strong></em>, the <em><strong>process of manually verifying</strong></em> the generated hypotheses is as cumbersome as finding the solution yourself in the first place. Let&#8217;s examine this in greater detail.</p><h3><strong>When does the LLM get it wrong?</strong></h3><p>Usually, if the issue is familiar to the LLM, that is, if there is sufficient context for it in the LLM&#8217;s training data, it&#8217;s insanely good at reaching the correct hypotheses quickly. In other words, it does well when the <em>problem space overlaps with the LLM&#8217;s knowledge base</em>. For example, a pod in <em>CrashLoopBackOff</em> due to a wrong image or a memory limit can be easily spotted by an LLM. However, when the issue is novel, the chances of an LLM getting RCA right today are slim. Let me back up my argument with facts. 
Here&#8217;s an excerpt from a conference paper, &#8220;<strong><a href="https://link.springer.com/chapter/10.1007/978-3-031-97564-6_25?error=cookies_not_supported&amp;code=d42d431d-85f3-4cc6-983c-e539b5ff43f4#:~:text=few,structured%20guidance%20for%20reliable%20RCA">AIOps for Reliability: Evaluating Large Language Models for Automated Root Cause Analysis in Chaos Engineering</a></strong>&#8221;:</p><blockquote><p><em><strong>We simulate eight real-world failure scenarios in a controlled e-commerce environment and assess LLMs&#8217; performance in zero-shot and few-shot settings compared with Site Reliability Engineers. While LLMs can identify common failure patterns, their accuracy is highly dependent on prompt engineering. In zero-shot settings, models achieve moderate accuracy (44&#8211;58%), often misattributing harmless load spikes as security threats. However, few-shot prompting improves performance (60&#8211;74% accuracy), suggesting that LLMs require structured guidance for reliable RCA.</strong></em></p><p><em><strong>Despite their potential, LLMs are not yet ready to replace human SREs, who achieved over 80% accuracy due to hallucinations, misclassification biases, and lack of explainability. The findings highlight that LLMs can be co-pilots in incident response, but human oversight remains essential.</strong></em></p></blockquote><div><hr></div><p><strong>&#9989; Info</strong></p><p>Zero-shot prompting means asking an LLM to perform a task without providing any prior examples of how to do it. Few-shot prompting, on the other hand, includes a handful of worked examples (typically two or three) in the prompt, a form of <em>in-context learning</em>.</p><div><hr></div><p>So, let me paint a picture: <em><strong>in that intense moment of an escalation</strong></em>, you're tweaking prompts to get your MCP-coupled LLM to brainstorm hypotheses for you, <em>and then</em> verifying those hypotheses manually.</p><p><em>Tsk tsk. 
Sounds like a bad idea.</em></p><p>This leads me to my next point, which is also a well-established fact.</p><h3><strong>Hallucinations</strong></h3><p>LLMs hallucinate. <em>A lot</em>. Maybe they&#8217;re <em>slightly schizophrenic</em>?</p><p>A condition that I never thought would be associated with a machine, but it seems like LLMs themselves have quite a few mental health issues to attend to.</p><p>When an LLM comes up with the wrong reasoning, it does so with such confidence and conviction that the error is difficult to spot unless you are a veteran in what you do. Or you are good at catching lies. This is a dangerous property in a high-stakes context like incident management or RCA. An LLM-based RCA might calmly assert a root cause that sounds convincing [&#8220;A cache miss storm likely caused the outage due to XYZ&#8221;], leading the team to pursue that line of inquiry and potentially ignore the real issue until much later.</p><p>Of course, we&#8217;ve progressed to a point where hallucinations are fewer and precision is higher, but &#10024; <em>trust issues</em> &#10024; exist.</p><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!iYNN!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34c76dfd-571b-4826-aef9-7f0ded18b388_1146x520.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!iYNN!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34c76dfd-571b-4826-aef9-7f0ded18b388_1146x520.webp 424w, 
https://substackcdn.com/image/fetch/$s_!iYNN!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34c76dfd-571b-4826-aef9-7f0ded18b388_1146x520.webp 848w, https://substackcdn.com/image/fetch/$s_!iYNN!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34c76dfd-571b-4826-aef9-7f0ded18b388_1146x520.webp 1272w, https://substackcdn.com/image/fetch/$s_!iYNN!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34c76dfd-571b-4826-aef9-7f0ded18b388_1146x520.webp 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!iYNN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34c76dfd-571b-4826-aef9-7f0ded18b388_1146x520.webp" width="1146" height="520" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/34c76dfd-571b-4826-aef9-7f0ded18b388_1146x520.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:520,&quot;width&quot;:1146,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Guilty as charged&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Guilty as charged" title="Guilty as charged" srcset="https://substackcdn.com/image/fetch/$s_!iYNN!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34c76dfd-571b-4826-aef9-7f0ded18b388_1146x520.webp 424w, 
https://substackcdn.com/image/fetch/$s_!iYNN!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34c76dfd-571b-4826-aef9-7f0ded18b388_1146x520.webp 848w, https://substackcdn.com/image/fetch/$s_!iYNN!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34c76dfd-571b-4826-aef9-7f0ded18b388_1146x520.webp 1272w, https://substackcdn.com/image/fetch/$s_!iYNN!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F34c76dfd-571b-4826-aef9-7f0ded18b388_1146x520.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Guilty as charged</em></figcaption></figure></div>
<p>Let me also point out how these hallucinations can compound in a tool-chaining system, such as an MCP setup or an agentic workflow.</p><h3><strong>Deviations and Alignments</strong></h3><p>If an AI agent suggests <em>Service X is likely the culprit due to a spike in latency</em>, an engineer must dig into dashboards, logs, or traces to confirm that.</p><p>If the suggestion is wrong [or one of many], you&#8217;ve spent precious time on a false lead. In the worst case, verifying multiple AI-proposed hypotheses in sequence can feel like exploring an exponentially growing search tree of possibilities, much like brute-forcing an NP problem.</p><p>Consider a scenario where an LLM-based agent makes 8 MCP tool calls in a troubleshooting session, and at each step, there are ~3 plausible interpretations of the data [only one of which is correct]. The space of possible reasoning paths through those steps is on the order of 3^8 [over 6,500 paths], and <strong>any deviation at one step leads the AI down a wrong branch</strong>.</p><p>And we are being generous.</p><p>In theory, the agent should pick the correct path, but in practice, each step carries a risk of error. This combinatorial explosion of possibilities means the agent could easily go astray unless it stays perfectly <em><strong>aligned</strong></em> with the ground truth at every step, a very challenging, near-impossible requirement. Without careful constraints or a <strong>human-in-the-loop</strong> to course-correct, the AI might output a confident analysis that is subtly off-track, leaving you to double-check each detail.</p><h3><strong>No Real World Model</strong></h3><p>Let&#8217;s look at this <strong><a href="https://arxiv.org/pdf/2406.03689">recent study from MIT</a></strong>. 
A language model was trained on text describing driving routes through New York City.</p><p>The model could provide <em>near-perfect driving directions in New York City</em>, yet it hadn&#8217;t learned a correct map of NYC. When the researchers introduced a slight change [closing some roads for a hypothetical detour], the model&#8217;s navigation performance <strong>plummeted</strong> because its hidden &#8220;mental map&#8221; was full of errors.</p><p>In other words, the model appeared to <em>know</em> the city when everything was normal, but that was an illusion. It had no <em>coherent internal model</em> of the street grid. This analogy extends easily to observability systems, where an LLM might seem to <em>know</em> your system&#8217;s behaviour under normal conditions [e.g. it can predict that high CPU on Service Y often coincides with cache misses in Service Z, because it&#8217;s seen that pattern]. But if something changes, say a new deployment alters dependencies, or an outage causes a novel cascade, the LLM can fail in unpredictable ways.</p><h2><strong>Do we have a conclusion?</strong></h2><p>We are living in the <em>AI era,</em> and MCP servers play a significant role in providing an additional interface between us and observability platforms. For tasks like converting natural language queries to PromQL or LogQL, an MCP-powered LLM can perform exceptionally well, as the task has <em>tight semantics and a predictable output format</em>, along with ample training data to give it context.</p><p>We might witness a future [or present?] where we spend less time building and staring at graphs and more time prompting LLMs to brainstorm viable hypotheses. We will see an agentic layer emerge and become increasingly capable in various aspects of observability, while still requiring a good amount of manual intervention and verification.</p><p>Ultimately, MCP-powered agents are <em>not</em> bringing us closer to automated problem-solving. 
They are giving us sophisticated <strong>hypothesis generators</strong>. They excel at exploring the known, but the unknown remains the domain of the <em><strong>human engineer</strong></em>. We're not building an automated SRE; we're building a <em><strong>co-pilot that can brainstorm, but can't yet reason</strong></em>. And recognising that distinction is the key to using these tools effectively without falling for the hype.</p><p>And with this, I <em>rest</em> my case.</p>]]></content:encoded></item><item><title><![CDATA[SigNoz Observability Roundup!]]></title><description><![CDATA[Catch up on what we&#8217;ve written and thought about this week.]]></description><link>https://newsletter.signoz.io/p/signoz-observability-roundup</link><guid isPermaLink="false">https://newsletter.signoz.io/p/signoz-observability-roundup</guid><dc:creator><![CDATA[SigNoz]]></dc:creator><pubDate>Sun, 27 Jul 2025 14:02:21 GMT</pubDate><content:encoded><![CDATA[<p><strong>&#128105;&#8205;&#128187; Engineering &amp; Research</strong></p><h3>1. From Sequential Bottlenecks to Concurrent Performance: Optimizing Log Processing at Scale</h3><p>This post describes how the SigNoz team identified a sequential processing bottleneck in their log ingestion pipeline and shifted to concurrent processing. By processing log entries in parallel rather than one at a time, they achieved about <strong>30% higher throughput</strong> and better CPU/memory utilization. 
The article explains the scaling challenges encountered when customers send millions of logs per minute, the architectural changes needed to introduce a worker&#8209;pool approach, and how concurrent processing improves performance and reduces consumer lag.</p><p><strong>Read more:</strong> <a href="https://signoz.io/blog/optimizing-log-processing-at-scale/">https://signoz.io/blog/optimizing-log-processing-at-scale/</a></p><div><hr></div><p><strong>&#127873; Miscellaneous</strong></p><h3>2. Cloud or Self&#8209;Hosted &#8211; Which Deployment Model is Right for You?</h3><p>Choosing the right observability platform is only part of the decision; how you deploy it matters just as much. This guide compares SigNoz&#8217;s deployment options:</p><ul><li><p><strong>SigNoz Cloud</strong></p></li><li><p><strong>Enterprise Self&#8209;Hosted</strong></p></li><li><p><strong>Community Edition</strong></p></li><li><p><strong>Bring Your Own Cloud (BYOC)</strong></p></li></ul><p>It also outlines when each model makes sense. The post highlights the simplicity and scalability of the fully managed cloud service, contrasts it with self&#8209;hosted models that provide greater control for organizations with strict data residency or compliance requirements, and offers a simple decision framework. It emphasizes that there&#8217;s no one&#8209;size&#8209;fits&#8209;all approach; the right choice depends on your team&#8217;s needs and constraints.</p><p><strong>Read more:</strong> <a href="https://signoz.io/blog/cloud-vs-self-hosted-deployment-guide/">https://signoz.io/blog/cloud-vs-self-hosted-deployment-guide/</a></p><div><hr></div><p><strong>&#129504; Deep Dives &amp; Analysis</strong></p><h3>3. 
I Built an MCP Server for Observability &#8211; This Is My Unhyped Take.</h3><p>In this opinion piece, the author reacts to a blog claiming that Model Context Protocol (MCP) servers could replace traditional observability. She explains that an MCP server acts as a universal interface that allows AI agents to interact with tools like SigNoz, likening it to USB&#8209;C in its plug&#8209;and&#8209;play nature. However, she argues that MCP is <strong>evolutionary</strong> rather than revolutionary; while it helps AI agents formulate hypotheses, it doesn&#8217;t eliminate the need for human operators. Large language models can assist with root cause analysis, but they still hallucinate and require structured prompts, so human oversight remains essential.</p><p><strong>Read more:</strong> <a href="https://signoz.io/blog/unhyped-take-on-mcp-servers/">https://signoz.io/blog/unhyped-take-on-mcp-servers/</a></p><div><hr></div><p><strong>&#128218; Guides &amp; Tutorials</strong></p><h3>4. Kubernetes Observability with OpenTelemetry &#8211; A Complete Setup Guide (July 16, 2025)</h3><p>This extensive guide walks through setting up observability for a Kubernetes cluster using <strong>OpenTelemetry</strong>. It begins by explaining that Kubernetes emits telemetry in the form of container metrics, traces, cluster events and logs, and that OTel offers a vendor&#8209;neutral way to collect and export this data. The tutorial then demonstrates deploying a demo application on <strong>Minikube</strong> and configuring the OpenTelemetry Collector in two modes, <strong>DaemonSet</strong> and <strong>Deployment</strong>, using Helm charts. This two&#8209;pronged setup captures both service&#8209;level metrics/traces and cluster&#8209;level metrics/events, providing full visibility into a Kubernetes environment. 
The aim is to give readers a practical blueprint for instrumenting their own clusters with OTel.</p><p><strong>Read more:</strong> <a href="https://signoz.io/blog/kubernetes-observability-with-opentelemetry/">https://signoz.io/blog/kubernetes-observability-with-opentelemetry/</a></p><div><hr></div><p><strong>Thanks for reading!</strong><br>If you enjoy our updates and want to stay on top of the latest in observability, subscribe to us!</p><p>Check out our open-source project on <strong><a href="https://github.com/SigNoz/signoz">GitHub</a></strong> and explore more resources on our <strong><a href="https://signoz.io/">website</a></strong>.</p><p>See you next week with more updates, stay tuned!</p>]]></content:encoded></item><item><title><![CDATA[MCP Observability with OpenTelemetry]]></title><description><![CDATA[We prompt an agent, a tool gets invoked, and a response is generated. But what really happens in between? 
And when something breaks, how do we trace the failure and debug it effectively?]]></description><link>https://newsletter.signoz.io/p/mcp-observability-with-opentelemetry</link><guid isPermaLink="false">https://newsletter.signoz.io/p/mcp-observability-with-opentelemetry</guid><dc:creator><![CDATA[Elizabeth]]></dc:creator><pubDate>Thu, 17 Jul 2025 14:30:42 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!0dB8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f62b048-2162-4034-9cc2-63471f0daff3_2860x1550.webp" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>2025 has truly been the year of Agentic AI, with MCP (Model Context Protocol) emerging as one of its flashiest and most talked-about innovations. While many products have seamlessly integrated MCP servers into their systems, these servers are increasingly being labelled as <em>black boxes</em>: opaque components that handle critical tasks but offer little visibility into what's happening under the hood.</p><p>We prompt an agent, a tool gets invoked, and a response is generated. But what really happens in between? And when something breaks, how do we trace the failure and debug it effectively?</p><p>In this blog, we'll explore why observability is crucial for MCP server-client systems, what kind of telemetry you can capture, and how to instrument your stack to bring these black boxes into the light.</p><h2><strong>Why is Observability important in an MCP server-client system?</strong></h2><p>MCP-based architectures enable AI agents to dynamically invoke external tool servers as part of their reasoning loop. 
While this flexibility is powerful, it introduces significant complexity in tracking the flow of data, monitoring system performance, and diagnosing failures, especially in distributed environments.</p><p><strong>In smaller systems</strong>, with a single agent and a small number of tool servers, developers might rely on basic logging or manual inspection. However, even in these setups:</p><ul><li><p>Latency spikes in tool responses (e.g. an external API slowing down) can degrade agent performance.</p></li><li><p>Silent failures can occur when a tool invocation does not return valid data, but no clear error is raised.</p></li><li><p>Debugging why an agent selected a particular tool or why a tool produced unexpected output is time-consuming without structured telemetry.</p></li></ul><p><strong>In production-grade systems</strong>, where MCP agents orchestrate calls to multiple tools [sometimes chaining or parallelising requests across services], observability becomes critical:</p><ul><li><p><strong>End-to-end visibility</strong> ensures that every tool invocation, downstream call, and response can be traced through the system, from agent prompt to final output.</p></li><li><p><strong>Cross-service traceability</strong> is vital for isolating failures across service boundaries, for example, determining whether a tool server's failure stemmed from its own logic, an upstream agent issue, or a downstream API it depends on.</p></li><li><p><strong>Performance metrics</strong> provide quantifiable indicators like request throughput, error rates, and p95/p99 latencies for tool calls, all of which are essential for meeting SLAs and diagnosing performance regressions.</p></li><li><p><strong>Capacity and scaling insights</strong> help identify hot spots [e.g. one tool server handling disproportionate load] and guide resource allocation or autoscaling strategies.</p></li></ul><p>Consider a scenario where an agent calls a tool server that queries an external API expected to return 1000 records. 
Without observability:</p><ul><li><p>You can't easily see whether latency arose in the agent logic, the tool server, or the external API.</p></li><li><p>Partial failures [e.g. the API returned only part of the expected records, returned a 5xx error, or timed out] might go unnoticed until they cause downstream errors.</p></li><li><p>You have no clear view of whether this issue is isolated or affecting all similar requests.</p></li></ul><p>In short, observability transforms MCP systems from opaque black boxes into measurable, debuggable, and optimisable components. Without it, diagnosing issues in distributed, agentic pipelines becomes a nightmare.</p><h2><strong>Why OpenTelemetry?</strong></h2><p>MCP systems are designed for openness and interoperability; they allow AI agents to invoke tools across diverse servers, languages, and environments. OpenTelemetry is a natural fit for observing these systems because it shares the same design philosophy.</p><p><em>Vendor-neutral, standards-based, and language-agnostic.</em></p><p>Let's look at the specifics.</p><h3><strong>Context Propagation</strong></h3><p>MCP server-client communication spans multiple services and often crosses network boundaries. 
OTel excels at context propagation, using open standards like the <strong><a href="https://www.w3.org/TR/trace-context/">W3C Trace Context</a></strong> to link requests across services. For example, when an agent initiates a tool call over HTTP, OTel's client instrumentation injects trace headers (such as <code>traceparent</code>) into that request. The tool server picks up that context and continues the trace, allowing you to visualize the <em>entire request journey</em> from agent prompt through tool execution to downstream API calls. This is essential for root cause analysis and performance debugging in distributed MCP pipelines, where visibility across boundaries is critical.</p><h3><strong>Multi-Language Support</strong></h3><p>MCP architectures are rarely single-language. You might have an agent written in Python orchestrating tools served by Node.js, or vice versa. OpenTelemetry provides mature SDKs for major languages, including JavaScript/TypeScript, Python, Go, and Java, all implementing the same specification. This means:</p><ul><li><p>You can instrument every MCP component in its native language without gaps in visibility.</p></li><li><p>Spans started in one language [e.g. a Python agent] connect seamlessly to spans in another [e.g. a TypeScript tool server].</p></li><li><p>Traces form a coherent, end-to-end view across the polyglot system.</p></li></ul><p>This multi-language, spec-compliant design aligns perfectly with MCP's goal of decoupled, language-agnostic tool integration.</p><h3><strong>Open Standards First</strong></h3><p>Just as MCP emphasizes <em>open, standards-based communication</em> between agents and tools, OpenTelemetry emphasizes <em>open, standards-based telemetry</em>. OTel emits telemetry data using the <strong>OpenTelemetry Protocol (OTLP)</strong>, a vendor-neutral, open format supported by a wide range of backends. You can start by exporting data locally for debugging, and later switch to a full observability backend without changing your instrumentation code. 
This flexibility means your telemetry, like your MCP tool calls, stays free of vendor lock-in.</p><p>In short, OpenTelemetry stays true to the principles that MCP is built on: openness, interoperability, and language-agnostic design. That makes it a natural observability framework for MCP server-client systems.</p><p>Let's look at what telemetry data can be collected when instrumenting your MCP server-tool-client systems with OTel.</p><h2><strong>What can be observed with OTel in an MCP server-client system?</strong></h2><p>Instrumenting your MCP server-client architecture with OpenTelemetry enables collection of precise telemetry data that helps you monitor performance, debug issues, and guide scaling decisions. Here's what you can concretely collect:</p><h3><strong>Performance Metrics</strong></h3><p>OpenTelemetry's metrics API lets you record numerical measurements that quantify system behavior. In the context of an MCP server, we can record the following:</p><ul><li><p><strong>Tool invocation duration</strong></p><p>Example metric: <code>tool_invocation_duration_ms</code> [histogram]</p><p>Tracks the latency of each tool execution, allowing you to compute p50/p95/p99 latencies.</p><p><em>e.g.</em>: <code>fetch_tool p95 = 480ms</code>, <code>database_query_tool p99 = 1100ms</code></p></li><li><p><strong>Tool invocation count</strong></p><p>Example metric: <code>tool_invocation_total</code> [counter]</p><p>Tracks how often each tool is called, which is useful for identifying hotspots.</p><p><em>e.g.</em>: <code>fetch_tool = 12000 calls/min</code>, <code>summarize_tool = 3000 calls/min</code></p></li><li><p><strong>Error rates per tool</strong></p><p>Example metric: <code>tool_invocation_errors_total</code> [counter with an <code>error_type</code> attribute]</p><p>Counts failed invocations and categorizes errors.</p><p><em>e.g.</em>: <code>fetch_tool timeout errors = 50/min</code>, <code>db_tool connection errors = 5/min</code></p></li><li><p><strong>Total tokens 
processed</strong></p><p>Example custom metric: <code>tool_token_usage_total</code> [counter]</p><p>Records the number of input/output tokens handled by the tool, useful for cost tracking and optimization.</p><p><em>e.g.</em>: <code>fetch_tool output_tokens = 50M/day</code></p></li><li><p><strong>System resource metrics</strong> [collected via OTel/ system receivers]</p><ul><li><p><code>cpu_utilization_percent</code>, <code>memory_usage_bytes</code></p></li><li><p>Useful for capacity planning and detecting resource exhaustion.</p></li></ul></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!0dB8!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f62b048-2162-4034-9cc2-63471f0daff3_2860x1550.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!0dB8!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f62b048-2162-4034-9cc2-63471f0daff3_2860x1550.webp 424w, https://substackcdn.com/image/fetch/$s_!0dB8!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f62b048-2162-4034-9cc2-63471f0daff3_2860x1550.webp 848w, https://substackcdn.com/image/fetch/$s_!0dB8!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f62b048-2162-4034-9cc2-63471f0daff3_2860x1550.webp 1272w, https://substackcdn.com/image/fetch/$s_!0dB8!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f62b048-2162-4034-9cc2-63471f0daff3_2860x1550.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!0dB8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f62b048-2162-4034-9cc2-63471f0daff3_2860x1550.webp" width="1456" height="789" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/4f62b048-2162-4034-9cc2-63471f0daff3_2860x1550.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:789,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Set of metrics collected by OTel in an MCP system visualised in SigNoz&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Set of metrics collected by OTel in an MCP system visualised in SigNoz" title="Set of metrics collected by OTel in an MCP system visualised in SigNoz" srcset="https://substackcdn.com/image/fetch/$s_!0dB8!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f62b048-2162-4034-9cc2-63471f0daff3_2860x1550.webp 424w, https://substackcdn.com/image/fetch/$s_!0dB8!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f62b048-2162-4034-9cc2-63471f0daff3_2860x1550.webp 848w, https://substackcdn.com/image/fetch/$s_!0dB8!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f62b048-2162-4034-9cc2-63471f0daff3_2860x1550.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!0dB8!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4f62b048-2162-4034-9cc2-63471f0daff3_2860x1550.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Set of metrics collected by OTel in an MCP system visualised in SigNoz</em></figcaption></figure></div><h3><strong>Distributed Tracing</strong></h3><p>Every tool invocation and its internal steps can be captured as spans, showing how long each part of the process takes and where failures occur. 
An example trace structure is shown below:</p><pre><code><code>
[TraceID: abc123]
&#9492;&#9472;&#9472;&gt; Span: Agent prompt handling (Claude Agent)
    &#9492;&#9472;&#9472;&gt; Span: FetchTool invocation (MCP server)
        &#9492;&#9472;&#9472;&gt; Span: External API GET /data (3rd party API)

</code></code></pre><p>Each span can include:</p><ul><li><p><strong>Attributes</strong>:</p><ul><li><p><code>tool.name: fetch_tool</code></p></li><li><p><code>tool.input_size: 1000</code> [records, tokens, etc.]</p></li><li><p><code>tool.output_size: 980</code></p></li><li><p><code>http.status_code: 200</code></p></li><li><p><code>error.type: timeout</code></p></li></ul></li><li><p><strong>Events</strong>:</p><ul><li><p><code>exception</code> with stack trace if a tool fails internally.</p></li><li><p><code>external_api_retry</code> event if the tool attempts to recover from downstream failure.</p></li></ul></li></ul><div class="captioned-image-container"><figure><a class="image-link image2 is-viewable-img" target="_blank" href="https://substackcdn.com/image/fetch/$s_!9hSi!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff66b3ec7-84b1-4cc4-9edc-5769f657b9fd_2838x1532.webp" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!9hSi!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff66b3ec7-84b1-4cc4-9edc-5769f657b9fd_2838x1532.webp 424w, https://substackcdn.com/image/fetch/$s_!9hSi!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff66b3ec7-84b1-4cc4-9edc-5769f657b9fd_2838x1532.webp 848w, https://substackcdn.com/image/fetch/$s_!9hSi!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff66b3ec7-84b1-4cc4-9edc-5769f657b9fd_2838x1532.webp 1272w, https://substackcdn.com/image/fetch/$s_!9hSi!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff66b3ec7-84b1-4cc4-9edc-5769f657b9fd_2838x1532.webp 1456w" sizes="100vw"><img 
src="https://substackcdn.com/image/fetch/$s_!9hSi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff66b3ec7-84b1-4cc4-9edc-5769f657b9fd_2838x1532.webp" width="1456" height="786" data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/f66b3ec7-84b1-4cc4-9edc-5769f657b9fd_2838x1532.webp&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:786,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:null,&quot;alt&quot;:&quot;Distributed tracing for an MCP server-tool-client system as visualised by SigNoz&quot;,&quot;title&quot;:null,&quot;type&quot;:null,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:null,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="Distributed tracing for an MCP server-tool-client system as visualised by SigNoz" title="Distributed tracing for an MCP server-tool-client system as visualised by SigNoz" srcset="https://substackcdn.com/image/fetch/$s_!9hSi!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff66b3ec7-84b1-4cc4-9edc-5769f657b9fd_2838x1532.webp 424w, https://substackcdn.com/image/fetch/$s_!9hSi!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff66b3ec7-84b1-4cc4-9edc-5769f657b9fd_2838x1532.webp 848w, https://substackcdn.com/image/fetch/$s_!9hSi!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff66b3ec7-84b1-4cc4-9edc-5769f657b9fd_2838x1532.webp 1272w, 
https://substackcdn.com/image/fetch/$s_!9hSi!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff66b3ec7-84b1-4cc4-9edc-5769f657b9fd_2838x1532.webp 1456w" sizes="100vw" loading="lazy"></picture></div></a><figcaption class="image-caption"><em>Distributed tracing for an MCP server-tool-client system as visualised by SigNoz</em></figcaption></figure></div><p>The next section explains how to instrument your systems to collect the telemetry data described above.</p><h2><strong>MCP + OTel, But How?</strong></h2><p>Implementing OpenTelemetry for MCP server-client systems is 
straightforward. The process is similar to instrumenting any modern distributed service. You initialize the SDK, configure exporters, and apply automatic or manual instrumentation depending on your needs.</p><p>For a detailed step-by-step guide on instrumenting your systems with OpenTelemetry, refer to <strong><a href="https://signoz.io/docs/instrumentation/">this comprehensive guide</a></strong>.</p><p>While OpenTelemetry's auto-instrumentation can give you broad coverage with minimal setup, we recommend manual instrumentation for MCP pipelines where fine-grained visibility is critical.</p><p>Check out this video tutorial where we walk through how to manually instrument an MCP system for deep observability using OTel.</p><div id="youtube2-Y0sTVeIra2E" class="youtube-wrap" data-attrs="{&quot;videoId&quot;:&quot;Y0sTVeIra2E&quot;,&quot;startTime&quot;:null,&quot;endTime&quot;:null}" data-component-name="Youtube2ToDOM"><div class="youtube-inner"><iframe src="https://www.youtube-nocookie.com/embed/Y0sTVeIra2E?rel=0&amp;autoplay=0&amp;showinfo=0&amp;enablejsapi=0" frameborder="0" loading="lazy" gesture="media" allow="autoplay; fullscreen" allowautoplay="true" allowfullscreen="true" width="728" height="409"></iframe></div></div><h2><strong>Conclusion</strong></h2><p>Instrumenting your MCP systems with <strong>OpenTelemetry</strong> allows you to embrace open standards and build observability into your stack without being tied to any proprietary solution. By pairing OpenTelemetry with a one-stop observability platform like SigNoz, you get complete ownership of your telemetry data, full transparency into your system's behavior, and the flexibility to adapt your observability pipeline as your architecture evolves!</p><p>Now that's the cherry on top of your agentic AI pipeline &#127826;.</p>]]></content:encoded></item></channel></rss>