Discussion about this post

Juraci Paixão Kröhling:

This topic never gets old and deserves to be shared every now and then!

However! On the per-signal strategy (pattern #7 in the canonical reference), "/metrics" refers to the metrics exposed by a Prometheus client. I don't think anybody scrapes /logs or /traces out of their applications. If you have all signals in OTLP format, it's preferable to get them out as fast as possible to a single external collector and have the split happen one layer later. It's a lot of work to reconfigure all your pods if you need them to point to a different address on a per-signal basis.
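The "single endpoint in, split one layer later" idea can be sketched as a gateway collector config: applications send all signals over OTLP to one address, and the gateway's pipelines fan out per signal. The backend endpoints below are placeholders, not from the original post.

```yaml
# Sketch of a gateway collector: pods only know one OTLP endpoint;
# the per-signal split happens here, not in application config.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  otlphttp/metrics:
    endpoint: https://metrics-backend.example.com   # placeholder
  otlphttp/logs:
    endpoint: https://logs-backend.example.com      # placeholder
  otlp/traces:
    endpoint: traces-backend.example.com:4317       # placeholder

service:
  pipelines:
    metrics:
      receivers: [otlp]
      exporters: [otlphttp/metrics]
    logs:
      receivers: [otlp]
      exporters: [otlphttp/logs]
    traces:
      receivers: [otlp]
      exporters: [otlp/traces]
```

Changing a backend then means editing this one collector, not redeploying every pod.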

Here's the repo I created some years ago with the OpenTelemetry Collector patterns:

https://github.com/jpkrohling/opentelemetry-collector-deployment-patterns

Neural Foundry:

Outstanding breakdown of OTel deployment patterns. The multi-cluster control plane approach really resonates with what we're seeing in production environments handling cross-region telemetry. What stands out is how you frame the security benefit of centralizing credential management at the regional gateway layer rather than distributing secrets across every cluster. That's often overlooked in deployment guides but becomes critical at scale. The per-signal pattern also addresses a real pain point: when your log volume explodes but trace throughput stays stable, having independent scaling prevents log processing from starving trace pipelines. One nuance worth adding is that the load-balanced pattern's stickiness requirement for stateful processors like spanmetrics can actually introduce subtle correctness issues if your LB health checks aren't properly tuned, since a flapping collector can fragment spans across multiple instances before failover completes.
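The stickiness requirement mentioned above is typically met with the Collector's loadbalancing exporter, which routes by trace ID so all spans of a trace reach the same second-tier instance. A minimal sketch, assuming a DNS-resolvable headless service for the second tier (the hostname is a placeholder):

```yaml
# First-tier collector: route spans by trace ID so stateful processors
# (e.g. spanmetrics, tail sampling) on the second tier see whole traces.
exporters:
  loadbalancing:
    routing_key: traceID          # keep a trace pinned to one instance
    protocol:
      otlp:
        tls:
          insecure: true          # assumption: in-cluster, no TLS
    resolver:
      dns:
        hostname: collectors.observability.svc.cluster.local  # placeholder
        port: 4317

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```

Note that the DNS resolver re-shards when the endpoint list changes, which is exactly the moment a flapping instance can fragment in-flight traces.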
