Practical Guide to Collector-First Architecture and Phased OTel Migration

OpenTelemetry, or OTel, has undeniably become the standard for observability. Adoption across the industry is accelerating, but execution remains a challenge. Many teams struggle to move from the “why” to the “how,” getting stuck on architecture, migration paths and identifying the right use cases.
OTel is not a tool swap. Successful adoption requires treating OTel not only as a technology but as a new operating model. What follows is a strategic blueprint for OTel adoption, focused on the practical realities of implementation and architecture rather than the buzzwords.
OTel’s Real Value: Context, Not Just Vendor Neutrality
When executives sign off on OTel, they often do it for two reasons: vendor neutrality and efficiency. The goal is to collect telemetry once and avoid vendor lock-in, where proprietary agents emit data in formats that no other tool can read.
While avoiding lock-in is valuable, it’s not the primary driver for adoption. OTel’s true power lies in reducing investigation time by preserving the context layer.
Consider one system producing CPU metrics and another producing logs, each referring to the same host by a different name. Correlating that data becomes difficult because nothing ties the two signals together. This is where OTel’s semantic conventions become critical: They establish shared naming standards so telemetry describes systems consistently. By using these conventions, particularly resource attributes, teams can identify the exact entities emitting telemetry. The result is built-in correlation: Teams can query a single entity and view every signal associated with it without manual translation.
The “Collector-First” Architecture
A common pitfall is treating OTel adoption as a “big bang” replacement for existing monitoring tools. Instead, a collector-first strategy is recommended.
The OpenTelemetry Collector is central to your architecture. It receives telemetry, processes it through filtering, tagging and enrichment, and exports it to multiple backends. By deploying collectors in the infrastructure before fully re-instrumenting applications, teams decouple data generation from data destination.
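To make the receive/process/export flow concrete, here is a sketch of a minimal Collector pipeline that receives OTLP, enriches it with an attribute, and fans out to two backends (the endpoints and the environment tag are placeholders):

```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:                 # batch telemetry before export
  attributes/env:        # enrichment: tag every span
    actions:
      - key: deployment.environment
        value: production
        action: upsert

exporters:
  otlphttp/primary:
    endpoint: https://primary-backend.example.com    # placeholder
  otlphttp/secondary:
    endpoint: https://secondary-backend.example.com  # placeholder

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [attributes/env, batch]
      exporters: [otlphttp/primary, otlphttp/secondary]
```

Because the applications only ever see the Collector, swapping either backend is a config change here, not a re-instrumentation effort.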
Edge vs. Gateway Debate
When designing your collector deployment, draw a clear distinction between the Edge and the Gateway.
- The Edge Collector: This sits close to your application, typically as a sidecar or DaemonSet. Keep it lightweight and avoid heavy transformations or tail-based sampling. If an edge collector becomes overloaded, the resulting back pressure can degrade the user experience.
- The Gateway Collector: This is the centralized processing layer, where you scale ingestion, handle heavy sampling and absorb burst traffic.
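A sketch of the split, assuming a gateway reachable inside the cluster (the gateway address, backend endpoint and sampling policy are placeholders):

```yaml
# Edge collector: lightweight, batch and forward only
receivers:
  otlp:
    protocols:
      grpc:
processors:
  batch:
exporters:
  otlp:
    endpoint: gateway-collector.observability.svc:4317  # placeholder
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlp]
---
# Gateway collector: centralized, handles tail-based sampling
receivers:
  otlp:
    protocols:
      grpc:
processors:
  tail_sampling:
    policies:
      - name: errors-only        # placeholder policy
        type: status_code
        status_code: {status_codes: [ERROR]}
  batch:
exporters:
  otlphttp:
    endpoint: https://backend.example.com  # placeholder
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [tail_sampling, batch]
      exporters: [otlphttp]
```

The design choice follows the back-pressure argument above: expensive work such as tail sampling lives in the horizontally scalable gateway, never in the path next to the application.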
Another important consideration is vendor distributions. Many teams prefer upstream “vanilla” OTel to ensure neutrality, but the upstream OTel Contrib repository includes a wide range of components that are not always production-hardened. Vendor distributions can provide tested builds, support and faster fixes. Elastic’s recent survey of over 500 observability leaders showed movement away from vanilla OTel and custom builds toward vendor-sourced distributions.
The litmus test for vendor neutrality isn’t whether you use a distro, but whether you push OpenTelemetry Protocol, or OTLP, from the edge. As long as the edge communicates via OTLP, you remain vendor-agnostic. If a vendor requires a proprietary exporter at the edge, lock-in is re-introduced. The gateway, by contrast, can run a vendor-specific distribution and still be swapped out later.
The Strangler Pattern for Migration
How do you move a massive IT environment to OTel without disrupting operations? Use the Strangler Pattern.
Do not immediately rip and replace deeply integrated legacy systems. Instead, migrate gradually by starting with low-hanging fruit. Kubernetes is the natural starting point. Because OTel shares DNA with the CNCF ecosystem, it fits naturally in Kubernetes environments.
Start by deploying the OTel Operator and using Helm charts to launch collectors. Receivers can pull Prometheus metrics from Kubernetes endpoints, normalize them using OTel semantic conventions and send them down the telemetry pipeline. Once the infrastructure layer reports reliably, you can move inward toward application instrumentation.
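For instance, a Prometheus receiver that scrapes pod endpoints and forwards normalized metrics to a gateway could be sketched as follows (the job name and gateway address are placeholders):

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: kubernetes-pods   # placeholder job
          kubernetes_sd_configs:
            - role: pod

processors:
  resourcedetection:   # adds host/env resource attributes per semantic conventions
    detectors: [env, system]
  batch:

exporters:
  otlp:
    endpoint: gateway-collector:4317  # placeholder

service:
  pipelines:
    metrics:
      receivers: [prometheus]
      processors: [resourcedetection, batch]
      exporters: [otlp]
```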
Application Instrumentation: The Duality of Auto and Manual
A common misconception is that teams must choose between automatic instrumentation using agents and manual instrumentation using SDKs. Mature organizations use both.
- Auto-instrumentation: This gets you 70% to 80% of the way there. By adding annotations to Kubernetes pods, instrumentation can capture standard HTTP requests, database calls and response times.
- Manual instrumentation: This bridges the gap between “system uptime” and “business health.” Teams should manually instrument code to capture business-specific attributes such as Customer IDs.
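With the OTel Operator installed, auto-instrumentation is typically switched on per workload via a pod annotation. An abbreviated Deployment sketch (the names and the choice of the Java agent are illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: checkout-service   # placeholder
spec:
  selector:
    matchLabels:
      app: checkout
  template:
    metadata:
      labels:
        app: checkout
      annotations:
        # Tells the OTel Operator to inject the Java auto-instrumentation agent
        instrumentation.opentelemetry.io/inject-java: "true"
    spec:
      containers:
        - name: checkout
          image: example/checkout:1.0  # placeholder
```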
Imagine a slow transaction. Auto-instrumentation shows the database query lagged. Manual instrumentation lets teams search traces for a specific “Customer ID” to see if a VIP client was affected. These attributes can also be added to logs, making the entire dataset searchable by business context.
Operationalizing OTel
Finally, successful adoption requires an operating model. You need ownership of schemas, pipelines and cost controls. You cannot assume that “more telemetry equals more insights.”
Validation is essential. Use tools such as Weaver to maintain schema consistency and Instrumentation Score to validate the quality of your telemetry. These projects help ensure that, as adoption scales, data remains usable and aligned with the agreed semantic conventions.
The Verdict: OpenTelemetry, the De Facto Standard
The choice is stark: re-instrument your applications every time you change vendors, or adopt OpenTelemetry to future-proof your observability architecture. OpenTelemetry adoption is not just about installing a collector. It requires a strategy that prioritizes context, consistency and gradual migration to open-source observability standards.
For more expertise from Elastic on OpenTelemetry, watch the webinar: Getting started with OpenTelemetry: Planning and tips for observability teams.
