The Collector is a standalone binary that sits between your applications and your observability backends, receiving, transforming, and routing telemetry.
Why use a Collector
You could export telemetry directly from every application to a backend. In practice, almost nobody does this at scale, because the Collector solves several problems at once:
- Decoupling. Applications only ever need to know one endpoint — the local/nearby Collector — regardless of how many backends you actually send to.
- Fan-out. Send the same telemetry to multiple destinations (e.g. Tempo for traces and a SaaS vendor for alerting) without touching application code.
- Offloading work. Batching, retrying, compressing, and enriching telemetry costs CPU — better spent in a dedicated process than competing with your application for resources.
- Central policy. Apply sampling, PII scrubbing, or attribute filtering in one place instead of in every service's SDK config.
- Protocol translation. Receive OTLP and export in a legacy or vendor-specific format, or vice versa.
Receivers, processors, exporters
A Collector pipeline is built from three kinds of components, always executed in this order:
| Component | Role | Examples |
|---|---|---|
| Receiver | Gets data into the Collector. | otlp, prometheus, jaeger, filelog |
| Processor | Transforms data in-flight. | batch, memory_limiter, attributes, tail_sampling |
| Exporter | Sends data out. | otlp, otlphttp, prometheusremotewrite, debug |
Receivers, processors, and exporters are wired together per signal (traces, metrics, logs) into named pipelines.
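As a sketch of per-signal wiring, the fragment below declares one pipeline each for traces, metrics, and logs, plus a second traces pipeline distinguished by a `/name` suffix. The backend URL and the pipeline name `traces/debug` are illustrative assumptions, not part of any real setup.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  batch: {}

exporters:
  otlphttp:
    endpoint: https://backend.example.com:4318  # placeholder backend
  debug: {}

service:
  pipelines:
    # One pipeline per signal type.
    traces:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [otlphttp]
    # A second, named traces pipeline ("type/name") fed by the same receiver.
    traces/debug:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
```

A component that is defined at the top level but not listed in any pipeline under `service.pipelines` is never started, so the pipeline lists are what actually determine the data flow.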
A minimal configuration
```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}
  memory_limiter:
    check_interval: 1s
    limit_mib: 512

exporters:
  otlphttp:
    endpoint: https://tempo.example.com:4318
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp, debug]
```
Order matters within a pipeline's processor list — memory_limiter is conventionally placed first to shed load before other processors do more expensive work, and batch is typically placed last, right before export.
Common processors
- batch: groups telemetry before export, dramatically reducing the number of network calls to your backend.
- memory_limiter: protects the Collector process from running out of memory by refusing new data, and forcing garbage collection if usage keeps climbing, once memory crosses a configured threshold. Essential in production.
- attributes / resource: add, rename, or delete attributes; commonly used to strip PII or enrich data with deployment metadata.
- tail_sampling: makes the sampling decision after a trace's spans have arrived, so it can depend on the trace's outcome (e.g. keep 100% of traces containing an error). See the sampling guide.
- filter: drops telemetry matching a condition, e.g. health-check spans that add noise without value.
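The processors listed above might be configured along these lines. This is a sketch only: the attribute names (`user.email`, `http.route`), the `/healthz` path, the instance suffixes (`/scrub`, `/env`, `/healthchecks`), and all numeric values are assumptions chosen for illustration, and everything except `batch` and `memory_limiter` is assumed to come from a build that includes the contrib processors.

```yaml
processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 512        # start refusing data once usage nears this limit
    spike_limit_mib: 128  # headroom reserved for short bursts

  # Remove a hypothetical PII attribute from every span.
  attributes/scrub:
    actions:
      - key: user.email
        action: delete

  # Stamp deployment metadata onto the resource.
  resource/env:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert

  # Drop spans that match an OTTL condition (here: health checks).
  filter/healthchecks:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'

  # Keep every trace with an error, plus 10% of the rest.
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-the-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 10

  batch:
    send_batch_size: 8192
    timeout: 200ms

# A traces pipeline would then list them in processing order, for example:
#   processors: [memory_limiter, filter/healthchecks, attributes/scrub,
#                resource/env, tail_sampling, batch]
```

Dropping and scrubbing before `tail_sampling` means the sampler only buffers spans that could actually be exported, which keeps its memory footprint down.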
Deployment patterns
| Pattern | Description | When to use |
|---|---|---|
| Agent (sidecar/DaemonSet) | One Collector per host or pod, close to the application. | Low-latency local buffering and host-level collection such as host metrics; in Kubernetes, typically run as a DaemonSet or sidecar |
| Gateway | A centralized Collector tier (often behind a load balancer) that all agents forward to. | Centralized policy enforcement (sampling, PII scrubbing), fan-out to multiple backends |
Many production setups use both: agents on every node forward to a smaller number of gateway Collectors, which apply org-wide policy before exporting to backends. This two-tier design keeps per-host overhead low while centralizing expensive operations like tail-based sampling, which needs to see all spans of a trace to make a decision.
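The agent half of this two-tier design can be very small. The sketch below assumes a gateway reachable at the hypothetical address `otel-gateway.example.internal:4317` and shows an agent that only protects itself, batches, and forwards; sampling and scrubbing are left to the gateway tier.

```yaml
# Agent-tier Collector (one per node or pod).
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317  # applications on this host send here

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 256              # assumed budget; keep the agent lightweight
  batch: {}

exporters:
  otlp:
    endpoint: otel-gateway.example.internal:4317  # hypothetical gateway address

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
```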
Tail-based sampling decisions require every span of a trace to arrive at the same Collector instance. If your agents load-balance across multiple gateway replicas naively (for example, round-robin per connection or per batch), spans of the same trace can land on different instances, and each instance then decides on an incomplete trace. Use the loadbalancing exporter (part of the contrib distribution) with the trace ID as its routing key so that all spans of a trace reach the same gateway instance.
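One way to keep a trace together is the contrib `loadbalancing` exporter on the forwarding tier. The fragment below is a sketch under assumptions: the DNS name is a hypothetical headless Kubernetes service that resolves to the individual gateway pods, and plaintext OTLP inside the cluster is assumed purely to keep the example short.

```yaml
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

exporters:
  loadbalancing:
    routing_key: traceID   # all spans with the same trace ID go to one backend
    protocol:
      otlp:
        tls:
          insecure: true   # assumption: no TLS between agent and gateway
    resolver:
      dns:
        # Hypothetical headless service; must resolve to each gateway replica.
        hostname: otel-gateway-headless.observability.svc.cluster.local

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```

The resolver has to return the individual gateway instances rather than a single virtual IP, otherwise the exporter sees only one backend and cannot route by trace ID.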
Collector distributions
The Collector ships as several distributions with different sets of built-in components:
- otelcol (the core distribution): a minimal build containing the core pipeline framework and a small set of components centered on the OTLP receiver and exporters.
- otelcol-contrib: the "batteries included" build with hundreds of community receivers, processors, and exporters (Prometheus, Kafka, cloud vendor integrations, and more). Most teams start here.
- Custom builds via OCB (OpenTelemetry Collector Builder): compile your own distribution containing exactly the components you need, keeping the binary small and the attack surface minimal.
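An OCB build is driven by a manifest that lists the Go modules to compile in. The sketch below covers only the components used in the minimal configuration earlier on this page; the distribution name, output path, and the version `v0.110.0` are placeholders, and all modules should be pinned to whichever single Collector release you actually target.

```yaml
# builder-config.yaml (hypothetical file name)
dist:
  name: otelcol-custom
  description: Custom Collector with only the components we use
  output_path: ./otelcol-custom

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver v0.110.0

processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor v0.110.0
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor v0.110.0

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter v0.110.0
  - gomod: go.opentelemetry.io/collector/exporter/debugexporter v0.110.0
```

Assuming the builder binary is installed as `ocb`, running `ocb --config builder-config.yaml` should produce a Collector binary under the configured output path. Any component referenced in a runtime config but missing from the manifest will fail at startup as an unknown component type.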
The Collector turns "every app exports directly to a vendor" into "every app exports to one local endpoint" — centralizing policy, batching, and vendor routing outside your application code.