The Collector is a standalone binary that sits between your applications and your observability backends, receiving, transforming, and routing telemetry.

Why use a Collector

You could export telemetry directly from every application to a backend. In practice, almost nobody does this at scale, because the Collector solves several problems at once:

  • Decoupling. Applications only ever need to know one endpoint — the local/nearby Collector — regardless of how many backends you actually send to.
  • Fan-out. Send the same telemetry to multiple destinations (e.g. Tempo for traces and a SaaS vendor for alerting) without touching application code.
  • Offloading work. Batching, retrying, compressing, and enriching telemetry costs CPU — better spent in a dedicated process than competing with your application for resources.
  • Central policy. Apply sampling, PII scrubbing, or attribute filtering in one place instead of in every service's SDK config.
  • Protocol translation. Receive OTLP and export in a legacy or vendor-specific format, or vice versa.

Receivers, processors, exporters

A Collector pipeline is built from three kinds of components, always executed in this order:

| Component | Role | Examples |
|---|---|---|
| Receiver | Gets data into the Collector. | otlp, prometheus, jaeger, filelog |
| Processor | Transforms data in-flight. | batch, memory_limiter, attributes, tail_sampling |
| Exporter | Sends data out. | otlp, otlphttp, prometheusremotewrite, debug |

Receivers, processors, and exporters are wired together per signal (traces, metrics, logs) into named pipelines.

A minimal configuration

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch: {}
  memory_limiter:
    check_interval: 1s
    limit_mib: 512

exporters:
  otlphttp:
    endpoint: https://tempo.example.com:4318
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp, debug]

Order matters within a pipeline's processor list — memory_limiter is conventionally placed first to shed load before other processors do more expensive work, and batch is typically placed last, right before export.
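Because pipelines are defined per signal, the same receiver and processors can be reused for metrics and logs. The following is a minimal sketch that extends the `service` section of the configuration above. It assumes the backend behind the `otlphttp` exporter also accepts metrics and logs over OTLP/HTTP, which the original configuration does not state.

```yaml
# Sketch only: replaces the service section of the minimal configuration.
# Assumption: the otlphttp endpoint accepts all three signals.
service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]   # memory_limiter first, batch last
      exporters: [otlphttp, debug]
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp]
    logs:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlphttp]
```

Each pipeline gets its own processing chain at runtime, even though the component definitions are shared by name.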

Common processors

  • batch: groups telemetry into batches before export, so the backend receives far fewer, larger requests.
  • memory_limiter: protects the Collector process from running out of memory by refusing new data once usage crosses a soft limit and forcing garbage collection at the hard limit. Essential in production.
  • attributes / resource: add, rename, or delete attributes; commonly used to strip PII or enrich data with deployment metadata.
  • tail_sampling: makes sampling decisions after seeing a trace's full outcome (e.g. keep 100% of traces containing an error). See the sampling guide.
  • filter: drops telemetry matching a condition, e.g. health-check spans that add noise without value.
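As a rough illustration of how these processors might be configured, here is a sketch using components from the contrib distribution. The attribute names (`user.email`, `http.route`), the `/healthz` path, the instance names after the slash, and the sampling percentage are all illustrative assumptions, not values from this guide.

```yaml
processors:
  # Delete a PII attribute from spans (hypothetical attribute name).
  attributes/scrub:
    actions:
      - key: user.email
        action: delete

  # Stamp deployment metadata on the resource (illustrative value).
  resource/env:
    attributes:
      - key: deployment.environment
        value: production
        action: upsert

  # Drop spans whose route matches a health-check path (assumed path).
  filter/healthchecks:
    error_mode: ignore
    traces:
      span:
        - 'attributes["http.route"] == "/healthz"'

  # Keep every trace containing an error, plus an assumed 10% of the rest.
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors
        type: status_code
        status_code:
          status_codes: [ERROR]
      - name: sample-the-rest
        type: probabilistic
        probabilistic:
          sampling_percentage: 10
```

A processor defined here only takes effect once it is also listed in a pipeline's `processors` array.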

Deployment patterns

| Pattern | Description | When to use |
|---|---|---|
| Agent (sidecar/daemonset) | One Collector per host or pod, close to the application. | Low-latency local buffering, host metrics collection, Kubernetes DaemonSet |
| Gateway | A centralized Collector tier (often behind a load balancer) that all agents forward to. | Centralized policy enforcement (sampling, PII scrubbing), fan-out to multiple backends |

Many production setups use both: agents on every node forward to a smaller number of gateway Collectors, which apply org-wide policy before exporting to backends. This two-tier design keeps per-host overhead low while centralizing expensive operations like tail-based sampling, which needs to see all spans of a trace to make a decision.
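The agent side of this two-tier layout can be sketched as a small configuration that receives OTLP locally and forwards everything to the gateway tier. The gateway hostname, the plaintext connection, and the 256 MiB limit are assumptions chosen for illustration.

```yaml
# Agent Collector (one per node or pod): receive locally, forward to the gateway.
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317

processors:
  memory_limiter:
    check_interval: 1s
    limit_mib: 256          # assumed budget; agents are kept lightweight
  batch: {}

exporters:
  otlp:
    endpoint: otel-gateway.example.internal:4317   # hypothetical gateway address
    tls:
      insecure: true        # assumption: plaintext inside a trusted network

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp]
```

Expensive, org-wide steps such as tail-based sampling or PII scrubbing would then live in the gateway's configuration rather than here.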

⚠️ Tail sampling needs a gateway

Tail-based sampling decisions require every span of a trace to arrive at the same Collector instance. If your agents load-balance traces across multiple gateway replicas naively, spans of the same trace can land on different instances, and each instance then decides on an incomplete trace. Use a trace-aware load-balancing exporter (in the contrib distribution, the loadbalancing exporter with its routing key set to the trace ID) to keep a trace together.
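One way to keep a trace together is the contrib `loadbalancing` exporter on the tier that forwards to the sampling gateways. This is a sketch under assumptions: the DNS name is hypothetical and is expected to resolve to the individual gateway replicas, and the `otlp` receiver is assumed to be defined as in the minimal configuration.

```yaml
exporters:
  loadbalancing:
    routing_key: traceID      # all spans with the same trace ID go to the same backend
    protocol:
      otlp:
        tls:
          insecure: true      # assumption: plaintext inside a trusted network
    resolver:
      dns:
        # Hypothetical name that resolves to every gateway replica.
        # Port is left unset so the exporter's default applies.
        hostname: otel-gateway-headless.example.internal

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]
```

The gateway replicas behind this exporter are where the `tail_sampling` processor would run, since each replica now sees complete traces.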

Collector distributions

The Collector ships as several distributions with different sets of built-in components:

  • otelcol (core): a small, curated build centered on the core pipeline framework and the OTLP receiver and exporters.
  • otelcol-contrib: the "batteries included" build with hundreds of community receivers, processors, and exporters (Prometheus, Kafka, cloud vendor integrations, and more). Most teams start here.
  • Custom builds via OCB (OpenTelemetry Collector Builder): compile your own distribution containing exactly the components you need, keeping the binary small and the attack surface minimal.
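For a sense of what a custom build involves, here is a sketch of an OCB manifest that would produce a Collector containing only the components used in the minimal configuration. The distribution name, description, and output path are hypothetical, and `vX.Y.Z` is a placeholder, not a real version: replace it with the release that matches your builder.

```yaml
# builder-config.yaml (sketch; vX.Y.Z is a placeholder to fill in)
dist:
  name: otelcol-custom                  # hypothetical binary name
  description: Collector with only OTLP in, OTLP/HTTP and debug out
  output_path: ./otelcol-custom

receivers:
  - gomod: go.opentelemetry.io/collector/receiver/otlpreceiver vX.Y.Z

processors:
  - gomod: go.opentelemetry.io/collector/processor/batchprocessor vX.Y.Z
  - gomod: go.opentelemetry.io/collector/processor/memorylimiterprocessor vX.Y.Z

exporters:
  - gomod: go.opentelemetry.io/collector/exporter/otlphttpexporter vX.Y.Z
  - gomod: go.opentelemetry.io/collector/exporter/debugexporter vX.Y.Z
```

With real versions filled in, the builder is typically invoked as `ocb --config builder-config.yaml`. Any component referenced in a Collector configuration but absent from the manifest will fail at startup, so the two files need to be kept in sync.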

Takeaway

The Collector turns "every app exports directly to a vendor" into "every app exports to one local endpoint" — centralizing policy, batching, and vendor routing outside your application code.