A high-traffic service can generate billions of spans a day. Sampling decides which traces are worth keeping — trading completeness for cost, without losing the traces that actually matter.

Why sample at all

Storing and indexing every span from every request at scale is expensive, and most of that data is never looked at — nobody queries the trace for a successful, fast, boring request. Sampling reduces volume while trying to preserve the traces most likely to be useful: the slow ones, the failed ones, the statistically representative ones.

Head sampling

Head sampling makes the keep/drop decision at the very start of a trace — before any span knows how the request will turn out — typically at the SDK level, in the process that starts the root span. The decision then propagates: every downstream service respects the same sampled flag via the traceparent header, so a trace is either kept in full or dropped in full.

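import { NodeSDK } from '@opentelemetry/sdk-node';
import { TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  sampler: new TraceIdRatioBasedSampler(0.1), // sample 10% of traces
  // ...
});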
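The propagated decision lives in the last field of the W3C traceparent header, whose lowest bit is the sampled flag. As a rough illustration, the sketch below parses that header by hand. The function name parseTraceparent is made up for this example, and the parser is simplified (it does not reject all-zero IDs, for instance); in a real service the SDK's propagator does this for you.

```javascript
// Parse a W3C traceparent header: version-traceId-parentId-flags (lowercase hex).
// Simplified sketch: a real parser also rejects all-zero IDs and version "ff".
function parseTraceparent(header) {
  const match = /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/.exec(header);
  if (match === null) return null; // malformed header: treat as "no parent"
  const [, version, traceId, parentId, flags] = match;
  return {
    version,
    traceId,
    parentId,
    // Bit 0 of the trace-flags byte is the "sampled" flag.
    sampled: (parseInt(flags, 16) & 0x01) === 0x01,
  };
}

const keptCtx = parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01");
const droppedCtx = parseTraceparent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-00");
console.log(keptCtx.sampled, droppedCtx.sampled); // true false
```

A downstream service that honors this flag never has to re-run the sampling decision; it only reads the bit the root service already set.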

Common head samplers:

  • AlwaysOn / AlwaysOff: trivial, keep or drop everything.
  • TraceIdRatioBased: deterministic probabilistic sampling based on the trace ID, so the same trace ID yields the same decision in every service configured with the same ratio.
  • ParentBased: the default composite sampler (with AlwaysOn as its root sampler). It respects an existing sampling decision if a parent context exists, and only makes a fresh decision for root spans.
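To make the last two samplers concrete, here is a minimal sketch of the logic they describe. It is not the SDK's actual implementation: the function names traceIdRatioSampler and parentBasedSampler are hypothetical, and the threshold comparison on the low 56 bits of the trace ID is just one assumed way to turn an ID into a deterministic decision.

```javascript
// Ratio sampler (assumed scheme): keep a trace when the low 56 bits of its ID
// fall below ratio * 2^56. The same ID and ratio always give the same answer.
function traceIdRatioSampler(ratio) {
  const space = 2n ** 56n;
  const threshold = BigInt(Math.floor(ratio * Number(space)));
  return (traceId) => BigInt("0x" + traceId.slice(-14)) < threshold;
}

// Parent-based wrapper: inherit the parent's decision, decide fresh only for roots.
function parentBasedSampler(rootSampler) {
  return (traceId, parent) =>
    parent !== undefined ? parent.sampled : rootSampler(traceId);
}

const headSampler = parentBasedSampler(traceIdRatioSampler(0.5));
const lowId = "0123456789abcdef0100000000000001"; // low 56 bits = 1
const highId = "0123456789abcdef01ffffffffffffff"; // low 56 bits = 2^56 - 1

console.log(headSampler(lowId)); // true  (root span, below the 50% threshold)
console.log(headSampler(highId)); // false (root span, above the threshold)
console.log(headSampler(highId, { sampled: true })); // true  (parent said keep)
console.log(headSampler(lowId, { sampled: false })); // false (parent said drop)
```

The last two calls show why ParentBased matters: once a parent context exists, the trace ID no longer influences the outcome, so a trace cannot end up half kept.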
💡
Cheap but blind

Head sampling is fast and requires no coordination between services, but it can't see the future — a trace can be dropped moments before the request it belongs to fails, meaning your most interesting traces are sampled away at the same rate as boring ones.

Tail sampling

Tail sampling defers the decision until after a trace completes, when its full outcome is known. It runs in a Collector rather than in the SDK, because the deciding component has to buffer every span of a trace before it can evaluate the trace as a whole. This lets you write policies like "always keep traces with an error" or "always keep traces slower than 2 seconds," which head sampling structurally cannot do.

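processors:
  tail_sampling:
    decision_wait: 10s
    # A trace is kept if at least one policy below decides to sample it.
    policies:
      # Keep every trace that contains a span with ERROR status.
      - name: keep-errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      # Keep every trace that took longer than 2 seconds.
      - name: keep-slow
        type: latency
        latency: { threshold_ms: 2000 }
      # Keep a 5% baseline of everything else.
      - name: sample-rest
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }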

Because a trace's spans can arrive at the Collector out of order and from different services, the tail sampling processor buffers them for a configurable window (decision_wait), counted from the arrival of the trace's first span, before evaluating policies and finalizing the decision.
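The buffer-then-decide flow can be sketched in a few lines. This is an illustrative model only, not the Collector's code: the TailSampler class, the span shape ({ traceId, status, startMs, endMs }), and the explicit nowMs argument are all assumptions made to keep the example deterministic, and the probabilistic baseline policy is left out for the same reason.

```javascript
// Duration of a trace: earliest span start to latest span end.
function traceDurationMs(spans) {
  const start = Math.min(...spans.map((s) => s.startMs));
  const end = Math.max(...spans.map((s) => s.endMs));
  return end - start;
}

// Each policy inspects the buffered spans of one trace and says whether to keep it.
const policies = [
  { name: "keep-errors", matches: (spans) => spans.some((s) => s.status === "ERROR") },
  { name: "keep-slow", matches: (spans) => traceDurationMs(spans) > 2000 },
];

class TailSampler {
  constructor(decisionWaitMs) {
    this.decisionWaitMs = decisionWaitMs;
    this.buffers = new Map(); // traceId -> { firstSeenMs, spans }
  }

  // Buffer a span; the wait window starts when a trace's first span arrives.
  addSpan(span, nowMs) {
    let entry = this.buffers.get(span.traceId);
    if (entry === undefined) {
      entry = { firstSeenMs: nowMs, spans: [] };
      this.buffers.set(span.traceId, entry);
    }
    entry.spans.push(span);
  }

  // Decide every trace whose window has elapsed and return the ones that are kept.
  flush(nowMs) {
    const kept = [];
    for (const [traceId, entry] of this.buffers) {
      if (nowMs - entry.firstSeenMs < this.decisionWaitMs) continue; // still waiting
      const policy = policies.find((p) => p.matches(entry.spans));
      if (policy !== undefined) kept.push({ traceId, policy: policy.name, spans: entry.spans });
      this.buffers.delete(traceId); // decided either way: free the buffer
    }
    return kept;
  }
}

const tail = new TailSampler(10000);
tail.addSpan({ traceId: "a", status: "OK", startMs: 0, endMs: 120 }, 0);
tail.addSpan({ traceId: "b", status: "OK", startMs: 0, endMs: 50 }, 100);
tail.addSpan({ traceId: "b", status: "ERROR", startMs: 10, endMs: 40 }, 150);
tail.addSpan({ traceId: "c", status: "OK", startMs: 0, endMs: 2500 }, 200);

const decided = tail.flush(10200).map((t) => `${t.traceId}:${t.policy}`);
console.log(decided); // [ 'b:keep-errors', 'c:keep-slow' ]
```

Trace "a" is fast and error-free, so no policy claims it and it is dropped. The memory cost is visible in the buffers map, which holds every span of every undecided trace.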

Head vs. tail

|  | Head sampling | Tail sampling |
| --- | --- | --- |
| Decision point | At trace start, in the SDK | After trace completes, in the Collector |
| Can prioritize errors/slow traces | No: outcome unknown yet | Yes: outcome is known |
| Resource cost | Low: drops early, minimal buffering | Higher: must buffer every span until a decision is made |
| Coordination needed | None beyond consistent trace ID hashing | All spans of a trace must reach the same Collector instance |
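The coordination requirement for tail sampling comes down to routing by trace ID. The sketch below shows the idea with a plain modulo over a hypothetical pool of Collector addresses; the addresses and the collectorFor helper are invented for illustration. In practice the Collector's load-balancing exporter handles this, and it uses consistent hashing so that resizing the pool moves fewer traces than a modulo would.

```javascript
// Hypothetical pool of tail-sampling Collector instances (illustrative addresses).
const collectors = ["collector-0:4317", "collector-1:4317", "collector-2:4317"];

// Route by trace ID so every span of a trace lands on the same instance.
function collectorFor(traceId) {
  // Read the 32-hex-char trace ID as an integer and reduce it modulo the pool size.
  const index = Number(BigInt("0x" + traceId) % BigInt(collectors.length));
  return collectors[index];
}

const incoming = [
  { traceId: "4bf92f3577b34da6a3ce929d0e0e4736", service: "frontend" },
  { traceId: "4bf92f3577b34da6a3ce929d0e0e4736", service: "checkout" },
  { traceId: "00000000000000000000000000000005", service: "frontend" },
];
for (const span of incoming) {
  console.log(span.service, "->", collectorFor(span.traceId));
}
```

The first two spans come from different services but share a trace ID, so they are sent to the same instance, which is what lets that instance see the whole trace before deciding.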

Choosing a rate

  • Start conservative and measure. A common starting point is head sampling at 100% in low-traffic environments (staging, early production) and dialing down only once volume or cost demands it.
  • Combine strategies. A frequent production pattern: light head sampling to cap raw ingestion volume, plus tail sampling at the gateway layer to keep every error and slow request that reaches it. Tail sampling only sees traces that survived head sampling, so retaining 100% of errors requires a 100% head rate for that traffic.
  • Never sample metrics the same way as traces. Metrics are pre-aggregated and comparatively cheap; the sampling conversation is almost entirely about traces (and to a lesser extent, verbose debug logs).
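When picking rates, it can help to estimate how much ends up stored once head and tail sampling are chained. The helper below is a back-of-the-envelope sketch, not a measurement tool: storedFraction and its parameter names are hypothetical, and it assumes that the tail policies keep all "priority" traces (errors, slow requests) and a fixed baseline of the rest, counted among the traces that reach the Collector.

```javascript
// Fraction of all traces that end up stored when head and tail sampling are chained.
function storedFraction({ headRate, priorityShare, tailBaselineRate }) {
  // headRate:         share of traces the SDK lets through to the Collector
  // priorityShare:    share of those that match a keep-always policy (errors, slow)
  // tailBaselineRate: share of the remaining traces kept by the probabilistic policy
  return headRate * (priorityShare + (1 - priorityShare) * tailBaselineRate);
}

const fraction = storedFraction({ headRate: 0.5, priorityShare: 0.02, tailBaselineRate: 0.05 });
console.log(fraction.toFixed(4)); // 0.0345
```

With these example numbers, roughly 3.45% of all traces are stored. Plugging in your own traffic figures gives a quick sanity check before changing a rate in production.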
🚨
Don't sample based on span content alone

Sampling in OpenTelemetry is meant to keep or drop whole traces, not individual spans: a sampler that keeps one span and drops a sibling within the same trace leaves a broken, partial trace behind. If you need to reduce data volume within a kept trace (e.g. drop verbose debug spans), that is a job for the Collector's filter processor (to drop spans) or attributes processor (to strip attributes), not for a sampler.

Course recap

You've now covered the full arc: what OpenTelemetry is and why it exists, the three signals and how they correlate, the API/SDK/OTLP architecture, the anatomy of traces and metrics, how context survives a network hop, how instrumentation actually gets written, how the Collector routes and shapes data, why semantic conventions keep telemetry portable, and finally how sampling keeps cost under control without losing the traces that matter most.

Takeaway

Head sampling is cheap but outcome-blind; tail sampling is outcome-aware but costs more to run. Most production systems combine both: light head sampling to cap volume, and tail sampling to keep every error and slow trace that reaches the Collector.