A high-traffic service can generate billions of spans a day. Sampling decides which traces are worth keeping — trading completeness for cost, without losing the traces that actually matter.
Why sample at all
Storing and indexing every span from every request at scale is expensive, and most of that data is never looked at — nobody queries the trace for a successful, fast, boring request. Sampling reduces volume while trying to preserve the traces most likely to be useful: the slow ones, the failed ones, the statistically representative ones.
Head sampling
Head sampling makes the keep/drop decision at the very start of a trace, before any span knows how the request will turn out. It typically happens in the SDK, in the process that starts the root span. The decision then propagates: the sampled flag travels in the traceparent header, and downstream services that use a parent-based sampler (the SDK default) honor it, so a trace is either kept in full or dropped in full.
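import { NodeSDK } from '@opentelemetry/sdk-node';
import { TraceIdRatioBasedSampler } from '@opentelemetry/sdk-trace-base';

const sdk = new NodeSDK({
  // Keep roughly 10% of traces, chosen deterministically from the trace ID
  sampler: new TraceIdRatioBasedSampler(0.1),
  // ...
});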
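To make the propagation step concrete, here is a minimal sketch of how a downstream service could read the sampled flag from an incoming W3C `traceparent` header. `parseTraceparent` is a hypothetical helper written for illustration, not an OpenTelemetry API; in practice the SDK's propagator does this work for you. The sketch only accepts version `00` headers.

```javascript
// Hypothetical helper for illustration; the SDK's W3C propagator normally does this.
// Format (version 00): "00-<32 hex trace-id>-<16 hex parent-id>-<2 hex flags>"
const TRACEPARENT_V00 = /^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

function parseTraceparent(header) {
  const match = TRACEPARENT_V00.exec(String(header).trim());
  if (!match) return null; // malformed or unsupported version

  const [, traceId, parentSpanId, flags] = match;
  // All-zero trace or span IDs are invalid in W3C Trace Context.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentSpanId)) return null;

  return {
    traceId,
    parentSpanId,
    // Bit 0 of the flags byte is the "sampled" flag.
    sampled: (parseInt(flags, 16) & 0x01) === 0x01,
  };
}

const incoming = parseTraceparent(
  '00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01'
);
console.log(incoming.sampled); // true: the upstream service chose to keep this trace
```

A header ending in `-00` would yield `sampled: false`, and a parent-based sampler in this service would then drop its spans too.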
Common head samplers:
- AlwaysOn / AlwaysOff: trivial; keep or drop everything.
- TraceIdRatioBased: deterministic probabilistic sampling derived from the trace ID, so services configured with the same ratio (and a compatible implementation) reach the same decision for a given trace ID.
- ParentBased: the default composite sampler. It respects an existing sampling decision if a parent context exists, and only makes a fresh decision for root spans.
Head sampling is fast and requires no coordination between services, but it can't see the future — a trace can be dropped moments before the request it belongs to fails, meaning your most interesting traces are sampled away at the same rate as boring ones.
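The interplay between a ratio sampler and a parent-based wrapper can be sketched in a few lines. This is a simplified illustration, not the SDK's actual algorithm: `ratioDecision` and `parentBasedDecision` are hypothetical names, and mapping the last 8 hex characters of the trace ID to a bucket is an assumption chosen for brevity.

```javascript
// Simplified, hypothetical ratio sampler: NOT the algorithm the SDK uses.
// Maps the last 8 hex characters of the trace ID to a bucket in [0, 2^32)
// and keeps the trace when the bucket falls below ratio * 2^32.
function ratioDecision(traceId, ratio) {
  const bucket = parseInt(traceId.slice(-8), 16);
  return bucket < ratio * 0x100000000;
}

// Parent-based wrapper: follow the parent's decision when one exists,
// and only consult the ratio sampler for root spans.
function parentBasedDecision(parent, traceId, rootRatio) {
  if (parent) return parent.sampled;
  return ratioDecision(traceId, rootRatio);
}

const traceId = '4bf92f3577b34da6a3ce929d0e0e4736';

// Root span: no parent, so the trace ID alone decides. The result is the same
// in every process that runs this function with the same ratio.
console.log(parentBasedDecision(null, traceId, 0.1)); // true

// Child span: the parent said "drop", so the local ratio is never consulted.
console.log(parentBasedDecision({ sampled: false }, traceId, 1.0)); // false
```

Because the decision depends only on the trace ID and the ratio, no coordination is needed, but nothing in the inputs says whether the request will later fail.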
Tail sampling
Tail sampling defers the decision until a trace's spans have been collected and its outcome is known. It runs in a Collector, because the deciding component has to buffer all spans of a trace before it can evaluate them. This lets you write policies like "always keep traces with an error" or "always keep traces slower than 2 seconds," which head sampling structurally cannot do.
processors:
  tail_sampling:
    decision_wait: 10s
    policies:
      - name: keep-errors
        type: status_code
        status_code: { status_codes: [ERROR] }
      - name: keep-slow
        type: latency
        latency: { threshold_ms: 2000 }
      - name: sample-rest
        type: probabilistic
        probabilistic: { sampling_percentage: 5 }
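Because a trace's spans can arrive at the Collector out of order and from different services, tail sampling buffers them for a configurable window (decision_wait), measured from the first span seen for that trace, before evaluating policies and finalizing the decision. A window that is too short risks deciding before slow or late spans arrive; a longer one costs more memory.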
Head vs. tail
| | Head sampling | Tail sampling |
|---|---|---|
| Decision point | At trace start, in the SDK | After trace completes, in the Collector |
| Can prioritize errors/slow traces | No — outcome unknown yet | Yes — outcome is known |
| Resource cost | Low — drops early, minimal buffering | Higher — must buffer every span until a decision is made |
| Coordination needed | None beyond consistent trace ID hashing | All spans of a trace must reach the same Collector instance |
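
The coordination requirement in the last row amounts to routing by trace ID. The sketch below shows the idea with a hypothetical `pickCollector` function and made-up instance names; real deployments usually delegate this to a load-balancing tier in front of the tail-sampling Collectors, and would typically prefer consistent hashing over the plain modulo used here, since modulo reshuffles most traces whenever the instance count changes.

```javascript
// Hypothetical routing helper: sends every span of a trace to the same instance
// by deriving the target purely from the trace ID.
function pickCollector(traceId, collectors) {
  const bucket = parseInt(traceId.slice(-8), 16); // last 32 bits of the trace ID
  return collectors[bucket % collectors.length];
}

// Instance names are invented for illustration.
const collectors = ['collector-a', 'collector-b', 'collector-c'];
const traceId = '4bf92f3577b34da6a3ce929d0e0e4736';

// Spans of the same trace emitted by different services agree on the target.
const fromFrontend = pickCollector(traceId, collectors);
const fromBackend = pickCollector(traceId, collectors);
console.log(fromFrontend === fromBackend); // true
```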
Choosing a rate
- Start conservative and measure. A common starting point is head sampling at 100% in low-traffic environments (staging, early production) and dialing down only once volume or cost demands it.
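- Combine strategies. A frequent production pattern: light head sampling to cap raw ingestion volume, plus tail sampling at the gateway layer to retain every error and slow request that survives the head stage. Tail sampling cannot recover traces that head sampling already dropped, so if you need every error, keep the head rate at 100% and let the tail stage do the reduction.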
- Never sample metrics the same way as traces. Metrics are pre-aggregated and comparatively cheap; the sampling conversation is almost entirely about traces (and to a lesser extent, verbose debug logs).
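
A rough way to reason about a combined setup is to multiply the stages through. The sketch below is a back-of-the-envelope estimate, not a measurement: `estimateRetention` is a hypothetical helper, every input value is invented for illustration, and it assumes that the tail stage only sees traces that passed head sampling and that error and slow traces do not overlap.

```javascript
// Back-of-the-envelope estimate; all input values used below are invented.
function estimateRetention({ tracesPerDay, headRate, errorFraction, slowFraction, tailRestRate }) {
  // Stage 1: head sampling drops traces before the Collector ever sees them.
  const afterHead = tracesPerDay * headRate;

  // Stage 2: tail policies keep all errors and slow traces that arrived
  // (assumed not to overlap), plus a percentage of everything else.
  const errorsKept = afterHead * errorFraction;
  const slowKept = afterHead * slowFraction;
  const restKept = afterHead * (1 - errorFraction - slowFraction) * tailRestRate;
  const kept = errorsKept + slowKept + restKept;

  return {
    afterHead: Math.round(afterHead),
    errorsKept: Math.round(errorsKept),
    kept: Math.round(kept),
    overallRate: kept / tracesPerDay,
  };
}

const estimate = estimateRetention({
  tracesPerDay: 1000000,
  headRate: 0.5,
  errorFraction: 0.01,
  slowFraction: 0.02,
  tailRestRate: 0.05,
});
console.log(estimate);
```

With these made-up inputs, roughly 39,250 of 1,000,000 traces are stored (about 3.9%), and 5,000 of the 10,000 error traces survive, because the other half never reached the Collector.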
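Sampling decisions in OpenTelemetry are meant to apply per trace, not per span: the built-in samplers are designed so that a trace is kept or dropped as a whole, and keeping one span while dropping its sibling leaves a broken trace. If you need to reduce data volume within a kept trace (e.g. drop verbose debug spans), that's a job for a Collector processor such as the filter processor (to drop spans) or the attributes processor (to strip attributes), not for a sampler.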
Course recap
You've now covered the full arc: what OpenTelemetry is and why it exists, the three signals and how they correlate, the API/SDK/OTLP architecture, the anatomy of traces and metrics, how context survives a network hop, how instrumentation actually gets written, how the Collector routes and shapes data, why semantic conventions keep telemetry portable, and finally how sampling keeps cost under control without losing the traces that matter most.
Head sampling is cheap but outcome-blind; tail sampling is outcome-aware but costs more to run. Most production systems combine both: light head sampling to cap volume, and tail sampling to keep the errors and slow traces among what remains. Anything the head stage drops is gone for good, so set the head rate with that in mind.