Metrics are not exported when a callback returns an error #8139

@kofuk

Description

When using asynchronous instruments (Float64ObservableCounter, Int64ObservableCounter, Float64ObservableGauge, etc.) with WithFloat64Callback / WithInt64Callback, and one callback returns an error, all collected metric data for that collection cycle is silently dropped — including data successfully observed by other callbacks.

Environment

  • OS: Linux
  • Architecture: x86_64
  • Go Version: 1.26.1
  • opentelemetry-go version: v1.42.0

Steps To Reproduce

package main

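import (
	"context"
	"errors"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/exporters/stdout/stdoutmetric"
	"go.opentelemetry.io/otel/metric"                // API: WithFloat64Callback, Float64Observer
	sdkmetric "go.opentelemetry.io/otel/sdk/metric" // SDK: MeterProvider, PeriodicReader
)

func main() {
	ctx := context.Background()
	exporter, err := stdoutmetric.New()
	if err != nil {
		panic(err)
	}
	mp := sdkmetric.NewMeterProvider(
		sdkmetric.WithReader(sdkmetric.NewPeriodicReader(exporter)),
	)
	defer mp.Shutdown(ctx) // Shutdown triggers a final collect and export
	otel.SetMeterProvider(mp)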

	meter := otel.Meter("example")

	meter.Float64ObservableCounter(
		"metric.one",
		metric.WithFloat64Callback(func(_ context.Context, o metric.Float64Observer) error {
			return errors.New("something went wrong") // this callback fails
		}),
	)

	meter.Float64ObservableCounter(
		"metric.two",
		metric.WithFloat64Callback(func(_ context.Context, o metric.Float64Observer) error {
			o.Observe(42.0) // this callback succeeds
			return nil
		}),
	)
}
  1. Run the above program.
  2. Observe that metric.two is not exported, even though its callback succeeded.

Expected behavior

When one callback returns an error, the error should be reported (e.g. logged via the global error handler), but metric data collected by other callbacks that succeeded should still be exported.

Additional information

The root cause is a mismatch in how errors are handled across the call chain collectAndExport → collect → produce.

pipeline.produce is intentionally designed to run all callbacks even if some return errors, accumulating errors with errors.Join, and writes collected data into rm unconditionally:

// sdk/metric/pipeline.go
var err error
for _, c := range p.callbacks {
    if e := c(ctx); e != nil {
        err = errors.Join(err, e) // loop continues
    }
}
// ... aggregated data is written into rm regardless of err ...
return err
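
To make the best-effort behavior concrete, here is a minimal, self-contained sketch that mirrors the loop above using only the standard library. The `callback` type, `runCallbacks`, and `sampleCallbacks` are hypothetical stand-ins and not SDK identifiers; the real pipeline aggregates into rm rather than a plain map.

```go
package main

import (
	"errors"
	"fmt"
)

// callback is a hypothetical stand-in for a registered SDK callback.
// It records observations into out and may return an error.
type callback func(out map[string]float64) error

// runCallbacks runs every callback, keeps whatever was observed,
// and joins all errors instead of stopping at the first failure.
func runCallbacks(cbs []callback) (map[string]float64, error) {
	out := make(map[string]float64)
	var err error
	for _, c := range cbs {
		if e := c(out); e != nil {
			err = errors.Join(err, e) // record the error and keep going
		}
	}
	return out, err
}

// sampleCallbacks mimics the reproduction: one failing callback
// and one that successfully observes a value.
func sampleCallbacks() []callback {
	return []callback{
		func(map[string]float64) error {
			return errors.New("something went wrong")
		},
		func(out map[string]float64) error {
			out["metric.two"] = 42.0
			return nil
		},
	}
}

func main() {
	data, err := runCallbacks(sampleCallbacks())
	fmt.Println("data:", data)
	fmt.Println("err:", err)
}
```

In this sketch the returned map still holds the value for metric.two even though the returned error is non-nil, which is the situation the callers of produce need to handle.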

However, collect propagates this error immediately without giving callers a chance to use the already-populated rm:

// sdk/metric/periodic_reader.go
err = ph.produce(ctx, rm)
if err != nil {
    return err // rm contains valid data, but caller won't see it
}

And collectAndExport skips export entirely when Collect returns a non-nil error:

// sdk/metric/periodic_reader.go
err := r.Collect(ctx, rm)
if err == nil {      // export is skipped when any callback failed
    err = r.export(ctx, rm)
}

As a result, produce's best-effort design — collecting as much data as possible — is made pointless by the layers above it.

A possible fix would be to always call export regardless of the error from Collect:

err := r.Collect(ctx, rm)
exportErr := r.export(ctx, rm)
return errors.Join(err, exportErr)
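
The difference between the current and the proposed control flow can be sketched with a toy model. Everything below (`resourceMetrics`, `exporter`, `collect`, and the two `collectAndExport...` functions) is a hypothetical simplification for illustration and not the SDK's actual code; `collect` is hard-coded to behave like the reproduction, filling in metric.two while reporting the metric.one failure.

```go
package main

import (
	"errors"
	"fmt"
)

// resourceMetrics is a toy stand-in for the SDK's collected data.
type resourceMetrics struct{ points map[string]float64 }

// exporter records the names of the metrics it was asked to export.
type exporter struct{ exported []string }

func (e *exporter) export(rm *resourceMetrics) error {
	for name := range rm.points {
		e.exported = append(e.exported, name)
	}
	return nil
}

// collect populates rm with the data that was observed and still
// returns the error from the callback that failed.
func collect(rm *resourceMetrics) error {
	rm.points = map[string]float64{"metric.two": 42.0}
	return errors.New("something went wrong")
}

// collectAndExportCurrent models the reported behavior:
// any collection error skips the export.
func collectAndExportCurrent(e *exporter) error {
	rm := &resourceMetrics{}
	err := collect(rm)
	if err == nil {
		err = e.export(rm)
	}
	return err
}

// collectAndExportProposed models the suggested fix:
// always export, then report both errors together.
func collectAndExportProposed(e *exporter) error {
	rm := &resourceMetrics{}
	err := collect(rm)
	exportErr := e.export(rm)
	return errors.Join(err, exportErr)
}

func main() {
	cur := &exporter{}
	curErr := collectAndExportCurrent(cur)
	fmt.Println("current: ", cur.exported, curErr)

	prop := &exporter{}
	propErr := collectAndExportProposed(prop)
	fmt.Println("proposed:", prop.exported, propErr)
}
```

Both variants still surface the callback error to the caller; only the proposed one hands the partially populated data to the exporter. A real fix would probably need to distinguish errors after which rm was never populated (for example, if the reader has already been shut down) from partial callback failures, so that it does not export empty or stale data.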

Labels: area:metrics (Part of OpenTelemetry Metrics), bug (Something isn't working), pkg:SDK (Related to an SDK package)
