feat: Add Prometheus counters for token exchange #28453

Open
BGZStephen wants to merge 3 commits into master from IAM-475-monitoring-and-alerting-for-token-exchange-operations

@BGZStephen
Contributor

Summary

Adds Prometheus counters for token exchange and embed login operations so operators can monitor the health of these authentication flows — detecting key misconfiguration, replay attacks, and JIT provisioning bursts without digging through logs.

New metrics (always registered when N8N_METRICS=true)

| Metric | Labels | What it tracks |
| --- | --- | --- |
| n8n_token_exchange_requests_total | result: success\|failure | Overall exchange success/failure rate |
| n8n_token_exchange_failures_total | reason: <code> | Failure breakdown by cause |
| n8n_embed_login_requests_total | result: success\|failure | Embed login success/failure rate |
| n8n_embed_login_failures_total | reason: <code> | Embed login failure breakdown by cause |
| n8n_token_exchange_jit_provisioning_total | none | Users JIT-provisioned via token exchange |
| n8n_token_exchange_identity_linked_total | none | External identities linked to existing users |

Failure reason labels are stable codes normalised from error messages (invalid_signature, unknown_key, token_replay, token_too_long, token_near_expiry, invalid_format, missing_kid, missing_iss, invalid_claims, internal_error, role_not_allowed, other) — dashboards won't break if error message text changes.
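
As a rough illustration of the normalisation described above, here is a minimal TypeScript sketch. The reason codes are taken from this description; the `RULES` table and its patterns are assumptions for illustration (only the three role-related substrings and the "Unknown key id" message appear elsewhere in this PR), and the actual implementation may match messages differently.

```typescript
// Subset of the stable reason codes listed in the PR description.
type FailureReason =
  | 'invalid_signature'
  | 'unknown_key'
  | 'token_replay'
  | 'role_not_allowed'
  | 'other';

// Ordered [pattern, code] rules; the first match wins.
// ASSUMPTION: apart from the role-related substrings and "Unknown key id",
// these patterns are illustrative guesses, not the PR's real matching logic.
const RULES: ReadonlyArray<readonly [RegExp, FailureReason]> = [
  [/unknown key/i, 'unknown_key'],
  [/signature/i, 'invalid_signature'],
  [/replay/i, 'token_replay'],
  [/not allowed|unrecognized role|cannot provision/i, 'role_not_allowed'],
];

// Maps a free-form error message to a bounded set of label values.
function normalizeFailureReason(message: string): FailureReason {
  for (const [pattern, code] of RULES) {
    if (pattern.test(message)) return code;
  }
  // Anything unrecognised collapses into one bucket to cap label cardinality.
  return 'other';
}
```

Falling through to `other` keeps the number of time series bounded even when a new or unexpected error message appears.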

Also: embed login failure visibility

The embed auth controller previously let errors propagate silently (no event emitted, no metric). This PR wraps handleLogin() in a try/catch that emits a new embed-login-failed event before re-throwing, closing the monitoring blind spot. The event is also wired into the log-streaming relay and audit event registry alongside the existing token exchange events.
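
The emit-then-re-throw pattern can be sketched as follows. `MiniEventBus` and `loginWithFailureEvent` are hypothetical names standing in for n8n's event service and controller method, and the sketch is synchronous for brevity; only the event name `embed-login-failed` and the `failureReason` payload field come from this PR.

```typescript
type FailurePayload = { failureReason: string };
type Listener = (payload: FailurePayload) => void;

// Hypothetical stand-in for the event service, for illustration only.
class MiniEventBus {
  private readonly listeners: Record<string, Listener[]> = {};

  on(event: string, listener: Listener): void {
    const list = this.listeners[event] ?? [];
    list.push(listener);
    this.listeners[event] = list;
  }

  emit(event: string, payload: FailurePayload): void {
    for (const listener of this.listeners[event] ?? []) listener(payload);
  }
}

// Runs a login step and reports any failure before letting it propagate.
function loginWithFailureEvent<T>(bus: MiniEventBus, doLogin: () => T): T {
  try {
    return doLogin();
  } catch (error) {
    const failureReason = error instanceof Error ? error.message : String(error);
    bus.emit('embed-login-failed', { failureReason });
    throw error; // re-throw so the caller still receives the original error
  }
}
```

Because the error is re-thrown unchanged, the HTTP response is the same as before; the only difference is that listeners now get a chance to observe the failure.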

How to test

# Start n8n with metrics enabled
N8N_METRICS=true n8n start

# Scrape metrics endpoint
curl http://localhost:5678/metrics | grep -E "token_exchange|embed_login"

# Expected output (all 6 counter families at 0 before any requests):
# n8n_token_exchange_requests_total{result="success"} 0
# n8n_token_exchange_requests_total{result="failure"} 0
# n8n_embed_login_requests_total{result="success"} 0
# n8n_embed_login_requests_total{result="failure"} 0
# n8n_token_exchange_jit_provisioning_total 0
# n8n_token_exchange_identity_linked_total 0

# After a failed token exchange attempt, verify labelled failure counters appear:
# n8n_token_exchange_failures_total{reason="unknown_key"} 1
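
The expected output above distinguishes pre-seeded series (visible at 0) from reason-labelled series that only appear after the first failure. The sketch below illustrates that behaviour with a minimal in-memory stand-in; `LabelledCounter` is a hypothetical class written for this example, not the metrics library used by n8n.

```typescript
// Hypothetical minimal labelled counter, for illustration only.
class LabelledCounter {
  private readonly series: Record<string, number> = {};

  constructor(
    private readonly name: string,
    private readonly label: string,
  ) {}

  // Adds `by` to the series for `labelValue`, creating it at 0 if needed.
  inc(labelValue: string, by = 1): void {
    this.series[labelValue] = (this.series[labelValue] ?? 0) + by;
  }

  // Renders one text line per series that has been created so far.
  render(): string[] {
    return Object.keys(this.series).map(
      (value) => `${this.name}{${this.label}="${value}"} ${this.series[value]}`,
    );
  }
}

const requests = new LabelledCounter('n8n_token_exchange_requests_total', 'result');
const failures = new LabelledCounter('n8n_token_exchange_failures_total', 'reason');

// Pre-seed the two known result values so both series exist from startup.
for (const result of ['success', 'failure']) requests.inc(result, 0);

// Before any traffic: two request lines at 0, no failure-reason lines yet.
const beforeTraffic = requests.render().concat(failures.render());

// One failed exchange creates the reason series on first use.
requests.inc('failure');
failures.inc('unknown_key');
const afterFailure = requests.render().concat(failures.render());
```

Pre-seeding only the `result` values keeps rate queries well defined from startup, while leaving `reason` series to appear lazily avoids exporting a line for every possible code.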

Related Linear tickets, Github issues, and Community forum posts

https://linear.app/n8n/issue/IAM-475

Tests

Unit tests added in packages/cli/src/metrics/__tests__/prometheus-metrics.service.test.ts covering:

  • All 6 counters are registered on init() (unconditional, no config flag required)
  • result label combos (success/failure) are pre-seeded at 0 on startup
  • token-exchange-succeeded → increments success counter
  • token-exchange-failed → increments failure counter + maps error message to normalized reason label
  • Unknown failure reason falls through to 'other' (cardinality safety)
  • Role-related error strings ('not allowed', 'Unrecognized role', 'Cannot provision') map to 'role_not_allowed'
  • embed-login → increments embed login success counter
  • embed-login-failed → increments embed login failure counter + normalizes reason
  • token-exchange-user-provisioned → increments JIT provisioning counter
  • token-exchange-identity-linked → increments identity-linked counter

Embed controller test updated: failure path now asserts embed-login-failed is emitted and embed-login (success event) is not.
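
The event-to-counter mapping exercised by these tests can be summarised with a small sketch. Event and metric names are taken from this PR; `recordAuthEvent`, the in-memory `counts` record, and the assumption that a token exchange failure increments both the request and the failure-reason counters (as the embed login path does) are illustrative.

```typescript
type AuthEvent =
  | 'token-exchange-succeeded'
  | 'token-exchange-failed'
  | 'embed-login'
  | 'embed-login-failed'
  | 'token-exchange-user-provisioned'
  | 'token-exchange-identity-linked';

// ASSUMPTION: a plain record keyed by series name stands in for real counters.
const counts: Record<string, number> = {};
const bump = (series: string): void => {
  counts[series] = (counts[series] ?? 0) + 1;
};

// `reason` is expected to be an already-normalised code such as 'unknown_key'.
function recordAuthEvent(event: AuthEvent, reason = 'other'): void {
  switch (event) {
    case 'token-exchange-succeeded':
      bump('n8n_token_exchange_requests_total{result="success"}');
      break;
    case 'token-exchange-failed':
      bump('n8n_token_exchange_requests_total{result="failure"}');
      bump(`n8n_token_exchange_failures_total{reason="${reason}"}`);
      break;
    case 'embed-login':
      bump('n8n_embed_login_requests_total{result="success"}');
      break;
    case 'embed-login-failed':
      bump('n8n_embed_login_requests_total{result="failure"}');
      bump(`n8n_embed_login_failures_total{reason="${reason}"}`);
      break;
    case 'token-exchange-user-provisioned':
      bump('n8n_token_exchange_jit_provisioning_total');
      break;
    case 'token-exchange-identity-linked':
      bump('n8n_token_exchange_identity_linked_total');
      break;
  }
}
```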

Review / Merge checklist

  • I have seen this code, I have run this code, and I take responsibility for this code.
  • PR title and summary are descriptive. (conventions)
  • Docs updated or follow-up ticket created.
  • Tests included.
  • PR Labeled with Backport to Beta, Backport to Stable, or Backport to v1 (if the PR is an urgent fix that needs to be backported)

…ogin

Exposes six counters under the /metrics endpoint when N8N_METRICS=true:

- n8n_token_exchange_requests_total{result} — success/failure rate
- n8n_token_exchange_failures_total{reason} — failure breakdown by reason
- n8n_embed_login_requests_total{result}    — embed login success/failure rate
- n8n_embed_login_failures_total{reason}   — embed login failure breakdown
- n8n_token_exchange_jit_provisioning_total — JIT-provisioned user count
- n8n_token_exchange_identity_linked_total  — identity-linking event count

Failure reasons are normalised to stable labels (invalid_signature,
unknown_key, token_replay, etc.) so dashboards are not broken by future
error message changes.

Also closes the embed-login failure monitoring blind spot: adds an
embed-login-failed event that is emitted before re-throwing, wired into
the log-streaming relay and audit event registry alongside the existing
token exchange events.
@codecov

codecov bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 93.75000% with 3 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...cli/src/events/relays/log-streaming.event-relay.ts | 0.00% | 3 Missing ⚠️ |

Extract private counter accesses to named consts so the directive
immediately precedes the suppressed line, as required by TypeScript.
@github-actions
Contributor

github-actions bot commented Apr 14, 2026

Performance Comparison

Comparing current → latest master → 14-day baseline

docker-stats

| Metric | Current | Latest Master | Baseline (avg) | vs Master | vs Baseline | Status |
| --- | --- | --- | --- | --- | --- | --- |
| docker-image-size-runners | 393.00 MB | 393.00 MB | 391.63 MB (σ 11.06) | +0.0% | +0.3% | |
| docker-image-size-n8n | 1269.76 MB | 1269.76 MB | 1269.76 MB (σ 0.00) | +0.0% | +0.0% | |

Idle baseline with Instance AI module loaded

| Metric | Current | Latest Master | Baseline (avg) | vs Master | vs Baseline | Status |
| --- | --- | --- | --- | --- | --- | --- |
| instance-ai-heap-used-baseline | 186.75 MB | 186.52 MB | 186.34 MB (σ 0.24) | +0.1% | +0.2% | ⚠️ |
| instance-ai-rss-baseline | 352.57 MB | 388.20 MB | 372.63 MB (σ 22.95) | -9.2% | -5.4% | |

Memory consumption baseline with starter plan resources

| Metric | Current | Latest Master | Baseline (avg) | vs Master | vs Baseline | Status |
| --- | --- | --- | --- | --- | --- | --- |
| memory-heap-used-baseline | 114.18 MB | 114.05 MB | 113.86 MB (σ 0.84) | +0.1% | +0.3% | |
| memory-rss-baseline | 283.12 MB | 287.98 MB | 284.98 MB (σ 42.51) | -1.7% | -0.7% | |

How to read this table
  • Current: This PR's value (or latest master if PR perf tests haven't run)
  • Latest Master: Most recent nightly master measurement
  • Baseline: Rolling 14-day average from master
  • vs Master: PR impact (current vs latest master)
  • vs Baseline: Drift from baseline (current vs rolling avg)
  • Status: ✅ within 1σ | ⚠️ 1-2σ | 🔴 >2σ regression

@n8n-assistant n8n-assistant bot added the core (Enhancement outside /nodes-base and /editor-ui) and n8n team (Authored by the n8n team) labels Apr 14, 2026
@BGZStephen BGZStephen marked this pull request as ready for review April 14, 2026 09:33
Contributor

@cubic-dev-ai cubic-dev-ai bot left a comment

1 issue found across 7 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="packages/cli/src/metrics/prometheus-metrics.service.ts">

<violation number="1" location="packages/cli/src/metrics/prometheus-metrics.service.ts:715">
P2: Token-exchange metric listeners are re-registered on every `init()` call, which can double-count events after reinitialization.</violation>
</file>
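
One possible way to address the double-registration concern flagged above is an idempotency guard around listener setup. This is a sketch under assumptions, not necessarily the fix this PR adopts: `TinyEvents`, `MetricsSketch`, and the `listenersRegistered` flag are hypothetical names.

```typescript
type Handler = () => void;

// Hypothetical minimal event source used only for this sketch.
class TinyEvents {
  private readonly handlers: Handler[] = [];

  on(handler: Handler): void {
    this.handlers.push(handler);
  }

  emit(): void {
    for (const handler of this.handlers) handler();
  }
}

class MetricsSketch {
  exchanges = 0;
  private listenersRegistered = false;

  constructor(private readonly events: TinyEvents) {}

  // Safe to call repeatedly: listeners are attached on the first call only.
  init(): void {
    if (this.listenersRegistered) return;
    this.listenersRegistered = true;
    this.events.on(() => {
      this.exchanges += 1;
    });
  }
}
```

Without the guard, each extra `init()` call would add another listener and every event would be counted once per registration.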
Architecture diagram
sequenceDiagram
    participant Client as Client Browser
    participant Ctrl as EmbedAuthController
    participant Svc as TokenExchangeService
    participant Events as EventService (Local)
    participant Metrics as PrometheusMetricsService
    participant Audit as LogStreamingEventRelay
    participant Prom as Prometheus Server

    Note over Metrics,Events: Initialization (N8N_METRICS=true)
    Metrics->>Events: Register listeners for token-exchange and embed-login events
    Metrics->>Metrics: NEW: Initialize 6 counters (pre-seed success/failure labels at 0)

    Note over Client,Audit: Request Flow: Embed Login
    Client->>Ctrl: GET /embed/login?token=...
    Ctrl->>Svc: embedLogin(subjectToken)
    
    alt Success Path
        Svc-->>Ctrl: User Identity
        Ctrl->>Events: emit('embed-login')
        Events-->>Metrics: Trigger handler
        Metrics->>Metrics: inc(n8n_embed_login_requests_total{result:success})
        Ctrl-->>Client: 302 Redirect + Auth Cookie
    else CHANGED: Failure Path
        Svc-->>Ctrl: Throw Error (e.g. "Unknown key id")
        Ctrl->>Events: NEW: emit('embed-login-failed', { failureReason })
        
        par Async Metrics Update
            Events-->>Metrics: Trigger handler
            Metrics->>Metrics: inc(n8n_embed_login_requests_total{result:failure})
            Metrics->>Metrics: NEW: normalizeFailureReason(reason)
            Metrics->>Metrics: inc(n8n_embed_login_failures_total{reason:unknown_key})
        and Async Audit Log
            Events-->>Audit: Trigger relay
            Audit->>Audit: NEW: embedLoginFailed()
            Note right of Audit: Emits n8n.audit.token-exchange.embed-login-failed
        end
        
        Ctrl-->>Client: 500 / Error Response
    end

    Note over Client,Metrics: Background: Metric Scraping
    Prom->>Metrics: GET /metrics
    Metrics-->>Prom: Return counters (Requests, Failures, JIT, Linked)

@BGZStephen BGZStephen enabled auto-merge April 14, 2026 10:53
@BGZStephen BGZStephen requested review from a team, afitzek, cstuncsik, guillaumejacquart and phyllis-noester and removed request for a team April 14, 2026 13:02