
fix(Gmail Node): Add V2 maxResults limit to prevent OOM on large inboxes#28470

Open
BerniWittmann wants to merge 1 commit into master from node-4595-gmail-oom-on-cloud-v2

@BerniWittmann BerniWittmann commented Apr 14, 2026

Summary

The Gmail Trigger causes OOM on Cloud when polling many or large emails. Root causes:

  1. Gmail list API can return up to 100 message IDs, then each is fetched as a full message (format=raw can be 10MB+ per email)
  2. Full Message objects were kept in memory just for timestamp/duplicate detection
  3. No user control over how many emails to fetch per poll

This PR adds node version 2 with a configurable "Max Emails per Poll" field (default 10, max 50). The approach: list all IDs (cheap), limit full fetches (expensive), store unfetched IDs as pendingMessageIds for the next poll cycle.
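The list-then-limit approach can be sketched as follows. This is a minimal illustration rather than the PR's code: `splitByBudget` and `BudgetSplit` are hypothetical names, and clamping the budget to the range 1 to 50 is an assumption based only on the stated maximum of 50.

```typescript
// Hypothetical helper, not taken from the PR: splits cheaply listed IDs into
// the ones fetched in full now and the ones deferred to the next poll.
interface BudgetSplit {
  toFetch: string[]; // IDs fetched in full during this poll (expensive)
  pending: string[]; // IDs kept as pendingMessageIds for the next poll
}

function splitByBudget(listedIds: string[], maxResults: number): BudgetSplit {
  // Assumption: the budget is clamped to 1..50 (the description states max 50).
  const budget = Math.max(1, Math.min(50, Math.floor(maxResults)));
  return {
    toFetch: listedIds.slice(0, budget),
    pending: listedIds.slice(budget),
  };
}

const split = splitByBudget(['a', 'b', 'c'], 2);
console.log(split.toFetch); // [ 'a', 'b' ]
console.log(split.pending); // [ 'c' ]
```

Only the IDs in `toFetch` trigger a full message download, so memory use per poll is bounded by the budget rather than by the size of the inbox.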

Key changes:

  • maxResults property gated behind @version >= 2, disabled in manual mode
  • MessageBookkeeping type — lightweight {id, internalDate, date, headers.date} for timestamp/dedup computation without retaining full message bodies
  • pendingMessageIds state — unfetched IDs stored for next poll (when # of emails > maxResults), drained before listing new messages
  • Duplicate fix: possibleDuplicates are now merged (not replaced) when lastTimeChecked doesn't advance during pending drain, preventing boundary duplicates from reappearing
  • Timestamp flooring: Math.floor() on lastTimeChecked to prevent float precision issues in equality comparisons
  • 24 tests (12 existing v1 pinned to 1.3, 12 new v2) covering: budget limiting, pending drain, multi-poll cycles, same-timestamp edge cases, draft filtering, early-return paths
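A rough sketch of the lightweight bookkeeping idea, under stated assumptions: the field list follows the description above, but `FetchedMessage`, `toBookkeeping`, and `toFlooredSeconds` are hypothetical names, and the assumption that the fractional timestamp comes from dividing a millisecond `internalDate` by 1000 is mine, not the PR's.

```typescript
// Shape taken from the PR description: {id, internalDate, date, headers.date}.
interface MessageBookkeeping {
  id: string;
  internalDate?: string;
  date?: string;
  headers?: { date?: string };
}

// Hypothetical stand-in for a fully fetched message with a large body.
interface FetchedMessage extends MessageBookkeeping {
  raw?: string;
}

// Copy only the fields needed for timestamp and duplicate computation,
// so the large body is not retained by the bookkeeping list.
function toBookkeeping(message: FetchedMessage): MessageBookkeeping {
  const slim: MessageBookkeeping = { id: message.id };
  if (message.internalDate !== undefined) slim.internalDate = message.internalDate;
  if (message.date !== undefined) slim.date = message.date;
  if (message.headers?.date !== undefined) slim.headers = { date: message.headers.date };
  return slim;
}

// Assumption: timestamps are compared in whole seconds, so a millisecond
// value is divided by 1000 and floored to avoid fractional comparisons.
function toFlooredSeconds(internalDateMs: string): number {
  return Math.floor(Number(internalDateMs) / 1000);
}

const example = toBookkeeping({
  id: 'm1',
  internalDate: '1713095760123',
  raw: 'large body placeholder',
});
console.log(Object.keys(example)); // [ 'id', 'internalDate' ]
console.log(toFlooredSeconds('1713095760123')); // 1713095760
```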

How to test:

  1. Create a Gmail Trigger v2 workflow with maxResults=2
  2. Send 3+ emails to the monitored account
  3. Verify: Poll 1 returns 2 emails, Poll 2 returns remaining, Poll 3 returns nothing (no duplicates)
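The expected three-poll outcome can be reproduced with a small in-memory simulation. It is deliberately simplified: `poll`, `PollState`, and `deliveredIds` are hypothetical, and a list of delivered IDs stands in for the lastTimeChecked and possibleDuplicates logic of the real node.

```typescript
// Simplified stand-in for the trigger's static data (hypothetical shape).
interface PollState {
  pendingMessageIds: string[];
  deliveredIds: string[]; // replaces lastTimeChecked/possibleDuplicates here
}

function poll(inbox: string[], state: PollState, maxResults: number): string[] {
  // Drain pending IDs from the previous poll first.
  let fetched = state.pendingMessageIds.slice(0, maxResults);
  state.pendingMessageIds = state.pendingMessageIds.slice(maxResults);

  // List new messages only if the backlog is empty and budget remains.
  const remaining = maxResults - fetched.length;
  if (state.pendingMessageIds.length === 0 && remaining > 0) {
    const fresh = inbox.filter(
      (id) => state.deliveredIds.indexOf(id) === -1 && fetched.indexOf(id) === -1,
    );
    fetched = fetched.concat(fresh.slice(0, remaining));
    state.pendingMessageIds = fresh.slice(remaining); // overflow for next poll
  }

  state.deliveredIds = state.deliveredIds.concat(fetched);
  return fetched;
}

const inbox = ['m1', 'm2', 'm3'];
const state: PollState = { pendingMessageIds: [], deliveredIds: [] };
console.log(poll(inbox, state, 2)); // [ 'm1', 'm2' ]
console.log(poll(inbox, state, 2)); // [ 'm3' ]
console.log(poll(inbox, state, 2)); // []
```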

Related Linear tickets, Github issues, and Community forum posts

https://linear.app/n8n/issue/NODE-4595

Review / Merge checklist

  • I have seen this code, I have run this code, and I take responsibility for this code.
  • PR title and summary are descriptive. (conventions)
  • Docs updated or follow-up ticket created.
  • Tests included.
  • PR Labeled with Backport to Beta, Backport to Stable, or Backport to v1 (if the PR is an urgent fix that needs to be backported)

🤖 PR Summary generated by AI

- Add node version 2 with configurable "Max Emails per Poll" (default 10, max 50)
- Store unfetched message IDs as pendingMessageIds for next poll
- Use lightweight MessageBookkeeping type instead of full Message for state tracking
- Fix possibleDuplicates loss during pending drain by merging when lastTimeChecked doesn't advance
- Floor timestamps to integers to prevent float comparison issues
- Add comprehensive V2 test coverage (12 new tests)
codecov bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 93.05556% with 5 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| ...nodes-base/nodes/Google/Gmail/GmailTrigger.node.ts | 93.05% | 1 Missing and 4 partials ⚠️ |


@BerniWittmann BerniWittmann marked this pull request as ready for review April 14, 2026 11:56
@Joffcom Joffcom requested a review from RomanDavydchuk April 14, 2026 11:57

@cubic-dev-ai cubic-dev-ai bot left a comment


No issues found across 3 files

Architecture diagram
sequenceDiagram
    participant SD as Static Data (Storage)
    participant Node as Gmail Trigger Node
    participant GAPI as Gmail API (External)

    Note over Node: NEW: Version 2 Polling Logic

    Node->>SD: Retrieve lastTimeChecked & pendingMessageIds
    SD-->>Node: lastTimeChecked, pendingMessageIds[]

    alt NEW: Pending messages from previous poll exist
        Node->>Node: Calculate budget (maxResults)
        Node->>Node: Slice first N IDs from pendingMessageIds
        
        loop For each ID in slice
            Node->>GAPI: GET /messages/{id} (fetch full body/metadata)
            GAPI-->>Node: Message data
            Node->>Node: NEW: Extract MessageBookkeeping (lightweight metadata)
        end
        
        Node->>SD: NEW: Update pendingMessageIds (remove fetched)
        
        opt NEW: pendingMessageIds still not empty
            Node-->>Node: Early Exit: Return items to workflow
        end
    end

    Note over Node: List New Messages (if budget remains)

    Node->>GAPI: GET /messages?q=after:{lastTimeChecked}
    GAPI-->>Node: List of {id, threadId}

    Node->>Node: CHANGED: Filter IDs against possibleDuplicates

    alt NEW: Listed count > remaining budget
        Node->>Node: Slice IDs to fit budget
        Node->>SD: NEW: Store overflow in pendingMessageIds
    end

    loop For each message in budget
        Node->>GAPI: GET /messages/{id}
        GAPI-->>Node: Full Message
        
        Note right of Node: OOM Prevention: Only keep <br/>Bookkeeping objects in memory <br/>for date calculation
        Node->>Node: NEW: push to allFetchedMessages (id, internalDate)
    end

    Note over Node, SD: State Update & Deduplication logic

    Node->>Node: CHANGED: Calculate lastEmailDate (Math.floor)
    
    alt NEW: lastTimeChecked did not advance
        Node->>SD: CHANGED: Merge current possibleDuplicates with new IDs
    else lastTimeChecked advanced
        Node->>SD: Replace possibleDuplicates with new IDs
    end

    Node->>SD: Update lastTimeChecked
    Node-->>Node: Return processed responseData
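The final state-update branch of the diagram (merge versus replace) might look roughly like this. It is a sketch under assumptions: `TriggerState`, `dropKnownDuplicates`, and `updateState` are hypothetical names, and which IDs count as "new IDs" at the boundary is not specified in this PR page, so the caller supplies them.

```typescript
// Hypothetical view of the relevant static data fields.
interface TriggerState {
  lastTimeChecked: number; // whole seconds
  possibleDuplicates: string[];
}

// "Filter IDs against possibleDuplicates" from the diagram.
function dropKnownDuplicates(listedIds: string[], possibleDuplicates: string[]): string[] {
  return listedIds.filter((id) => possibleDuplicates.indexOf(id) === -1);
}

// Merge when lastTimeChecked did not advance, replace when it did.
function updateState(
  state: TriggerState,
  boundaryIds: string[], // assumption: IDs the node treats as boundary candidates
  lastEmailDate: number,
): TriggerState {
  const nextTime = Math.floor(lastEmailDate);
  const advanced = nextTime > state.lastTimeChecked;
  const candidates = advanced
    ? boundaryIds.slice()
    : state.possibleDuplicates.concat(boundaryIds);
  return {
    lastTimeChecked: advanced ? nextTime : state.lastTimeChecked,
    // Remove repeated IDs while keeping first-seen order.
    possibleDuplicates: candidates.filter((id, i, all) => all.indexOf(id) === i),
  };
}

const before: TriggerState = { lastTimeChecked: 100, possibleDuplicates: ['a'] };
console.log(updateState(before, ['b'], 100.9)); // merged: time 100, IDs a and b
console.log(updateState(before, ['c'], 101.2)); // replaced: time 101, ID c only
console.log(dropKnownDuplicates(['a', 'd'], before.possibleDuplicates)); // [ 'd' ]
```

Merging in the non-advancing case keeps earlier boundary IDs in the duplicate list, which is the behaviour the PR describes as preventing boundary duplicates from reappearing during a pending drain.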

@github-actions

Performance Comparison

Comparing current → latest master → 14-day baseline

Memory consumption baseline with starter plan resources

| Metric | Current | Latest Master | Baseline (avg) | vs Master | vs Baseline | Status |
| --- | --- | --- | --- | --- | --- | --- |
| memory-heap-used-baseline | 114.19 MB | 114.05 MB | 113.86 MB (σ 0.84) | +0.1% | +0.3% | |
| memory-rss-baseline | 284.62 MB | 287.98 MB | 284.98 MB (σ 42.51) | -1.2% | -0.1% | |

docker-stats

| Metric | Current | Latest Master | Baseline (avg) | vs Master | vs Baseline | Status |
| --- | --- | --- | --- | --- | --- | --- |
| docker-image-size-runners | 393.00 MB | 393.00 MB | 391.63 MB (σ 11.06) | +0.0% | +0.3% | |
| docker-image-size-n8n | 1269.76 MB | 1269.76 MB | 1269.76 MB (σ 0.00) | +0.0% | +0.0% | |

Idle baseline with Instance AI module loaded

| Metric | Current | Latest Master | Baseline (avg) | vs Master | vs Baseline | Status |
| --- | --- | --- | --- | --- | --- | --- |
| instance-ai-rss-baseline | 345.76 MB | 388.20 MB | 372.63 MB (σ 22.95) | -10.9% | -7.2% | ⚠️ |
| instance-ai-heap-used-baseline | 187.04 MB | 186.52 MB | 186.34 MB (σ 0.24) | +0.3% | +0.4% | 🔴 |
How to read this table
  • Current: This PR's value (or latest master if PR perf tests haven't run)
  • Latest Master: Most recent nightly master measurement
  • Baseline: Rolling 14-day average from master
  • vs Master: PR impact (current vs latest master)
  • vs Baseline: Drift from baseline (current vs rolling avg)
  • Status: ✅ within 1σ | ⚠️ 1-2σ | 🔴 >2σ regression
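The status thresholds can be checked against the rows above with a short sketch. `classifyDrift` is a hypothetical function, not the CI's implementation; using the absolute deviation is an assumption inferred from the ⚠️ shown for a decrease in instance-ai-rss-baseline.

```typescript
type DriftStatus = 'within 1σ' | '1-2σ' | '>2σ';

// Classify how far the current value sits from the rolling baseline average,
// measured in standard deviations (absolute deviation is an assumption).
function classifyDrift(current: number, baselineAvg: number, sigma: number): DriftStatus {
  if (sigma === 0) {
    // Degenerate baseline: any difference counts as more than 2σ.
    return current === baselineAvg ? 'within 1σ' : '>2σ';
  }
  const deviations = Math.abs(current - baselineAvg) / sigma;
  if (deviations <= 1) return 'within 1σ';
  if (deviations <= 2) return '1-2σ';
  return '>2σ';
}

console.log(classifyDrift(345.76, 372.63, 22.95)); // 1-2σ  (about 1.17σ)
console.log(classifyDrift(187.04, 186.34, 0.24)); // >2σ   (about 2.92σ)
console.log(classifyDrift(114.19, 113.86, 0.84)); // within 1σ (about 0.39σ)
```

The heap metric is flagged 🔴 despite a change of only +0.4% because its baseline is very stable (σ 0.24), so a 0.70 MB difference is nearly three standard deviations.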
