Fix repair_metadata OOM on large repositories #1189

Merged: jobselko merged 1 commit into pulp:main from decko:fix/repair-metadata-memory on Apr 14, 2026

@decko (Member) commented Apr 9, 2026

Summary

  • Reduce BULK_SIZE from 1000 to 250, flushing batches 4x more often to cap peak memory
  • Eliminate double S3 read per wheel by reusing the temp file from metadata extraction for metadata artifact creation
  • Explicitly close artifact file handles after each iteration to release S3 buffer memory
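
As an illustration only (this is not the code from the PR), the batching and handle-closing ideas above might look roughly like the sketch below. The names `repair_in_batches`, `open_file`, and `flush` are hypothetical stand-ins. Only the `BULK_SIZE` value of 250 comes from the summary.

```python
BULK_SIZE = 250  # value from the summary above (previously 1000)


def repair_in_batches(items, open_file, flush, bulk_size=BULK_SIZE):
    """Process items, flushing every `bulk_size` entries.

    `open_file` and `flush` are hypothetical callables: the first returns a
    readable binary handle for an item, the second persists one batch.
    Returns the number of flushes performed.
    """
    batch = []
    flushes = 0
    for item in items:
        handle = open_file(item)
        try:
            # Keep only the small derived value, not the handle or its buffer.
            batch.append(len(handle.read()))
        finally:
            # Close explicitly so buffered data is released on every iteration.
            handle.close()
        if len(batch) >= bulk_size:
            flush(batch)
            flushes += 1
            batch = []
    if batch:
        # Flush the final partial batch.
        flush(batch)
        flushes += 1
    return flushes
```

With a smaller `bulk_size`, fewer pending objects are held between flushes, which is the peak-memory effect the first bullet describes.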

Fixes #1188

Test plan

  • Existing test_repair.py tests pass (metadata repair command, endpoint, artifact repair)
  • New test_metadata_repair_batch_boundary passes with reduced BULK_SIZE
  • Deploy to stage and run repair-python-metadata.py --env stage --domain <large-domain> to verify no OOM

JIRA: PULP-1573

🤖 Generated with Claude Code

@github-actions bot added the multi-commit, no-changelog, and no-issue labels Apr 9, 2026
Comment thread pulp_python/app/tasks/repair.py Outdated
Comment thread pulp_python/app/utils.py Outdated
Comment thread pulp_python/app/utils.py Outdated
Comment thread pulp_python/tests/functional/api/test_repair.py Outdated
Comment thread pulp_python/app/utils.py
@jobselko (Contributor) commented

Could you please squash the commits into one? Otherwise, it looks good!

@decko decko force-pushed the fix/repair-metadata-memory branch from 025a2f3 to 9cda1db Compare April 13, 2026 16:17
@github-actions bot removed the multi-commit and no-changelog labels Apr 13, 2026
@decko (Member, Author) commented Apr 13, 2026

> Could you please squash the commits into one? Otherwise, it looks good!

Done @jobselko. Just one commit.

@jobselko (Contributor) left a comment

Thanks! I will keep this open, as @gerrod3 might want to review it as well. There are a few things that need small refactoring, but these are outside the scope of this PR.

@gerrod3 (Contributor) left a comment

We can accept this change and backport it. In the future we will need to take a look at our util methods and refactor them; they've grown a bit absurd.

Comment thread CHANGES/1188.bugfix Outdated
Comment on lines +1 to +3
Reduced peak memory consumption of repair_metadata by lowering batch size from 1000 to 250,
eliminating double S3 reads for wheel files, and closing artifact file handles after each
iteration. This fixes "Worker has gone missing" errors on repositories with 1000+ packages.
Contributor

Simplify this, too verbose. Keep it on one line.

Member Author

Simplified to one line.

Large repositories (1000+ packages) cause workers to OOM during
repair_metadata. Three changes reduce peak memory:

- Reduce BULK_SIZE from 1000 to 250, flushing batches 4x more often
- Copy artifact to temp file once via helper, reuse for both content
  data extraction and metadata extraction (eliminates double S3 read)
- Extract metadata bytes while temp file exists, pass bytes through
  the metadata batch instead of file paths

Closes pulp#1188
JIRA: PULP-1573

Assisted-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
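
A minimal sketch of the "copy once, reuse" idea from the commit message, under stated assumptions: the helper names `copy_to_temp` and `read_wheel_once` are hypothetical and are not the helpers added in this PR, and the source is modelled as any readable binary stream rather than an S3-backed artifact file.

```python
import shutil
import tempfile
import zipfile


def copy_to_temp(source, suffix=".whl"):
    """Copy a readable binary stream into a temp file exactly once.

    Hypothetical helper: the caller owns the returned file and should use it
    as a context manager so it is deleted when done.
    """
    tmp = tempfile.NamedTemporaryFile(suffix=suffix)
    shutil.copyfileobj(source, tmp)
    tmp.flush()
    tmp.seek(0)
    return tmp


def read_wheel_once(source):
    """Return (archive member names, METADATA bytes) from one local copy."""
    with copy_to_temp(source) as tmp:
        with zipfile.ZipFile(tmp) as wheel:
            names = wheel.namelist()
            # Wheels keep their core metadata at <name>.dist-info/METADATA.
            meta_name = next(n for n in names if n.endswith(".dist-info/METADATA"))
            # Extract the bytes while the temp file still exists.
            metadata_bytes = wheel.read(meta_name)
    # Only bytes leave this function, so no file path can outlive the temp file.
    return names, metadata_bytes
```

Returning bytes instead of a path means later batch processing does not depend on the temp file still being on disk, which matches the third bullet of the commit message.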
@decko decko force-pushed the fix/repair-metadata-memory branch from 9cda1db to 208ab04 Compare April 13, 2026 19:57
@decko decko requested a review from gerrod3 April 13, 2026 19:58
@jobselko jobselko merged commit 7aaf53f into pulp:main Apr 14, 2026
13 of 14 checks passed
patchback bot commented Apr 14, 2026

Backport to 3.27: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.27/7aaf53fc2f2a46f21be5363e677f503d6ea86fc9/pr-1189

Backported as #1196

🤖 @patchback
I'm built with octomachinery and
my source is open — https://github.com/sanitizers/patchback-github-app.

patchback bot pushed a commit that referenced this pull request Apr 14, 2026
Fix repair_metadata OOM on large repositories

(cherry picked from commit 7aaf53f)
patchback bot commented Apr 14, 2026

Backport to 3.28: 💚 backport PR created

✅ Backport PR branch: patchback/backports/3.28/7aaf53fc2f2a46f21be5363e677f503d6ea86fc9/pr-1189

Backported as #1197

patchback bot pushed a commit that referenced this pull request Apr 14, 2026
Fix repair_metadata OOM on large repositories

(cherry picked from commit 7aaf53f)
jobselko added a commit that referenced this pull request Apr 14, 2026
…f2a46f21be5363e677f503d6ea86fc9/pr-1189

[PR #1189/7aaf53fc backport][3.27] Fix repair_metadata OOM on large repositories
jobselko added a commit that referenced this pull request Apr 14, 2026
…f2a46f21be5363e677f503d6ea86fc9/pr-1189

[PR #1189/7aaf53fc backport][3.28] Fix repair_metadata OOM on large repositories
Linked issue: repair_metadata OOMs on large repositories (1000+ packages)