feat(tuned): migrate sysctl tunings to tuned #2082
Conversation
… GIDs
- Move various sysctl parameters from setup-system.yml into the postgresql tuned profile.
- Explicitly define GIDs for ssl-cert (1001) and postgres (1002) to ensure stable HugePages access.
- Add HugePages calculation and hugetlb_shm_group configuration to the tuned profile.
- Ensure gotrue.service waits for tuned.service before starting.
Pull request overview
This PR migrates host kernel/sysctl tunings from direct Ansible sysctl tasks to a dedicated tuned profile, aiming to centralize and avoid conflicts while improving PostgreSQL-oriented performance and stability.
Changes:
- Updated PostgreSQL package release strings to `*-tuned-1` variants.
- Added/expanded `tuned` profile configuration for PostgreSQL, including a base profile include, Supabase-specific sysctls, and HugePages settings.
- Removed direct sysctl tuning from `setup-system.yml`, added deterministic GIDs for postgres-related groups in `nixpkg_mode`, and ordered gotrue after tuned.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| ansible/vars.yml | Updates Postgres release identifiers to tuned-specific builds. |
| ansible/tasks/setup-tuned.yml | Creates/activates a tuned profile and writes sysctl/HugePages tuning into tuned.conf. |
| ansible/tasks/setup-system.yml | Removes direct sysctl configuration now intended to be handled by tuned. |
| ansible/tasks/setup-postgres.yml | Sets deterministic GIDs for ssl-cert and postgres groups in nixpkg_mode. |
| ansible/files/gotrue.service.j2 | Ensures gotrue starts after tuned to apply kernel/network tunings first. |
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
…-378

* 'INDATA-378' of github.com:supabase/postgres:
  Update vars.yml
  Update vars.yml
  revert: stop using conf.d directory for generated-optimizations (#2101)
If the goal is to encourage more active memory management, we may want to have 3 different profiles:
For all systems: Swap is an expensive operation, so we avoid swap as much as possible, don't let write backlog build up, and flush continuously instead of in bursts. In practice, this often means a higher baseline IO, but fewer spikes.

For systems with 64GB of memory or less:

For systems with more than 64GB of memory: Get the kernel to start background writeback early, with a hard cap on dirty memory in RAM, and avoid processes blocking until writes catch up. As we increase our available disk cache, we need to be more proactive about maintaining contiguous bytes for atomic operations, network, and IO buffers, reducing long-term fragmentation buildup. The compaction adds some CPU, but will help ensure we do not encounter page allocation failures.

vm.overcommit_memory = 2: Don't overcommit. The total address space commit for the system is not permitted to exceed swap + a configurable amount (default is 50%) of physical RAM. Without memory overcommit, Linux will return the error "out of memory" (ENOMEM). If a PostgreSQL process receives this error, the running statement fails with error code 53200, but the database as a whole remains operational. I'm unclear whether the other user processes (Envoy, REST, Auth) handle this error, but this is certainly better for the database.

committable memory = swap + (RAM - vm.nr_hugepages * huge page size) * vm.overcommit_ratio / 100

Most of the instances have much more RAM than swap space, and we adjust the ratio up into the 90% range so the majority is accessible.
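The committable-memory formula above can be sanity-checked with a quick sketch. The instance size, swap, HugePages count, and ratio below are hypothetical values for illustration, not figures from this PR:

```python
GiB = 1024 ** 3

def committable_bytes(ram_bytes, swap_bytes, nr_hugepages, hugepage_bytes, overcommit_ratio):
    """CommitLimit as the kernel computes it when vm.overcommit_memory = 2:
    swap + (RAM - HugePages reservation) * vm.overcommit_ratio / 100."""
    return swap_bytes + (ram_bytes - nr_hugepages * hugepage_bytes) * overcommit_ratio // 100

# Hypothetical 16 GiB instance: 1 GiB swap, 2048 x 2 MiB HugePages, ratio 90.
limit = committable_bytes(16 * GiB, 1 * GiB, 2048, 2 * 1024 ** 2, 90)
print(round(limit / GiB, 2))  # ≈ 11.8 GiB committable
```

Note how the HugePages reservation is subtracted first: memory pinned for `shared_buffers` is not available to the rest of the commit budget.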
Definitely a +1 on this. OOMs are among the most common reasons, if not the most common reason, for a project to fail. When an overcommit limit is enforced, offending connections get killed with the ENOMEM error instead. I'd argue that the setting should be configured automatically once someone gets to a medium.
It looks like we previously based our decision not to disable overcommit on the micro instances? https://github.com/supabase/platform/issues/1404

For the low-memory instances, back of the napkin:
Summary of Changes
Ansible Task Refactoring (ansible/tasks/):
- Audited the `postgres`, `platform`, and `salt` repos for existing `sysctl` calls.
- Dynamically calculate and set `vm.nr_hugepages` based on our default `shared_buffers`, and configure `vm.hugetlb_shm_group`.

Service Ordering (ansible/files/gotrue.service.j2):
Detailed Analysis of Sysctl Parameters
The following sysctl parameters are now being applied via tuned. These changes generally aim to optimize for a high-throughput database workload, improve network resilience, and prevent memory exhaustion issues.
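As an illustration of the calculated HugePages value, here is a rough sketch of how `vm.nr_hugepages` could be derived from `shared_buffers`. The 5% headroom factor and the example sizes are assumptions for illustration; the actual formula in setup-tuned.yml may differ:

```python
import math

def nr_hugepages(shared_buffers_bytes, hugepage_bytes=2 * 1024 ** 2, headroom=1.05):
    """Rough count of 2 MiB huge pages needed to back PostgreSQL's
    shared_buffers, plus a small cushion for other shared segments."""
    return math.ceil(shared_buffers_bytes * headroom / hugepage_bytes)

# Hypothetical: shared_buffers = 4 GiB, default 2 MiB huge page size.
print(nr_hugepages(4 * 1024 ** 3))  # → 2151 pages, i.e. ~4.2 GiB reserved
```

Because the reserved pages are only usable by processes in `vm.hugetlb_shm_group`, the deterministic postgres GID added in this PR is what makes the reservation reliably reachable by PostgreSQL.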
- `vm.overcommit_memory = 2`: Tells the kernel to never overcommit memory. This is a safer mode for dedicated database servers, ensuring the OOM killer is less likely to trigger unpredictably, though it requires careful sizing of swap + RAM.
- `vm.nr_hugepages` (calculated): Allocates explicit HugePages for PostgreSQL `shared_buffers`.

Conclusion
These changes represent a move towards a "production-ready," high-performance configuration. The system is explicitly tuned for high throughput (via buffer/window increases), stability (via OOM panic policies), and reduced CPU overhead (via HugePages and NUMA settings). These settings were based on existing Supabase settings throughout the code, and the recommended tuning practices from Red Hat: PostgreSQL Load Tuning on RHEL (https://www.redhat.com/en/blog/postgresql-load-tuning-red-hat-enterprise-linux). This ensures that the OS is not just a general-purpose host, but is specifically optimized for the high-concurrency, high-I/O profile of a production PostgreSQL instance.