Per-thread FD tables via unshare(CLONE_FILES) for ET_NET threads #13086
c-taylor wants to merge 3 commits into apache:master
Eliminate kernel spinlock contention on the shared files_struct by giving each ET_NET thread its own private FD table after all initialization FDs are in place. This removes the accept4/close contention visible as ~58% CPU in native_queued_spin_lock_slowpath at high thread counts.

This change is incompatible with macOS and FreeBSD. The only equivalent solution there seems to be migrating to a multi-process model. (This should be discussed as a long-term goal.)

Scheduled after start_HttpProxyServer() to ensure all persistent FDs (eventfds, cache disks, DNS sockets, log files, listen sockets, plugin FDs) are copied into each thread's private table. Linux-only; non-fatal on failure (falls back to the shared table).

I was also able to cause intermittent socket failures before applying this change.
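To make the "private FD table" behaviour concrete, here is a minimal sketch, not code from this PR. It assumes Linux and a hypothetical helper name, `probe_private_fd_table()`. A worker thread calls unshare(CLONE_FILES) and then opens a descriptor; the main thread checks whether that descriptor number is visible in its own table. If unshare is blocked (for example by a seccomp policy), the sketch reports the shared-table fallback instead.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // unshare() is a GNU/Linux extension; g++ usually predefines this
#endif
#include <fcntl.h>
#include <sched.h>
#include <unistd.h>

#include <thread>

// Hypothetical probe (illustration only, not part of Traffic Server).
// Returns  1 if the worker got a private FD table (its FD is invisible to main),
//          0 if unshare() failed and the table stayed shared,
//         -1 on any unexpected outcome.
int probe_private_fd_table()
{
  bool unshared = false;
  int  fd       = -1;

  std::thread worker([&] {
    // From here on, FDs opened by this thread live in its own table.
    unshared = (unshare(CLONE_FILES) == 0);
    fd       = open("/dev/null", O_RDONLY);
    // Deliberately not closed here: main probes the FD number after join().
  });
  worker.join();

  if (fd < 0) {
    return -1;
  }

  // F_GETFD fails with EBADF when the number is not open in the caller's table.
  bool visible_in_main = (fcntl(fd, F_GETFD) != -1);
  if (visible_in_main) {
    close(fd); // shared-table case: the worker's FD is ours to clean up
  }

  if (unshared && !visible_in_main) {
    return 1;
  }
  if (!unshared && visible_in_main) {
    return 0;
  }
  return -1;
}
```

A return value of 1 corresponds to the intended outcome described above; 0 corresponds to the non-fatal fallback to the shared table.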
I think this is ready to go, but would appreciate a confirmation from another source. On 128-thread systems I was able to move scaling from: Applying this change.
Heading off at least one expected code etiquette question:
Pull request overview
This PR aims to reduce Linux kernel contention on the shared files_struct by giving each ET_NET thread its own FD table via unshare(CLONE_FILES) after server initialization, targeting accept/close spinlock hot paths at high thread counts.
Changes:
- Schedule unshare_et_net_fd_tables() after start_HttpProxyServer() in both the delayed-listen and non-delayed startup paths (Linux-only).
- Implement unshare_et_net_fd_tables() to schedule per-ET_NET-thread unshare(CLONE_FILES) work on Linux.
- Expose the Linux-only unshare_et_net_fd_tables() declaration via P_UnixNet.h.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/traffic_server/traffic_server.cc | Calls unshare_et_net_fd_tables() after start_HttpProxyServer() (Linux-only). |
| src/iocore/net/UnixNet.cc | Adds Linux implementation that schedules unshare(CLONE_FILES) on each ET_NET thread. |
| src/iocore/net/P_UnixNet.h | Adds Linux-only prototype for unshare_et_net_fd_tables(). |
Rename to ExecThrLateCont/exec_thr_late_init() to create a generic late-init hook for ET_NET threads. Move the ifdef so that it wraps only the unshare() call and the continuation compiles on all platforms. Skip unshare when accept_threads > 0, because dedicated accept threads create socket FDs that are handed off to ET_NET threads by FD number. After unshare, those FDs would not exist in the ET_NET thread's private table, causing EBADF on epoll_ctl.
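The hand-off hazard described in this commit can be reproduced in isolation. The following is a sketch under stated assumptions, not ATS code: `probe_fd_handoff_after_unshare()` and `HandoffResult` are hypothetical names, the main thread stands in for a dedicated accept thread, and an eventfd stands in for an accepted socket. The worker creates its epoll instance before unshare so that the later FD gets a number that is free in the worker's private table.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // unshare() is a GNU/Linux extension
#endif
#include <sched.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#include <cerrno>
#include <future>
#include <thread>

struct HandoffResult {
  bool unshared;    // did unshare(CLONE_FILES) succeed on the worker?
  int  epoll_errno; // 0 if epoll_ctl succeeded, otherwise its errno
};

// Hypothetical probe: hands an FD number to a thread that has already unshared.
HandoffResult probe_fd_handoff_after_unshare()
{
  std::promise<int> epfd_ready; // worker -> main, sent after unshare()
  std::promise<int> handed_fd;  // main -> worker, FD created after unshare()
  HandoffResult     result{false, 0};

  std::thread worker([&] {
    int epfd        = epoll_create1(0); // created while the table is still shared
    result.unshared = (unshare(CLONE_FILES) == 0);
    epfd_ready.set_value(epfd);

    int fd = handed_fd.get_future().get(); // FD *number* from the other thread
    epoll_event ev{};
    ev.events  = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) != 0) {
      result.epoll_errno = errno; // EBADF when fd is absent from the private table
    }
  });

  int epfd = epfd_ready.get_future().get();
  int efd  = eventfd(0, 0); // exists only in main's table if the worker unshared
  handed_fd.set_value(efd);
  worker.join();

  // Main closes its own references; a private worker table is torn down at thread exit.
  if (efd >= 0) {
    close(efd);
  }
  if (epfd >= 0) {
    close(epfd);
  }
  return result;
}
```

When unshare succeeds, the worker's epoll_ctl is expected to fail with EBADF, which matches the failure mode the commit message gives as the reason for skipping unshare when accept_threads > 0.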
The concern here would be mutating FDs that change over the lifetime of the process. Whilst this 'may' work in the general case, it might need some accompaniment of additional features to encourage the correct behaviour.
[approve ci autest]
Only call unshare(CLONE_FILES) when per-thread listen is enabled (exec_thread.listen=1, SO_REUSEPORT). Schedule before start_HttpProxyServer() so that accept_per_thread creates listen FDs directly in each thread's private table. Demote all logging to Dbg to avoid log spam when unshare is blocked (EPERM).
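The gating and fallback policy in this commit might look roughly like the sketch below. This is an illustration only: `late_init_fd_table()` and `LateInitResult` are hypothetical names, the `per_thread_listen` flag stands in for the exec_thread.listen=1 setting, and the actual continuation in the PR may be structured differently. The platform guard wraps only the unshare() call, so the function compiles everywhere and simply reports a shared table off Linux.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE // unshare() is a GNU/Linux extension
#endif
#include <cerrno>

#if defined(__linux__)
#include <sched.h>
#endif

struct LateInitResult {
  bool private_table; // true only if unshare(CLONE_FILES) succeeded
  int  err;           // errno from a failed unshare(), otherwise 0
};

// Hypothetical late-init step, meant to run on each network thread
// before that thread creates its own listen socket.
LateInitResult late_init_fd_table(bool per_thread_listen)
{
  LateInitResult r{false, 0};

  if (!per_thread_listen) {
    return r; // FDs are handed across threads by number; keep the shared table
  }

#if defined(__linux__)
  if (unshare(CLONE_FILES) == 0) {
    r.private_table = true;
  } else {
    r.err = errno; // e.g. EPERM; caller logs at debug level and carries on
  }
#endif

  return r;
}
```

A caller would treat a non-zero `err` as informational rather than fatal, which is the shared-table fallback the PR description calls for.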