Per-thread FD tables via unshare(CLONE_FILES) for ET_NET threads #13086

Draft
c-taylor wants to merge 3 commits into apache:master from c-taylor:unshare-clone-files

@c-taylor

Eliminate kernel spinlock contention on the shared files_struct by giving each ET_NET thread its own private FD table after all initialization FDs are in place. This removes the accept4/close contention visible as ~58% CPU in native_queued_spin_lock_slowpath at high thread counts.

This change is incompatible with macOS and FreeBSD. The only equivalent solution there seems to be migrating to a multi-process model, which should be discussed as a long-term goal.

The unshare is scheduled after start_HttpProxyServer() to ensure that all persistent FDs (eventfds, cache disks, DNS sockets, log files, listen sockets, plugin FDs) are copied into each thread's private table. It is Linux-only and non-fatal on failure (the thread falls back to the shared table).

I was also able to cause intermittent socket failures before applying this change.
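
For readers who have not used the call, here is a minimal Linux-only sketch of the idea described above. It is not the code in this PR: `try_private_fd_table()` and `inherited_fd_usable_in_worker()` are hypothetical names, and a plain `std::thread` stands in for an ET_NET thread. It shows the two properties the description relies on: a failed `unshare(CLONE_FILES)` is treated as non-fatal, and an FD opened before the call is still usable afterwards because it is copied into the private table.

```cpp
#include <sched.h>        // unshare, CLONE_FILES (g++ defines _GNU_SOURCE by default)
#include <sys/eventfd.h>  // eventfd
#include <unistd.h>       // read, write, close

#include <cassert>        // assert, used by the usage line below
#include <cerrno>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <thread>

// Hypothetical helper: give the calling thread a private FD table.
// Returns false (and keeps the shared table) if the kernel refuses.
bool try_private_fd_table() {
  if (unshare(CLONE_FILES) == 0) {
    return true;
  }
  std::fprintf(stderr, "unshare(CLONE_FILES) failed: %s; keeping shared table\n",
               std::strerror(errno));
  return false;
}

// Opens an eventfd before the worker starts, lets the worker (try to)
// unshare and then write to it, and checks that the write is observed.
bool inherited_fd_usable_in_worker() {
  int efd = eventfd(0, EFD_CLOEXEC);
  if (efd < 0) {
    return false;
  }
  bool wrote = false;
  std::thread worker([&] {
    try_private_fd_table();  // non-fatal either way
    std::uint64_t one = 1;
    wrote = write(efd, &one, sizeof(one)) == static_cast<ssize_t>(sizeof(one));
  });
  worker.join();

  std::uint64_t value = 0;
  bool ok = wrote &&
            read(efd, &value, sizeof(value)) == static_cast<ssize_t>(sizeof(value)) &&
            value == 1;
  close(efd);
  return ok;
}
```

Calling `assert(inherited_fd_usable_in_worker());` should hold whether or not the unshare succeeds, since both the private copy and the shared table refer to the same open eventfd.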

@c-taylor
Author

c-taylor commented Apr 13, 2026

I think this is ready to go, but would appreciate confirmation from another source.

On 128-thread systems, applying this change moved scaling from:
95,000 HS/sec (X25519), with socket errors
to
>400,000 HS/sec (X25519), no socket errors

@c-taylor
Author

Heading off at least one expected code etiquette question:
No other OS has an equivalent, so using the distinct syscall name for the functions inside the ifdef made the intent much clearer to me.

Contributor

Copilot AI left a comment

Pull request overview

This PR aims to reduce Linux kernel contention on the shared files_struct by giving each ET_NET thread its own FD table via unshare(CLONE_FILES) after server initialization, targeting accept/close spinlock hot paths at high thread counts.

Changes:

  • Schedule unshare_et_net_fd_tables() after start_HttpProxyServer() in both the delayed-listen and non-delayed startup paths (Linux-only).
  • Implement unshare_et_net_fd_tables() to schedule per-ET_NET thread unshare(CLONE_FILES) work on Linux.
  • Expose the Linux-only unshare_et_net_fd_tables() declaration via P_UnixNet.h.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| src/traffic_server/traffic_server.cc | Calls unshare_et_net_fd_tables() after start_HttpProxyServer() (Linux-only). |
| src/iocore/net/UnixNet.cc | Adds Linux implementation that schedules unshare(CLONE_FILES) on each ET_NET thread. |
| src/iocore/net/P_UnixNet.h | Adds Linux-only prototype for unshare_et_net_fd_tables(). |

@bryancall bryancall requested review from cmcfarlen and masaori335 and removed request for cmcfarlen and masaori335 April 13, 2026 20:15
Rename to ExecThrLateCont/exec_thr_late_init() to create a generic
late-init hook for ET_NET threads. Move ifdef to only wrap the
unshare() call so the continuation compiles on all platforms.

Skip unshare when accept_threads > 0 because dedicated accept threads
create socket FDs that are handed off to ET_NET threads by FD number.
After unshare, those FDs would not exist in the ET_NET thread's private
table, causing EBADF on epoll_ctl.
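
The hand-off hazard described in this commit message can be reproduced outside ATS. The sketch below is a hedged illustration, not the PR's code: `HandoffResult` and `handoff_after_unshare()` are hypothetical names, the main thread plays the dedicated accept thread, and a pipe stands in for an accepted socket. The worker unshares first, then receives an FD number created afterwards and probes it with `fcntl(F_GETFD)`.

```cpp
#include <fcntl.h>   // fcntl, F_GETFD
#include <sched.h>   // unshare, CLONE_FILES
#include <unistd.h>  // pipe, close

#include <cassert>   // assert, used by the usage lines below
#include <cerrno>
#include <future>
#include <thread>

struct HandoffResult {
  bool created;   // did the "accept thread" manage to create an FD?
  bool unshared;  // did unshare(CLONE_FILES) succeed in the worker?
  bool fd_valid;  // was the handed-off FD number valid in the worker?
  int err;        // errno from the worker's probe when fd_valid is false
};

HandoffResult handoff_after_unshare() {
  HandoffResult result{false, false, false, 0};
  std::promise<void> unshare_done;
  std::promise<int> handed_fd;
  std::future<void> unshare_done_f = unshare_done.get_future();
  std::future<int> handed_fd_f = handed_fd.get_future();

  std::thread worker([&] {
    // Step 1: the worker takes a private copy of the FD table.
    result.unshared = (unshare(CLONE_FILES) == 0);
    unshare_done.set_value();
    // Step 3: it receives an FD *number* created after the copy was taken.
    int fd = handed_fd_f.get();
    if (fcntl(fd, F_GETFD) != -1) {
      result.fd_valid = true;
    } else {
      result.err = errno;
    }
  });

  // Step 2: the "accept thread" creates a new FD in the original table.
  unshare_done_f.wait();
  int fds[2] = {-1, -1};
  result.created = (pipe(fds) == 0);
  handed_fd.set_value(fds[0]);
  worker.join();

  if (result.created) {
    close(fds[0]);
    close(fds[1]);
  }
  return result;
}
```

If the unshare succeeded, the probe is expected to fail with EBADF, which is the same failure `epoll_ctl` would report. If the unshare was refused, the tables are still shared and the FD is valid:

    HandoffResult r = handoff_after_unshare();
    assert(r.unshared ? (!r.fd_valid && r.err == EBADF) : r.fd_valid);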
@c-taylor
Author

The concern here would be FDs that change over the lifetime of the process.
Arbitrary plugin usage is perhaps the largest risk.

While this may work in the general case, it might need to be accompanied by additional features to encourage the correct behaviour.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

@masaori335
Contributor

[approve ci autest]

Only call unshare(CLONE_FILES) when per-thread listen is enabled
(exec_thread.listen=1, SO_REUSEPORT). Schedule before
start_HttpProxyServer() so accept_per_thread creates listen FDs
directly in each thread's private table. Demote all logging to Dbg
to avoid log spam when unshare is blocked (EPERM).
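
The ordering in this commit message (unshare first, then let each thread open its own SO_REUSEPORT listener) can be sketched as follows. This is an illustration under stated assumptions, not the ATS implementation: `open_reuseport_listener()`, `local_port()` and `count_per_thread_listeners()` are hypothetical helpers, plain `std::thread`s stand in for ET_NET threads, and the `unshare_first` flag stands in for the per-thread-listen gate, which the PR reads from configuration.

```cpp
#include <netinet/in.h>  // sockaddr_in, htons, htonl, INADDR_LOOPBACK
#include <sched.h>       // unshare, CLONE_FILES
#include <sys/socket.h>  // socket, setsockopt, bind, listen, getsockname
#include <unistd.h>      // close

#include <atomic>
#include <cassert>       // assert, used by the usage line below
#include <cstdint>
#include <thread>
#include <vector>

// Create a TCP listener on 127.0.0.1:port with SO_REUSEPORT set.
// Port 0 asks the kernel for an ephemeral port. Returns the FD or -1.
int open_reuseport_listener(std::uint16_t port) {
  int fd = socket(AF_INET, SOCK_STREAM | SOCK_CLOEXEC, 0);
  if (fd < 0) {
    return -1;
  }
  int on = 1;
  sockaddr_in addr{};
  addr.sin_family = AF_INET;
  addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
  addr.sin_port = htons(port);
  if (setsockopt(fd, SOL_SOCKET, SO_REUSEPORT, &on, sizeof(on)) != 0 ||
      bind(fd, reinterpret_cast<sockaddr *>(&addr), sizeof(addr)) != 0 ||
      listen(fd, 128) != 0) {
    close(fd);
    return -1;
  }
  return fd;
}

// Port the listener is actually bound to, or 0 on error.
std::uint16_t local_port(int fd) {
  sockaddr_in addr{};
  socklen_t len = sizeof(addr);
  if (getsockname(fd, reinterpret_cast<sockaddr *>(&addr), &len) != 0) {
    return 0;
  }
  return ntohs(addr.sin_port);
}

// Each thread optionally unshares, then opens its own listener on the same
// port, so the listen FD exists only in that thread's table when the unshare
// succeeded. Returns how many threads got a listener.
int count_per_thread_listeners(int nthreads, bool unshare_first) {
  int first = open_reuseport_listener(0);  // picks the shared port
  if (first < 0) {
    return 0;
  }
  std::uint16_t port = local_port(first);
  std::atomic<int> ok{0};
  std::vector<std::thread> threads;
  for (int i = 0; i < nthreads; ++i) {
    threads.emplace_back([&ok, port, unshare_first] {
      if (unshare_first && unshare(CLONE_FILES) != 0) {
        // Non-fatal (for example EPERM): continue on the shared table.
      }
      int fd = open_reuseport_listener(port);
      if (fd >= 0) {
        ++ok;
        close(fd);
      }
    });
  }
  for (auto &t : threads) {
    t.join();
  }
  close(first);
  return ok.load();
}
```

With SO_REUSEPORT available, `assert(count_per_thread_listeners(4, true) == 4);` should hold regardless of whether the unshare is permitted, because the listener is created after the attempt and therefore never has to be handed across tables.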