Skip to content

fix: segfault when moving scoped_ostream_redirect#6033

Merged
rwgk merged 6 commits intopybind:masterfrom
kounelisagis:fix/scoped-ostream-redirect-move-segfault
Apr 16, 2026
Merged

fix: segfault when moving scoped_ostream_redirect#6033
rwgk merged 6 commits intopybind:masterfrom
kounelisagis:fix/scoped-ostream-redirect-move-segfault

Conversation

@kounelisagis
Copy link
Copy Markdown
Contributor

@kounelisagis kounelisagis commented Apr 8, 2026

Cursor GPT-5.4 Extra High Fast generated:


Summary

Fix the move semantics of py::scoped_ostream_redirect.

Moving a redirect has to transfer two kinds of state together:

  1. redirect ownership: which object is responsible for restoring the original rdbuf()
  2. buffered output state: any bytes already staged inside detail::pythonbuf

The destination object must become the only active redirect, the stream must be re-pointed at the destination buffer, and any already-buffered bytes must move with it. If any of that state stays behind in the moved-from object, later writes or destruction can still touch stale moved-from state and crash.

This PR makes that transfer explicit and internally consistent.

What changes

  • scoped_ostream_redirect now uses an explicit move constructor that rebinds the stream to the destination pythonbuf.
  • Redirect ownership is tracked with an active flag, so only the live redirect restores the original rdbuf() in its destructor.
  • active replaces the old nullptr-sentinel style of representing moved-from state, which also allows a real nullptr original rdbuf() to be handled correctly.
  • detail::pythonbuf now has an explicit move constructor that rebuilds the destination put area from the transferred storage and clears the source put area.
  • This ensures that buffered-but-unflushed output follows the active redirect instead of being flushed later through moved-from Python handles.

Tests

Add regression coverage for:

  • moving a redirect after a flushed write
  • moving a redirect with buffered, unflushed output already pending
  • moving a redirect when the original stream rdbuf() is nullptr
  • restoring a nullptr rdbuf() after destruction

Together these tests cover the final move contract: stream rebinding, moved-from destructor disarming, preservation of pending buffered data, and correct handling of nullptr as a legitimate original stream buffer.

Suggested changelog entry

  • Fixed scoped_ostream_redirect move semantics so moved redirects preserve buffered output, restore the original stream buffer exactly once, and correctly handle streams whose original rdbuf() is nullptr.

The default move constructor left the stream (`std::cout`) pointing at
the moved-from `pythonbuf`, whose internal buffer and streambuf pointers
were nulled by the move. Any subsequent write through the stream
dereferenced null, causing a segfault.

Replace `= default` with an explicit move constructor that re-points
the stream to the new buffer and disarms the moved-from destructor.
@kounelisagis kounelisagis force-pushed the fix/scoped-ostream-redirect-move-segfault branch from 2b243d9 to 156332c Compare April 8, 2026 23:46
@rwgk
Copy link
Copy Markdown
Collaborator

rwgk commented Apr 12, 2026

@kounelisagis Please see below. I'm using Cursor GPT-5.4 Extra High Fast. I could easily ask it to first add the a test for the old == nullptr regression, and then make your new test pass without breaking that one. Do you want to do that on your end, or do you prefer if I try that?


Initial Review Prompt

Could you please review very carefully? I'm not familiar with the py::scoped_ostream_redirect code. I want to rely on your review. Could you please flag anything you're uncertain about?

Findings

  • High: include/pybind11/iostream.h now uses old == nullptr as the moved-from sentinel, but nullptr is also a valid previous rdbuf() value for a real std::ostream. That means a redirect created on something like std::ostream os(nullptr) will no longer restore the original nullptr buffer in the destructor, and the stream is left pointing at a destroyed pythonbuf.
    explicit scoped_ostream_redirect(std::ostream &costream = std::cout,
                                     const object &pyostream
                                     = module_::import("sys").attr("stdout"))
        : costream(costream), buffer(pyostream) {
        old = costream.rdbuf(&buffer);
    }

    ~scoped_ostream_redirect() {
        if (old) {
            costream.rdbuf(old);
        }
    }

    scoped_ostream_redirect(const scoped_ostream_redirect &) = delete;
    scoped_ostream_redirect(scoped_ostream_redirect &&other) noexcept
        : old(other.old), costream(other.costream), buffer(std::move(other.buffer)) {
        other.old = nullptr;     // Disarm moved-from destructor
        costream.rdbuf(&buffer); // Re-point stream to our buffer
    }

The old destructor restored old unconditionally, which correctly handled old == nullptr. After this patch, “moved-from” and “original buffer was null” are conflated. I sanity-checked locally that both std::ostream os(nullptr) and restoring rdbuf(nullptr) are valid on libstdc++, so this is a real regression, not just a theoretical one. I would fix this with a separate ownership flag such as active / owns_redirection, and still always restore old when that flag is set.

  • Low: the new regression test only exercises the happy-path std::cout move with a normal existing buffer and a pre-flush before the move, so it would not catch the old == nullptr regression above or other state-management edge cases in this code path.
    m.def("move_redirect_output", [](const std::string &msg) {
        py::scoped_ostream_redirect redir1(std::cout, py::module_::import("sys").attr("stdout"));
        std::cout << "before" << std::flush;
        py::scoped_ostream_redirect redir2(std::move(redir1));
        std::cout << msg << std::flush;
    });
def test_move_redirect(capsys):
    m.move_redirect_output("after")
    stdout, stderr = capsys.readouterr()
    assert stdout == "beforeafter"
    assert not stderr

Open Questions

  • I’m not confident nested redirects on the same stream are safe to move in general. Each guard stores a raw old pointer into another buffer, so moving a non-top-most redirect still looks fragile even if the top-of-stack move case is fixed.
  • I’m also not fully sure the move path preserves pending buffered bytes on every supported STL. The test flushes "before" before the move, so it proves “next write doesn’t crash,” but not “unflushed buffered data survives a move.”
  • I did not try to treat the new noexcept as a blocker, but I also did not independently prove the whole move path is non-throwing on every supported standard library implementation.

Overall

I would not merge this as-is because of the old == nullptr regression in scoped_ostream_redirect ownership/restoration logic.

Aside from that, the main idea of the fix looks right: rebinding the stream to the moved-to pythonbuf and disarming the moved-from destructor is the correct direction, and I didn’t find another concrete correctness bug in the simple top-level move case this PR is targeting.

Comment thread include/pybind11/iostream.h
Using `old == nullptr` as the moved-from sentinel was incorrect because
nullptr is a valid original rdbuf() value (e.g. `std::ostream os(nullptr)`).
Replace with an explicit `active` flag so the destructor correctly
restores nullptr buffers.

Add tests for the nullptr-rdbuf edge case.
@rwgk
Copy link
Copy Markdown
Collaborator

rwgk commented Apr 14, 2026

@kounelisagis please see below. This is still using Cursor GPT-5.4 Extra High Fast. What do you think about the "I would be more comfortable with something like" code? Maybe clang-tidy doesn't demand the noexcept with the suggested code? Could you please try?


Prompt

I'm wondering about exception safety in this code:

scoped_ostream_redirect(scoped_ostream_redirect &&other) noexcept
    : old(other.old), costream(other.costream), buffer(std::move(other.buffer)) {
    other.active = false;    // Disarm moved-from destructor
    costream.rdbuf(&buffer); // Re-point stream to our buffer
}

Does the noexcept make this universally safe?

Or do we have to worry about internal consistency between other.active and other other members, or even this members?

Response

Findings

  • noexcept does not make this universally safe. It only means that if anything in the move path throws, the program terminates instead of letting the exception propagate. I checked locally on GCC 13 / libstdc++ 13, and pybind11::detail::pythonbuf is not std::is_nothrow_move_constructible.
  • There is also a separate logical-state issue in the latest commit: this.active is always default-initialized to true, instead of being derived from other.active. So moving an already-moved-from redirect would incorrectly try to rebind the stream again.
    scoped_ostream_redirect(scoped_ostream_redirect &&other) noexcept
        : old(other.old), costream(other.costream), buffer(std::move(other.buffer)) {
        other.active = false;    // Disarm moved-from destructor
        costream.rdbuf(&buffer); // Re-point stream to our buffer
    }

What That Means

noexcept is only “safe” here in the narrow sense that you won’t continue running after an exception escapes this constructor. It is not a proof that the operations inside are non-throwing.

There are two separate concerns:

First, buffer(std::move(other.buffer)) is not universally non-throwing. pybind11::object itself is noexcept-movable:

    /// Move constructor; steals the object from ``other`` and preserves its reference count
    object(object &&other) noexcept : handle(other) { other.m_ptr = nullptr; }

but pythonbuf is derived from std::streambuf, and on this libstdc++ its defaulted move constructor is not treated as nothrow.

Second, costream.rdbuf(&buffer) is not specified as noexcept. On this libstdc++ implementation, basic_ios::rdbuf(newbuf) updates the stored streambuf pointer and then calls clear(). For a non-null buffer I do not see an actual throw path on libstdc++, so I’m less worried about that one in practice, but I would not call it a universal guarantee across implementations.

So my answer to your direct question is:

  • No, noexcept does not make this universally safe.
  • It mostly turns “possible inconsistency under exception” into “hard terminate”.

Active Flag Concern

Yes, you do still have to worry about internal consistency of active relative to the rest of the moved-from state.

Right now the move constructor assumes every source object should produce an active destination object. That is fine for the first move from a live redirect, but not for moving an already inactive object. The ownership bit should really move too.

I would be more comfortable with something like:

scoped_ostream_redirect(scoped_ostream_redirect &&other)
    : old(other.old),
      costream(other.costream),
      buffer(std::move(other.buffer)),
      active(other.active) {
    if (active) {
        costream.rdbuf(&buffer);
        other.active = false;
    }
}

Recommendation

I would remove noexcept.

If clang-tidy is the only reason for it, I would rather suppress that warning than promise a stronger exception guarantee than the actual member operations and standard-library calls provide.

I would also initialize active from other.active, not always from the default member initializer.

Uncertainty

The part I’m least certain about is whether rdbuf(&buffer) can actually throw on any standard library when &buffer is non-null. On libstdc++ it looks effectively non-throwing in this case, but I would still not use that as the basis for a blanket noexcept claim.

- Remove noexcept: pythonbuf inherits from std::streambuf whose move
  is not guaranteed nothrow on all implementations. Suppress clang-tidy
  with NOLINTNEXTLINE instead.
- Initialize active from other.active so that moving an already
  moved-from object does not incorrectly re-activate the redirect.
- Only rebind the stream and disarm the source when active.
@kounelisagis kounelisagis requested review from Skylion007 and rwgk April 15, 2026 19:06
Cover the buffered-before-move case for `scoped_ostream_redirect`, which still crashes despite the current move fix. This gives the PR a direct reproducer for the remaining bug path.

Made-with: Cursor
@rwgk
Copy link
Copy Markdown
Collaborator

rwgk commented Apr 15, 2026

Cursor GPT-5.4 Extra High Fast found another bug :-(

This time around, I asked it to go ahead and add the test. I'm running the CI to see if/where it reproduces.

I'll ask it to generate the fix as well.


For completeness:

This was my entire prompt:

  Could you please re-review the latest version of this PR (branch is up-to-date)?

  The CI is passing, it looks good to me.

Instead of a thumbs up as hoped it gave me this:

  Findings

  1. High: moving a redirect with buffered, unflushed output is still unsafe.

  scoped_ostream_redirect now transfers active correctly, but the underlying pythonbuf is still default-moved and still flushes in its
  destructor:

   include/pybind11/iostream.h lines 124-137

      explicit pythonbuf(const object &pyostream, size_t buffer_size = 1024)
          : buf_size(buffer_size < minimum_buffer_size // ternary avoids C++14 std::max ODR-use of
                                                       // static constexpr
                         ? minimum_buffer_size
                         : buffer_size),
            d_buffer(new char[buf_size]), pywrite(pyostream.attr("write")),
            pyflush(pyostream.attr("flush")) {
          setp(d_buffer.get(), d_buffer.get() + buf_size - 1);
      }
      pythonbuf(pythonbuf &&) = default;
      /// Sync before destroy
      ~pythonbuf() override { _sync(); }

  The moved-from Python callables are null after a move:

   include/pybind11/pytypes.h lines 390-395

      /// Copy constructor; always increases the reference count
      object(const object &o) : handle(o) { inc_ref(); }
      /// Move constructor; steals the object from ``other`` and preserves its reference count
      object(object &&other) noexcept : handle(other) { other.m_ptr = nullptr; }
      /// Destructor; automatically calls `handle::dec_ref()`
      ~object() { dec_ref(); }

  I checked this on the local GCC 13.3 / libstdc++ toolchain with a minimal std::streambuf-derived analogue: after a default move, the
  source put-area pointers are still non-empty. So a sequence like std::cout << "before"; followed by py::scoped_ostream_redirect
  redir2(std::move(redir1)); still leaves the moved-from pythonbuf destructor entering _sync() with pending bytes, but with pywrite /
  pyflush already moved away. That is still a crash path in the exact area this PR is trying to fix.
  The new regression tests do not cover that case, because they explicitly flush before the move:

   tests/test_iostream.cpp lines 127-131

      m.def("move_redirect_output", [](const std::string &msg_before, const std::string &msg_after) {
          py::scoped_ostream_redirect redir1(std::cout, py::module_::import("sys").attr("stdout"));
          std::cout << msg_before << std::flush;
          py::scoped_ostream_redirect redir2(std::move(redir1));
          std::cout << msg_after << std::flush;

  I would not merge as-is. It needs either a pythonbuf move that transfers or clears the source put area, or an explicit pre-move sync
  of other.buffer, plus a regression test that omits the pre-move std::flush.

The redirect guard now survives moves, but buffered output could still remain in the moved-from `pythonbuf` and be flushed during destruction through moved-out Python handles. Rebuild the destination put area from the transferred storage and clear the source put area so unflushed bytes follow the active redirect instead of crashing in the moved-from destructor.

Made-with: Cursor
@rwgk
Copy link
Copy Markdown
Collaborator

rwgk commented Apr 15, 2026

I have a fix already, but I'll hold off pushing it until the currently running CI is complete (so that we have a complete before/after comparison).

I'm pasting the Cursor-generated explanation for the fix here now:


I tracked the remaining crash to a different layer than scoped_ostream_redirect itself.

The earlier changes fixed the guard-level ownership problem: after a move, only one object restores the original rdbuf(), and the live stream is re-pointed at the destination buffer. That is enough for the already-flushed case, but it is not enough for test_move_redirect_unflushed, where bytes are still sitting in detail::pythonbuf when the move happens.

The underlying issue was that pythonbuf still had a defaulted move constructor. On the toolchain I tested, moving the std::streambuf base did not clear the moved-from put area, so the source pythonbuf could still appear to have pending output even after its pywrite and pyflush handles had been moved away. When the moved-from destructor later ran _sync(), it could try to flush buffered data through moved-out Python handles, which is the crash that the new regression test exposes.

That is why the fix belongs in pythonbuf, not only in scoped_ostream_redirect. The guard can transfer redirect ownership, but only the buffer itself can safely transfer or disarm its buffered state. The custom pythonbuf move constructor now rebuilds the destination put area from the transferred storage and clears the source put area, so unflushed bytes stay with the live buffer and the moved-from destructor becomes harmless.

The new test_move_redirect_unflushed regression reproduces the original crash locally, and the full SCons + pytest run passes with the fix in place.

@rwgk
Copy link
Copy Markdown
Collaborator

rwgk commented Apr 16, 2026

Cursor-generated analysis of CI results before/after commit e84bd70:


I dug through the unpacked logs from the two before-fix runs, 24480860578 and 24480860438, and the two after-fix runs, 24484831320 and 24484831234, to compare behavior before vs. after e84bd70.

Before the fix, the new regression test had very high signal. In every non-skipped environment I checked, test_move_redirect_unflushed either failed immediately or appeared to hang. That includes Linux, macOS, Android, iOS simulator, Pyodide, manylinux/musllinux, PyPy, and GraalPy. The common pattern was very consistent: test_move_redirect passed, and test_move_redirect_unflushed was the next thing to go wrong.

The only slightly surprising part was that the failure mode was not identical everywhere:

  • Many Linux jobs aborted after pybind11::error_already_set wrapping a UnicodeDecodeError, which suggests the moved-from buffer was sometimes trying to flush corrupted bytes rather than just hitting a clean use-after-move crash.
  • macOS, iOS simulator, GraalPy, Pyodide, and some other jobs more often segfaulted or aborted directly.
  • A few Windows/MSVC jobs never produced a result for test_move_redirect_unflushed before the workflow was canceled; their logs stop immediately after test_move_redirect PASSED, which makes them look more like hangs in that new test than genuine passes.

The pre-fix green jobs are not real counterexamples:

  • Most of the green Windows jobs explicitly skip test_iostream.py on Windows build >= 26100, so they never exercised the new regression at all.
  • I did not find any pre-fix log where test_move_redirect_unflushed actually ran and passed.

After e84bd70, I do not see the old failure signature anymore. In the after-fix runs, the jobs that previously crashed now explicitly show test_move_redirect_unflushed PASSED and then complete successfully, including Linux, macOS, Android, iOS simulator, Pyodide, GraalPy, PyPy, and the Windows/MSVC jobs that previously looked stuck. The Windows-build->=26100 skip bucket still skips, but that is the same pre-existing Windows guard as before, not a regression.

So my read is:

  • the new test is highly sensitive,
  • the before/after comparison is unusually clean,
  • and the pythonbuf move fix in e84bd70 looks like it fixes the underlying bug, not just one platform-specific manifestation.

@rwgk rwgk merged commit 7029f99 into pybind:master Apr 16, 2026
89 checks passed
@github-actions github-actions bot added the needs changelog Possibly needs a changelog entry label Apr 16, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

needs changelog Possibly needs a changelog entry

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants