FlatExpression POC: data-oriented flat expression tree#511

Draft
Copilot wants to merge 6 commits into master from copilot/data-oriented-expression-optimization

Contributor

Copilot AI commented Apr 12, 2026

Explores the idea from #512: represent an expression tree as a single flat array of fat structs with integer index references instead of object-graph pointers. This enables stack allocation for small trees, trivial serialization, and O(n) structural equality without recursion.

Core types (src/FastExpressionCompiler/FlatExpression.cs)

  • Idx — 1-based int index into Nodes; default (It == 0) is the nil sentinel
  • ExpressionNode — 32-byte sequential fat struct (refs-first layout eliminates padding): Type, Info, NodeType, NextIdx (next sibling), ChildIdx (first child / inline constant bits), ExtraIdx (second child / constant discriminator)
  • ExpressionTree — holds nodes in SmallList<ExpressionNode, Stack16<…>, NoArrayPool<…>> (first 16 nodes on the call-stack) and closure constants in SmallList<object, Stack4<…>, …>; factory methods for Constant, Parameter, Unary, Binary, New, Call, Lambda, Conditional, Block
  • ToSystemExpression() — converts to System.Linq.Expressions so existing FEC compilation path is reachable; uses SmallMap16<int, SysParam, IntEq> (stack-resident) instead of Dictionary to map parameter indices during conversion
  • StructurallyEqual(): O(n) structural comparison via a single pass over the flat arrays; no recursive tree descent needed
var tree = default(ExpressionTree);
var px = tree.Parameter(typeof(int), "x");
var py = tree.Parameter(typeof(int), "y");
var add = tree.Add(px, py, typeof(int));
tree.Lambda(typeof(Func<int, int, int>), body: add, parameters: [px, py]);

// Round-trip to System.Linq.Expressions and compile
var fn = ((Expression<Func<int, int, int>>)tree.ToSystemExpression()).Compile();
fn(4, 7); // 11

Key design insight surfaced

Lambda parameters cannot be chained via NextIdx — the same parameter node may already have its NextIdx occupied as part of a New/Call argument chain. Lambda stores its parameters as Idx[] in Info instead. This is the central intrusive-list tension: one small Idx[] allocation per lambda avoids silent list corruption at construction time. A future optimisation could replace it with a (start, count) slice into a dedicated side array.
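
The corruption described above can be reproduced in a few lines. The following is a minimal sketch, not the PR's code: `SiblingChainSketch`, `Link`, `Walk`, and the bare `int[]` standing in for the `NextIdx` fields are all hypothetical names chosen for illustration. It assumes 1-based indices with 0 as the nil sentinel, as in the description.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical miniature of the flat layout: next[i] is the 1-based index
// of node i's next sibling, with 0 as the nil sentinel.
public static class SiblingChainSketch
{
    // Links the given 1-based node indices into a chain and returns its head.
    public static int Link(int[] next, params int[] chain)
    {
        for (var i = 0; i < chain.Length; i++)
            next[chain[i]] = i + 1 < chain.Length ? chain[i + 1] : 0;
        return chain.Length == 0 ? 0 : chain[0];
    }

    // Walks a chain with a plain loop over the links.
    public static List<int> Walk(int[] next, int head)
    {
        var result = new List<int>();
        for (var cur = head; cur != 0; cur = next[cur])
            result.Add(cur);
        return result;
    }

    public static List<int> Demo()
    {
        var next = new int[4];               // slot 0 unused (nil), nodes 1..3
        var callArgs = Link(next, 1, 2, 3);  // argument chain: nodes 1, 2, 3
        Link(next, 2, 1);                    // reuse nodes 2 and 1 as a second chain
        return Walk(next, callArgs);         // the first chain is now truncated
    }
}
```

Because each node has a single link slot, building the second chain overwrites the links of the first: `Demo()` walks the original argument chain and visits only node 1. Keeping lambda parameters in a separate `Idx[]` sidesteps this at the price of one small allocation.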

Constant node encoding

ExtraIdx is repurposed as a three-way discriminator, eliminating the old ConstantIndex field:

| ExtraIdx.It | Meaning |
|---|---|
| 0 (nil) | Value is in Info (boxed reference, or null) |
| > 0 | ClosureConstants[ExtraIdx.It - 1] (1-based) |
| -1 | Inline bits: value packed into ChildIdx.It, no boxing |

Types stored inline in ChildIdx.It without boxing: bool, byte, sbyte, char, short, ushort, int, uint, float (reinterpreted via a portable [StructLayout(LayoutKind.Explicit)] union — compatible with all targets including netstandard2.0). Larger types (string, long, double, decimal, DateTime, Guid) remain in Info or closure.
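
To make the three-way convention concrete, here is a hedged sketch of a decoder. `FloatIntBits` follows the name used later in this thread; everything else (`ConstantEncodingSketch`, `Decode`, its parameter list) is an assumption for illustration and not the PR's actual signature.

```csharp
using System;
using System.Runtime.InteropServices;

public static class ConstantEncodingSketch
{
    // Overlays a float and an int at the same offset to reinterpret the bits.
    [StructLayout(LayoutKind.Explicit)]
    struct FloatIntBits
    {
        [FieldOffset(0)] public float F;
        [FieldOffset(0)] public int I;
    }

    public static int FloatToBits(float f) => new FloatIntBits { F = f }.I;
    public static float BitsToFloat(int bits) => new FloatIntBits { I = bits }.F;

    // Decodes a constant following the three-way ExtraIdx convention.
    // Parameter names are placeholders, not the PR's API.
    public static object Decode(Type type, object info, int childBits, int extraIdx, object[] closureConstants)
    {
        if (extraIdx == 0) return info;                          // value kept in Info
        if (extraIdx > 0) return closureConstants[extraIdx - 1]; // 1-based closure slot
        // extraIdx == -1: the value is packed into the 32 bits of childBits
        if (type == typeof(int)) return childBits;
        if (type == typeof(bool)) return childBits != 0;
        if (type == typeof(float)) return BitsToFloat(childBits);
        throw new NotSupportedException(type.Name);
    }
}
```

The sketch returns `object`, so it boxes on the way out; that is only for brevity here. An emitter consuming the inline bits directly would not need the box.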

Memory layout

ExpressionNode is 32 bytes on 64-bit by placing the two reference fields (Type, Info) first — eliminating the 4-byte padding that LayoutKind.Sequential would otherwise insert after the leading NodeType int:

Type(ref,8) + Info(ref,8) + NodeType(int,4) + NextIdx(4) + ChildIdx(4) + ExtraIdx(4) = 32 bytes

vs LightExpression heap objects (16-byte GC header + fields): Constant/Parameter ~40 bytes, Binary/Unary ~48–56 bytes.

Wins

  • ≤ 16 nodes → zero heap allocation (stack-resident via Stack16)
  • ExpressionNode is 32 bytes (down from 40) — refs-first layout, no padding
  • Common small constants (int, bool, float, etc.) stored inline — no boxing
  • ToSystemExpression() uses SmallMap16 (stack-resident) for the parameter map — no heap alloc for ≤ 16 parameters
  • Sibling traversal in hot paths uses direct while (!cur.IsNil) loops — no IEnumerator allocation
  • Trivially serializable: arrays of plain structs + integer refs
  • O(1) node access; O(n) structural equality without recursion
  • Closure constants collected at build time, mutable after build

Gaps / obstacles

  • Not API-compatible with System.Linq.Expressions without the adapter
  • Mutable struct: accidental by-value copy silently forks state
  • Info field boxes MethodBase/string — one allocation per call/new/parameter node

Tests (test/FastExpressionCompiler.UnitTests/FlatExpressionTests.cs)

22 tests covering node construction, parameter identity, structural equality, ToSystemExpression conversion, round-trip compile+invoke, and the mutable-closure-constant use case. Guarded with #if !LIGHT_EXPRESSION since FlatExpression.cs is not included in the LightExpression variant.

Copilot AI changed the title from "[WIP] Add data-oriented expression representation with optimization" to "FlatExpression POC: data-oriented flat expression tree" Apr 12, 2026
Copilot AI requested a review from dadhi April 12, 2026 22:17
Owner

@dadhi dadhi left a comment

Too many comments. Keep only the important stuff explaining why/how and remove the trivial what.

Contributor Author

Copilot AI commented Apr 12, 2026

Too many comments. Keep only the important stuff explaining why/how and remove the trivial what.

Done in d565578. Removed all comments that just describe what the code does. Kept only the three non-obvious why/how notes: the Lambda params-in-Info design decision, the // reset in case node was previously linked safety note in LinkList, and the InfoEqual note about array reference equality. Source went from 738 → 455 lines, tests from 391 → 329.

Copilot AI requested a review from dadhi April 12, 2026 22:46
// Builds body after registering params so they are found in paramMap when encountered in the body.
public SysExpr ToSystemExpression() => ToSystemExpression(RootIdx, new Dictionary<int, SysParam>());

private SysExpr ToSystemExpression(Idx nodeIdx, Dictionary<int, SysParam> paramMap)
Owner

Avoid allocations as much as possible. Use SmallMap and such from ImTools. Don't be afraid to add to or extend ImTools if something is missing or has errors (apply the fix). Use pooling if needed.

Contributor Author

Done in 76ffb84. Replaced Dictionary<int, SysParam> with SmallMap16<int, SysParam, IntEq> (stack-allocated for ≤16 params, uses the AddOrGetValueRef pattern). The paramMap is now default-initialized on the stack and passed ref through the recursive calls.

return indices[0];
}

public IEnumerable<Idx> Siblings(Idx head)
Owner

Is this only for tests? If so, comment it as such; otherwise optimize.

Contributor Author

Done in 76ffb84. Siblings is used in tests and as a public convenience API, so it stays — but is now commented as allocating an enumerator. The hot-path usages in SiblingList and the Block vars loop were replaced with direct while (!cur.IsNil) iterations, eliminating the enumerator allocation there.

Copilot AI requested a review from dadhi April 13, 2026 06:05
/// <summary>≥ 0: index into <see cref="ExpressionTree.ClosureConstants"/>. -1: value is inline in Info.</summary>
public int ConstantIndex;
public Idx NextIdx;
public Idx ChildIdx;
Owner

Let's find a way to compress the memory layout:

  • for a constant, could ChildIndex or ExtraIndex represent the constant index?
  • for constants up to int size (byte/bool through uint), could we store the value inline in ExtraIndex?
  • can we reinterpret (Unsafe) Info and the indexes as a value for constant/other nodes: just put byte fields and reinterpret their span based on the goal?
  • later, let's think about the actual node distribution/invariants at creation: can we compress more given the specific creation patterns?

Let's track the struct size in the comments for each specific expression type and compare it with the LightExpression analog as the smallest baseline.

Contributor Author

Done in b9df8ea. Here's what changed:

Field reorder → 40→32 bytes: placing the two ref fields (Type, Info) first eliminates the 4-byte padding that LayoutKind.Sequential inserts between the original leading NodeType (int) and the first reference. The struct comment documents the layout and compares to LightExpression heap objects (~40 bytes for Constant/Parameter, ~48–56 for Binary/Unary, all with 16-byte GC header).

ConstantIndex removed: repurposed ExtraIdx as the constant discriminator:

  • ExtraIdx.It == 0 (nil): value is in Info (boxed reference, or null)
  • ExtraIdx.It > 0: ClosureConstants[ExtraIdx.It - 1] (1-based, avoids ambiguity with nil)
  • ExtraIdx.It == -1: inline bits — value packed into ChildIdx.It, no heap allocation

Inline int32 constants: bool, byte, sbyte, char, short, ushort, int, uint, float (via BitConverter.SingleToInt32Bits) all store directly in ChildIdx.It. Larger/reference types (string, long, double, decimal, DateTime, Guid) remain in Info or closure as before.

Owner

@dadhi dadhi left a comment

@copilot Add a minimal, meaningful, essential, focused summary comment for each public element to fix the build on CI

Error: /home/runner/work/FastExpressionCompiler/FastExpressionCompiler/src/FastExpressionCompiler/FlatExpression.cs(51,16): error CS1591: Missing XML comment for publicly visible type or member 'Idx.It' [/home/runner/work/FastExpressionCompiler/FastExpressionCompiler/src/FastExpressionCompiler/FastExpressionCompiler.csproj::TargetFramework=netstandard2.0]

Copilot AI requested a review from dadhi April 13, 2026 06:34
Owner

dadhi commented Apr 13, 2026

@copilot

Gaps / obstacles
Not API-compatible with System.Linq.Expressions without the adapter
Mutable struct: accidental by-value copy silently forks state
Info field boxes MethodBase/string — one allocation per call/new/parameter node

Let's step back and do a principled review of the Expression API differences and of how the FastExpressionCompiler API conforms to them: let's analyze the current diff between SE and FEC's LE, then between those and FE. How can we bridge the gap and polyfill into a (mostly) unified API without decreasing performance? Let's compare concrete method calls and parameter representations on an example to make the differences evident. Let's reevaluate our current approach with polyfills, usings, etc., and evaluate how to do better.
Let's think outside the box and maybe explore orthogonal ways to eliminate the gaps by eliminating the problem context. Example idea: make FastExpressionCompiler a struct instead of a static class, so the emit state is carried in the struct instance instead of being passed through method parameters. That centralizes the state and its differences in a single place, with unified access methods if needed, and avoids #if on each TryEmit method definition. Let's explore other drastic orthogonal changes. Let's also think about FlatExpression-like structures prevalent in games and ECS, or in data-oriented/APL contexts: can we borrow their solutions for our cause? Say we have now trivially implemented structural comparison, compared to the recursive tree descent required in FEC. What else can we do trivially to #if the implementation for FE, and maybe avoid mixing it into the rest of FEC at all?

Owner

@dadhi dadhi left a comment

Address comments and build failures.

…ter.SingleToInt32Bits; add XML docs to all public members

Agent-Logs-Url: https://github.com/dadhi/FastExpressionCompiler/sessions/8d5fabaa-d5e2-458c-b771-309266696c44

Co-authored-by: dadhi <39516+dadhi@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 13, 2026

@copilot Add a minimal, meaningful, essential, focused summary comment for each public element to fix the build on CI

Error: /home/runner/work/FastExpressionCompiler/FastExpressionCompiler/src/FastExpr...

Fixed in 116a593. Two CI blockers resolved:

  1. CS1591 — added <summary> XML docs to every public member (Idx.It, Idx.IsNil, Idx.Nil, Idx.Of, ExpressionNode fields, and all ExpressionTree fields/properties/methods).

  2. CS0117: BitConverter.SingleToInt32Bits/Int32BitsToSingle don't exist on netstandard2.0. Replaced with a [StructLayout(LayoutKind.Explicit)] union struct FloatIntBits that overlays a float and an int at offset 0, which is portable across all target frameworks (net472, netstandard2.0/2.1, net6+).

Full Release build now passes with 0 warnings and 0 errors across all targets.

@copilot

Gaps / obstacles
...

Here's the structured analysis and some concrete orthogonal directions.


SE vs LE vs FE — where the gap lives

| Aspect | SE (System.Linq.Expressions) | LE (LightExpression) | FE (FlatExpression) |
|---|---|---|---|
| Node identity | Object reference (ReferenceEquals) | Object reference | Integer index (Idx) |
| ParameterExpression sharing | Same object == same param | Same object == same param | Same Idx == same param |
| Factory API | Expression.Add(left, right) | Expression.Add(left, right) (identical) | tree.Add(left, right, typeof(int)); type is explicit |
| Return type inference | Built-in for numeric ops | Built-in | Caller must pass Type |
| Visitor pattern | ExpressionVisitor | ExpressionVisitor | Not implemented |
| #if surface | None | LIGHT_EXPRESSION on each TryEmit overload | None yet |

The biggest API friction point: SE and LE both infer result types on arithmetic ops, FE requires an explicit Type argument — callers can't just replace Expression.Add(a, b) with tree.Add(a, b).


Polyfill idea: extension helpers that mirror SE factory names

// In FlatExpression — a thin shim so existing code compiles unchanged
public Idx Add(Idx left, Idx right) => Add(left, right, NodeAt(left).Type);
public Idx Subtract(Idx left, Idx right) => Subtract(left, right, NodeAt(left).Type);

This narrows the gap for arithmetic at the cost of a NodeAt call per factory, which is O(1) but still forces a bounds check. Worth it for migration ergonomics.
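
As a rough illustration of the inferring overload, the sketch below uses a hypothetical `MiniFlatTree` with only the fields needed to show the idea; none of its members are the PR's API. One assumption worth flagging: taking the result type from the left operand alone is only safe when both operands agree, so the sketch checks that and throws instead of guessing a conversion.

```csharp
using System;
using System.Collections.Generic;

public sealed class MiniFlatTree
{
    // Each node keeps only what the sketch needs: result type and two child indices.
    readonly List<(Type Type, int Left, int Right)> _nodes = new List<(Type, int, int)>();

    int AddNode(Type type, int left = 0, int right = 0)
    {
        _nodes.Add((type, left, right));
        return _nodes.Count;              // 1-based index, 0 stays reserved for nil
    }

    public Type TypeOf(int idx) => _nodes[idx - 1].Type;

    public int Parameter(Type type) => AddNode(type);

    // Explicit form, in the style of tree.Add(left, right, typeof(int)).
    public int Add(int left, int right, Type type) => AddNode(type, left, right);

    // Inferring form: takes the result type from the left operand and
    // rejects mismatched operands rather than guessing a conversion.
    public int Add(int left, int right)
    {
        var type = TypeOf(left);
        if (TypeOf(right) != type)
            throw new InvalidOperationException("Operand types differ: " + type + " vs " + TypeOf(right));
        return Add(left, right, type);
    }
}
```

With this shape, `tree.Add(x, y)` and `tree.Add(x, y, typeof(int))` produce equivalent nodes for two `int` parameters, while mixing `int` and `double` operands fails at construction time.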


Orthogonal idea 1: FastExpressionCompiler as a struct

Today the compiler passes all its mutable emit-state through method parameters with ref. A struct instance carrying that state would:

  • eliminate the need for #if LIGHT_EXPRESSION on each TryEmit signature (the struct type IS the dispatch)
  • allow FlatExpression-specific overloads to live in a FlatExpressionCompiler : ICompiler<ExpressionTree> struct instead of being mixed into the class
  • make the "which expression representation are we compiling?" a compile-time type parameter, not a runtime flag
struct FlatExpressionCompiler { 
    public void Emit(ref ExpressionTree tree, ILGenerator il, ref ClosureInfo ci) { ... }
}

Orthogonal idea 2: borrow ECS "archetype" separation

In ECS, entities of different archetypes (different component sets) live in separate SOA (Structure-of-Arrays) storages. Applied here: instead of one fat ExpressionNode for every possible node kind, split into typed arrays per node kind:

ConstantNodes[]    // only Type + inline value/index
ParameterNodes[]   // only Type + name
BinaryNodes[]      // only Type + left + right + method

Idx could encode both the kind and the position: upper 8 bits = kind tag, lower 24 bits = per-array index. This could shrink the average node size (each array stores only the fields its kind needs), make kind dispatch table-driven, and improve cache-line packing for homogeneous trees (e.g. all-arithmetic).

The tradeoff: cross-kind traversal (walking a tree) requires a tagged-union dispatch on every step, but FEC's emit loop already does that via switch (node.NodeType).
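
The kind-plus-position encoding can be sketched with plain bit operations. `PackedIdxSketch` and its members are hypothetical names; the 8/24 split follows the proposal above.

```csharp
using System;

public static class PackedIdxSketch
{
    const int KindShift = 24;
    const uint PositionMask = 0x00FFFFFF;

    // Packs an 8-bit kind tag and a 24-bit per-array position into one uint.
    public static uint Pack(byte kind, int position)
    {
        // The cast makes negative positions fail the same range check.
        if ((uint)position > PositionMask)
            throw new ArgumentOutOfRangeException(nameof(position));
        return ((uint)kind << KindShift) | (uint)position;
    }

    public static byte KindOf(uint packed) => (byte)(packed >> KindShift);
    public static int PositionOf(uint packed) => (int)(packed & PositionMask);
}
```

With 24 bits for the position, each per-kind array is capped at 16,777,216 entries, which is one constraint to weigh against the smaller nodes.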


What FE already trivialises that LE cannot

  • Structural equality — single memcmp-style loop, no visitor
  • Serialization — arrays of plain value-type structs; no object graph
  • Clone/fork — copy the Nodes array (only 16 × 32 = 512 bytes for a small tree)
  • Diff/patch — compare two Nodes[] pairwise; emit change log

These are cases where the "no objects" design pays off even before we touch the compiler proper. If FE can round-trip through ToSystemExpression() today, the full FEC emit path is already reachable without any #if; the #if-free compiler integration is a next step.
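
A single-pass comparison over flat node arrays might look like the sketch below. The reduced `Node` struct and `FlatEqualitySketch` are assumptions for illustration; the real ExpressionNode also carries an Info reference, which would need its own comparison rule.

```csharp
using System;

public static class FlatEqualitySketch
{
    // A reduced node: result type, node kind, and three integer links.
    public readonly struct Node
    {
        public readonly Type Type;
        public readonly int Kind, Next, Child, Extra;
        public Node(Type type, int kind, int next = 0, int child = 0, int extra = 0)
        { Type = type; Kind = kind; Next = next; Child = child; Extra = extra; }
    }

    // One linear pass over both arrays; no recursion and no visitor.
    public static bool StructurallyEqual(Node[] a, Node[] b)
    {
        if (a.Length != b.Length) return false;
        for (var i = 0; i < a.Length; i++)
        {
            if (a[i].Type != b[i].Type || a[i].Kind != b[i].Kind ||
                a[i].Next != b[i].Next || a[i].Child != b[i].Child || a[i].Extra != b[i].Extra)
                return false;
        }
        return true;
    }
}
```

Note the limitation of this approach: two trees that describe the same expression but were built in a different node order compare as unequal, because the comparison is positional.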

Contributor Author

Copilot AI commented Apr 13, 2026

Address comments and build failures.

Both CI build failures were fixed in 116a593:

  • CS0117 (BitConverter.SingleToInt32Bits missing on netstandard2.0) — replaced with a portable [StructLayout(LayoutKind.Explicit)] FloatIntBits union
  • CS1591 (missing XML docs) — added <summary> to every public member

The latest CI run (action_required) is GitHub's workflow approval gate for the Copilot PR, not a build failure — it needs a manual approval click from the repo owner to start the jobs. Local build is clean: 0 warnings, 0 errors across all targets (net472, netstandard2.0/2.1, net6/8/9); all 1677 tests pass.

Owner

dadhi commented Apr 13, 2026

@copilot

Polyfill idea: extension helpers that mirror SE factory names

  1. Let's do that.

  2. For this:

struct FastExpressionCompiler { 
    public void TryEmit(ref ExpressionTree tree, ILGenerator il, ref ClosureInfo ci) { ... }
}

Let's analyze further: can we make ClosureInfo part of the FlatExpressionCompiler state?
Why not the other parameters as well (ref ExpressionTree tree, ILGenerator il)?
We would then pass as parameters only the Idx pointing to the current expression to emit, then the parent flags and byRefIndex.

  3. What about unifying the parameters if we switch to a struct FEC in the code below:
            public static bool TryEmit(Expression expr,
#if LIGHT_EXPRESSION
                IParameterProvider paramExprs,
#else
                IReadOnlyList<PE> paramExprs,

Can we bridge the gap between SE and LE here first? Say we keep a stack of nested parameters in the struct and work with the last item on the stack, via an accessor with #if inside or via a struct adapter exposing ParameterCount and GetParameter(0) for all?

Please implement 1 and ONLY evaluate 2 and 3. I want to be sure about the scope of the change before doing it; it may be split into a follow-up change.

Contributor Author

Copilot AI commented Apr 13, 2026

@dadhi I've opened a new pull request, #514, to work on those changes. Once the pull request is ready, I'll request review from you.

Owner

dadhi commented Apr 14, 2026

@copilot What about analysis for 2 and 3

Contributor Author

Copilot AI commented Apr 14, 2026

@copilot What about analysis for 2 and 3

Here's the analysis for 2 and 3.


Point 2 — struct FlatExpressionCompiler carrying all emit state

Current shape of the TryEmit hot-path signature:

public static bool TryEmit(
    Expression expr,
    IParameterProvider paramExprs,   // or IReadOnlyList<PE> in non-LE build
    ILGenerator il,
    ref ClosureInfo closure,
    CompilerFlags setup,
    ParentFlags parent,
    int byRefIndex = -1)

ClosureInfo is already a struct passed by ref. The idea is to fold the context parameters (paramExprs, il, and closure, plus the per-call setup, parent, and byRefIndex) into a struct the caller owns:

public struct EmitContext
{
    public ILGenerator Il;               // cheap (reference, no copy cost)
    public ClosureInfo Closure;          // currently 300+ bytes — fold in, pass as ref
    public CompilerFlags Flags;
    public ParentFlags Parent;
    public int ByRefIndex;
    // paramExprs: see point 3 below
}

Call becomes:

bool TryEmit(Expression expr, Idx idx, ref EmitContext ctx)

Wins:

  • Every recursive TryEmit call today passes 5–7 parameters. Folding them into one ref struct cuts the argument-passing noise and makes the compiler state coherent and inspectable in one place.
  • ClosureInfo folded in means nested-lambda code doesn't pass ref ClosureInfo separately anymore.
  • For FlatExpression, expr becomes Idx — a single int. No virtual dispatch, no GC object on the path.

Obstacles:

  • ClosureInfo is ~300 bytes of fields (several SmallList/SmallMap embedded). Folding it into EmitContext doesn't change the memory picture — it's already always on the stack via ref. No regression there.
  • The ILGenerator is a reference type; folding it in is free (pointer copy once at construction).
  • The internal helper methods — ~80 private TryEmit* overloads — all take ILGenerator il, ref ClosureInfo closure today. Changing the signature means touching all of them. This is a mechanical refactor but high surface area (~80 methods, ~12k-line file).
  • The LE variant has IsIntrinsic early-returns that call expr.TryEmit(setup, ref closure, paramExprs, il, parent, byRefIndex) — virtual dispatch through ExpressionInfo. Moving state into a struct doesn't eliminate this; EmitContext would have to be passed there too, or the virtual protocol is updated. That's the most invasive part.

Verdict: Sound direction; the blocking cost is the mechanical churn on ~80 helper methods. Worth doing in a follow-up PR scoped to that refactor alone.


Point 3 — Unifying IParameterProvider (LE) vs IReadOnlyList<PE> (SE)

Where the #if lives:

  • 43 IReadOnlyList<PE> usages in the SE path
  • 43 IParameterProvider usages in the LE path (same methods, different #if branch)
  • The API difference: .Count / [i] (list) vs .ParameterCount / .GetParameter(i) (provider)

Bridging options:

Option A — adaptor wrapper (zero #if on hot path):

// Defined once, no conditional compilation
internal interface IParamList
{
    int Count { get; }
    PE this[int i] { get; }
}

// SE side (adapter, created once per compile call, not per TryEmit call)
internal struct SEParamList : IParamList {
    readonly IReadOnlyList<PE> _l;
    public SEParamList(IReadOnlyList<PE> l) => _l = l; // wraps the list once
    public int Count => _l.Count;
    public PE this[int i] => _l[i];
}

// LE side
internal struct LEParamList : IParamList {
    readonly IParameterProvider _p;
    public LEParamList(IParameterProvider p) => _p = p; // wraps the provider once
    public int Count => _p.ParameterCount;
    public PE this[int i] => _p.GetParameter(i);
}

The #if moves to exactly one place — the call-site that constructs the IParamList. Every TryEmit* takes IParamList (or the EmitContext from point 2 carries it). No more per-method #if. Virtual dispatch cost on .Count/[i] is the same as today since both paths already go through an interface.

Option B — keep paramExprs in EmitContext as object, box once:
Coarser approach: store paramExprs as object in EmitContext, cast inside the LE/SE accessors via #if. Same dispatch cost, less type safety. Not recommended.

Option C — SE implements IParameterProvider:
Add an extension/adapter so IReadOnlyList<PE> satisfies IParameterProvider. LE already uses IParameterProvider; SE just needs a thin wrapper. All code then uses IParameterProvider — no #if at all on the parameter type. This is the cleanest endgame.
ReadOnlyCollectionParameterProvider : IParameterProvider wraps IReadOnlyList<PE>, allocates once per compile call (same as the lambda expression object lifetime). The wrapping site is the entry point, not in the hot loop.

Verdict for 3: Option C is cleanest — define IParameterProvider in the shared (non-LE) file, have the SE entry-points wrap their IReadOnlyList<PE> once, eliminate the #if on all 43 method signatures. The IParameterProvider interface then becomes the single contract. This is a self-contained PR; it does not require the struct-FEC change from point 2.
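
Option C could look roughly like the sketch below. `IParameterProviderSketch` is a local stand-in for the LightExpression contract (only the two members named above are assumed), the wrapper reuses the class name proposed above with an assumed body, and `ParameterProviderDemo.IndexOf` is a hypothetical consumer showing that shared code needs no #if.

```csharp
using System.Collections.Generic;
using System.Linq.Expressions;

// Stand-in for the LightExpression contract described above.
public interface IParameterProviderSketch
{
    int ParameterCount { get; }
    ParameterExpression GetParameter(int index);
}

// Wraps the System.Linq.Expressions parameter list once at the entry point.
public sealed class ReadOnlyCollectionParameterProvider : IParameterProviderSketch
{
    readonly IReadOnlyList<ParameterExpression> _parameters;

    public ReadOnlyCollectionParameterProvider(IReadOnlyList<ParameterExpression> parameters) =>
        _parameters = parameters;

    public int ParameterCount => _parameters.Count;
    public ParameterExpression GetParameter(int index) => _parameters[index];
}

public static class ParameterProviderDemo
{
    // Shared code sees only the provider contract, so no #if is needed here.
    public static int IndexOf(IParameterProviderSketch provider, ParameterExpression p)
    {
        for (var i = 0; i < provider.ParameterCount; i++)
            if (ReferenceEquals(provider.GetParameter(i), p))
                return i;
        return -1;
    }
}
```

An SE entry point would construct the wrapper from `lambda.Parameters` once per compile call, which costs one small allocation outside the emit loop.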


Suggested sequencing:

  1. Point 3 (Option C: IParameterProvider unification) — standalone, low risk, removes 43 #if pairs
  2. Point 2 (struct EmitContext folding): easier after 3 and higher churn, but can still ship as its own PR
  3. FlatExpression emit integration — builds on both; Idx replaces Expression expr in EmitContext

Development

Successfully merging this pull request may close these issues.

Data-oriented representation of Expression partly on stack and optimized for constant collection

2 participants