perf: fast-path inline strings in ByteViewGroupValueBuilder::vectorized_append#21618

Draft
EeshanBembi wants to merge 3 commits into apache:main from EeshanBembi:main

@EeshanBembi
Contributor

Which issue does this PR close?

Closes #21568.

Rationale for this change

ByteViewGroupValueBuilder::vectorized_append was doing unnecessary work for short strings (≤12 bytes): for each row it called array.value(row) to decode the u128 view into a &[u8], then called make_view to re-encode it back into a u128. The input GenericByteViewArray already stores inline values in exactly that u128 format, so the round-trip is redundant.

This mirrors the existing HAS_BUFFERS specialisation in vectorized_equal_to_inner, which uses the same data_buffers().is_empty() guard to take a direct-view-compare fast path for inline strings.
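The redundancy described above can be illustrated without Arrow itself. The following is a minimal sketch, assuming the Arrow view layout for short values (a little-endian u32 length in the first 4 bytes, followed by up to 12 zero-padded data bytes). `encode_inline` and `decode_inline` are hypothetical stand-ins for `make_view` and `array.value(row)`; they are not the arrow-rs functions.

```rust
/// Hypothetical stand-in for `make_view`, restricted to inline values (<= 12 bytes).
/// Assumed layout: bytes 0..4 hold the length, bytes 4..16 hold zero-padded data.
fn encode_inline(data: &[u8]) -> u128 {
    assert!(data.len() <= 12, "only inline values are modelled here");
    let mut bytes = [0u8; 16];
    bytes[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    bytes[4..4 + data.len()].copy_from_slice(data);
    u128::from_le_bytes(bytes)
}

/// Hypothetical stand-in for `array.value(row)` on an inline view.
fn decode_inline(view: u128) -> Vec<u8> {
    let bytes = view.to_le_bytes();
    let len = u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]) as usize;
    bytes[4..4 + len].to_vec()
}

fn main() {
    // An 8-byte value, the same size as in the benchmark case.
    let view = encode_inline(b"datafusn");
    // Decoding and re-encoding reproduces the identical u128.
    let round_tripped = encode_inline(&decode_inline(view));
    assert_eq!(view, round_tripped);
    println!("round trip is a no-op: {}", view == round_tripped);
}
```

Under this layout the decode and re-encode steps cancel out for any zero-padded inline view, which is why copying the u128 directly gives the same result.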

What changes are included in this PR?

In vectorized_append_inner, the Nulls::None branch now dispatches on arr.data_buffers().is_empty():

  • Fast path (no data buffers → all values ≤12 bytes inline): copies u128 views directly via self.views.extend(rows.iter().map(|&row| arr.views()[row])). Arrow's validity invariant guarantees inline views are zero-padded, so direct copy is semantically identical to value() → make_view().
  • Slow path (array has non-inline strings): adds self.views.reserve(rows.len()) before the existing loop to avoid repeated reallocation.
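The two paths above can be sketched roughly as follows. This is a simplified model, not the DataFusion code: `ViewArray`, `Builder`, `append_no_nulls`, and this `make_view` are hypothetical stand-ins, the builder keeps a single output buffer, and the long-view layout (length, 4-byte prefix, buffer index, offset) is assumed from the Arrow format.

```rust
/// Hypothetical model of a byte-view array: one u128 view per row, plus data
/// buffers that are only populated when some value is longer than 12 bytes.
struct ViewArray {
    views: Vec<u128>,
    data_buffers: Vec<Vec<u8>>,
}

fn read_u32(bytes: &[u8; 16], at: usize) -> u32 {
    u32::from_le_bytes([bytes[at], bytes[at + 1], bytes[at + 2], bytes[at + 3]])
}

/// Simplified stand-in for `make_view` (assumed layout, see lead-in).
fn make_view(data: &[u8], buffer_index: u32, offset: u32) -> u128 {
    let mut b = [0u8; 16];
    b[0..4].copy_from_slice(&(data.len() as u32).to_le_bytes());
    if data.len() <= 12 {
        b[4..4 + data.len()].copy_from_slice(data);
    } else {
        b[4..8].copy_from_slice(&data[0..4]);
        b[8..12].copy_from_slice(&buffer_index.to_le_bytes());
        b[12..16].copy_from_slice(&offset.to_le_bytes());
    }
    u128::from_le_bytes(b)
}

impl ViewArray {
    /// Decode one row into owned bytes (the real API returns a borrowed slice).
    fn value(&self, row: usize) -> Vec<u8> {
        let b = self.views[row].to_le_bytes();
        let len = read_u32(&b, 0) as usize;
        if len <= 12 {
            b[4..4 + len].to_vec()
        } else {
            let buf = read_u32(&b, 8) as usize;
            let off = read_u32(&b, 12) as usize;
            self.data_buffers[buf][off..off + len].to_vec()
        }
    }
}

/// Hypothetical model of the builder's output: views plus one data buffer.
#[derive(Default)]
struct Builder {
    views: Vec<u128>,
    buffer: Vec<u8>,
}

impl Builder {
    /// Models only the no-nulls branch.
    fn append_no_nulls(&mut self, arr: &ViewArray, rows: &[usize]) {
        if arr.data_buffers.is_empty() {
            // Fast path: every view is inline, so copy the u128 views verbatim.
            self.views.extend(rows.iter().map(|&row| arr.views[row]));
        } else {
            // Slow path: reserve once, then decode and re-encode row by row.
            self.views.reserve(rows.len());
            for &row in rows {
                let value = arr.value(row);
                let offset = self.buffer.len() as u32;
                if value.len() > 12 {
                    self.buffer.extend_from_slice(&value);
                }
                self.views.push(make_view(&value, 0, offset));
            }
        }
    }
}

fn main() {
    let arr = ViewArray {
        views: vec![make_view(b"alpha", 0, 0), make_view(b"beta", 0, 0)],
        data_buffers: vec![],
    };
    let mut builder = Builder::default();
    builder.append_no_nulls(&arr, &[1, 0, 1]);
    assert_eq!(builder.views, vec![arr.views[1], arr.views[0], arr.views[1]]);
}
```

In this model the fast path touches only the views vector, while the slow path still has to read each value and possibly copy its bytes, so the single `reserve` call is the only saving available there.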

Are these changes tested?

Covered by the existing 6 unit tests in bytes_view::tests, all passing unchanged. test_byte_view_vectorized_operation_special_case exercises the fast path directly (11-byte strings, no data buffers).

Are there any user-facing changes?

No. Internal performance improvement only.

Benchmark

inline_null_0.0_size_1000/vectorized_append (8-byte strings, no nulls, 1 000 rows):

| | time |
|---|---|
| Before | 3.37 µs |
| After | 495 ns |
| Change | −85.3% (6.8× faster) |
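As a quick check of how the two summary figures relate to the raw timings, here is a small sketch. The helper names `percent_change` and `speedup` are illustrative only, and the inputs are the two timings quoted above converted to nanoseconds.

```rust
/// Relative change of `after` versus `before`, in percent (negative means faster).
fn percent_change(before_ns: f64, after_ns: f64) -> f64 {
    (after_ns - before_ns) / before_ns * 100.0
}

/// How many times faster `after` is than `before`.
fn speedup(before_ns: f64, after_ns: f64) -> f64 {
    before_ns / after_ns
}

fn main() {
    let before_ns = 3370.0; // 3.37 µs
    let after_ns = 495.0; // 495 ns
    println!("change:  {:.1}%", percent_change(before_ns, after_ns));
    println!("speedup: {:.1}x", speedup(before_ns, after_ns));
}
```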

ebembi-crdb and others added 3 commits April 7, 2026 18:33
…on types

Closes apache#21144

Implements DFExtensionType for all remaining canonical Arrow extension
types so they are recognized and pretty-printed by the extension type
registry:

- Bool8: displays Int8 values as 'true'/'false' instead of raw integers
- Json: uses default string formatter (values are already valid JSON)
- Opaque: uses default formatter
- FixedShapeTensor: uses default formatter, storage_type computed from
  value_type and list_size
- VariableShapeTensor: uses default formatter, storage_type computed
  from value_type and dimensions
- TimestampWithOffset: uses default formatter

All six types are registered in
MemoryExtensionTypeRegistry::new_with_canonical_extension_types()
alongside the existing UUID registration.
…ed_append

When the input StringView/BinaryView array has no data buffers (all values
≤12 bytes, stored inline), skip the value() → make_view() round-trip in
do_append_val_inner and instead copy the u128 views directly. Arrow
guarantees valid arrays have zero-padded inline views, so the direct copy
is semantically identical and lets the compiler vectorize the loop.

Also pre-reserve views capacity in the slow path (non-inline strings) to
avoid repeated Vec reallocation.

Closes apache#21568
@github-actions bot added the labels logical-expr (Logical plan and expressions), core (Core DataFusion crate), common (Related to common crate), and physical-plan (Changes to the physical-plan crate) on Apr 14, 2026
@EeshanBembi marked this pull request as draft April 16, 2026 15:48
Development

Successfully merging this pull request may close these issues.

Optimize ByteViewGroupValueBuilder vectorized_append
