Skip RowFilter and page pruning for fully matched row groups#21637
xudong963 wants to merge 6 commits into apache:main

Conversation
Force-pushed from 54a4166 to 5da11ea
run benchmarks

🤖 Benchmark running (GKE): comparing datafusion/issue-19028-benchmark (5da11ea) to dc973cc (merge-base) using tpcds, clickbench_partitioned, and tpch.

🤖 Benchmark completed (GKE): tpcds and clickbench_partitioned results posted (base vs. branch).
run benchmarks

🤖 Benchmark running (GKE): comparing datafusion/issue-19028-benchmark (7cff519) to dc973cc (merge-base) using tpch, tpcds, and clickbench_partitioned.

🤖 Benchmark completed (GKE): clickbench_partitioned and tpcds results posted (base vs. branch).

run benchmarks tpcds tpch

🤖 Benchmark running (GKE): comparing datafusion/issue-19028-benchmark (7cff519) to dc973cc (merge-base) using tpcds and tpch.

🤖 Benchmark completed (GKE): tpcds results posted (base vs. branch).
When row group statistics prove that ALL rows satisfy the filter predicate, skip both RowFilter evaluation (late materialization) and page index pruning for those row groups. This avoids wasted work decoding filter columns and evaluating predicates that produce no useful filtering.

Depends on apache/arrow-rs#9694 for the `with_fully_matched_row_groups()` builder API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
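A minimal sketch of the core idea, using the `x < 200` predicate from this PR's benchmark; the `ColumnStats` struct and function name here are illustrative assumptions, not DataFusion's actual types:

```rust
/// Simplified row group statistics for one column (illustrative only;
/// DataFusion's real statistics handling is richer than this).
struct ColumnStats {
    max: i64,
    null_count: u64,
}

/// Returns true when the statistics alone prove that EVERY row in the row
/// group satisfies `x < threshold`, so the RowFilter can be skipped.
fn fully_matches_less_than(stats: &ColumnStats, threshold: i64) -> bool {
    // `max < threshold` proves all non-null values match. NULLs evaluate to
    // NULL (not true) under the predicate, so the column must also be
    // NULL-free for the row group to count as fully matched.
    stats.max < threshold && stats.null_count == 0
}
```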
When a row group has NULL values in predicate columns, those rows evaluate to NULL (not true) in the filter. The inverted predicate approach can incorrectly mark such row groups as "fully matched" because NULLs satisfy neither the predicate nor its inverse.

Check null_count statistics for predicate columns before marking a row group as fully matched. If any predicate column has NULLs, the row group is not fully matched and the filter must still run.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
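A hedged sketch of that guard (names like `inverse_prunes_all_rows` are assumptions for illustration, not the real DataFusion code):

```rust
/// Decide whether a row group may be marked "fully matched". The inverted
/// predicate pruning every row is necessary but not sufficient: rows that are
/// NULL in a predicate column satisfy neither the predicate nor its inverse,
/// so every predicate column must also have a known null_count of zero.
fn can_mark_fully_matched(
    inverse_prunes_all_rows: bool,
    predicate_column_null_counts: &[Option<u64>],
) -> bool {
    if !inverse_prunes_all_rows {
        return false;
    }
    // A missing statistic (None) forces the conservative answer: keep the filter.
    predicate_column_null_counts
        .iter()
        .all(|nc| matches!(nc, Some(0)))
}
```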
…ed row groups

The fully matched row group optimization skips page index pruning, reducing the page_index_pages_pruned count from 6 to 4 (the 2 pages in the fully matched row group are no longer evaluated).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Fully matched row groups now skip page index pruning, reducing the page_index_pages_pruned counts in limit_pruning.slt and dynamic_filter_pushdown_config.slt.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…-rs changes)

Instead of adding a `with_fully_matched_row_groups` API to arrow-rs, implement the optimization entirely in DataFusion by creating separate ParquetPushDecoders for row groups that need filtering vs. those that are fully matched.

Key changes:
- Split row groups into consecutive runs of the same filter requirement via `split_decoder_runs()`, preserving original row group ordering for ordered scans.
- Each filtered run gets its own RowFilter; fully matched runs skip it.
- Use `VecDeque<ParquetPushDecoder>` in PushDecoderStreamState to chain decoders sequentially.
- Remove the `[patch.crates-io]` arrow-rs fork dependency.

This aligns with the direction of per-row-group morsels: each decoder run can naturally become a morsel when that infrastructure lands.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
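The run-splitting step might look roughly like this (a sketch of the idea behind `split_decoder_runs()`; the `DecoderRun` shape and the signature are assumptions, not the PR's actual code):

```rust
/// One consecutive run of row groups that share the same filter requirement.
#[derive(Debug, PartialEq)]
struct DecoderRun {
    row_groups: Vec<usize>,
    needs_filter: bool,
}

/// Split the row groups of a scan (in their original order) into consecutive
/// runs, so each run can get its own decoder: filtered runs carry a RowFilter,
/// fully matched runs skip it entirely.
fn split_decoder_runs(scan_order: &[usize], fully_matched: &[usize]) -> Vec<DecoderRun> {
    let mut runs: Vec<DecoderRun> = Vec::new();
    for &rg in scan_order {
        let needs_filter = !fully_matched.contains(&rg);
        match runs.last_mut() {
            // Same requirement as the previous group: extend the current run.
            Some(run) if run.needs_filter == needs_filter => run.row_groups.push(rg),
            // Requirement flipped (or first group): start a new run.
            _ => runs.push(DecoderRun { row_groups: vec![rg], needs_filter }),
        }
    }
    runs
}
```

For example, with row groups `[0, 1, 2, 3, 4]` where groups 1 and 2 are fully matched, this yields three runs in scan order: `[0]` (filtered), `[1, 2]` (unfiltered), `[3, 4]` (filtered).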
Force-pushed from f0e02e9 to d6c3879
run benchmarks

🤖 Benchmark running (GKE): comparing datafusion/issue-19028-benchmark (d6c3879) to a0dbbab (merge-base) using clickbench_partitioned, tpcds, and tpch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

🤖 Benchmark completed (GKE): clickbench_partitioned and tpcds results posted (base vs. branch).

run benchmarks tpcds tpch

🤖 Benchmark running (GKE): comparing datafusion/issue-19028-benchmark (0b1f441) to a0dbbab (merge-base) using tpch and tpcds.

🤖 Benchmark completed (GKE): tpcds results posted (base vs. branch).
Which issue does this PR close?
Rationale for this change
When DataFusion evaluates a Parquet scan with filter pushdown, it uses row group statistics to determine which row groups contain matching rows. The `RowGroupAccessPlanFilter` already tracks which row groups are "fully matched", i.e. where statistics prove that all rows satisfy the predicate (via `is_fully_matched`). However, this information was not propagated downstream: even for fully matched row groups, the RowFilter was still evaluated during decoding and page index pruning still ran.
This is especially costly when filter columns are expensive to decode (e.g., large strings) or when predicates are complex. Common real-world examples include time-range filters where entire row groups fall within the range, or `WHERE status != 'DELETED'` on data with no deleted rows.

What changes are included in this PR?
DataFusion changes (this PR)
- `row_group_filter.rs`: `RowGroupAccessPlanFilter::build()` now returns `(ParquetAccessPlan, Vec<usize>)`: the access plan plus the indices of fully matched row groups.
- `page_filter.rs`: `prune_plan_with_page_index()` accepts a `fully_matched_row_groups` parameter and skips page-level pruning for those row groups.
- `opener.rs`: wires fully matched row groups through the pipeline, passing them to page pruning and to the `ParquetPushDecoderBuilder` via `with_fully_matched_row_groups()`.

Arrow-rs dependency (apache/arrow-rs#9694)
The new `ArrowReaderBuilder::with_fully_matched_row_groups()` API in arrow-rs allows skipping `RowFilter` evaluation during Parquet decoding for specified row groups. This PR uses `[patch.crates-io]` pointing to the arrow-rs fork branch until that PR is merged and released.

Benchmark
Includes a criterion benchmark (`parquet_fully_matched_filter`) using `ParquetPushDecoder` directly, the same code path DataFusion's async opener uses. Dataset: 20 row groups × 50K rows, with a 1KB string payload column and predicate `x < 200` (all row groups fully matched).

Are these changes tested?
`datafusion-datasource-parquet` tests pass (16 failures are pre-existing, caused by the missing `parquet-testing` submodule).

Are there any user-facing changes?
No user-facing API changes. This is a transparent performance optimization — queries that previously worked will now be faster when row group statistics prove all rows match the predicate.
Note: This PR depends on apache/arrow-rs#9694. ~~The `[patch.crates-io]` in `Cargo.toml` will be removed once that arrow-rs change is released.~~ All logic is on the DataFusion side now.