[WIP] [core] Introduce BLOB_REF for shared blob data #7602

Open
leaves12138 wants to merge 8 commits into apache:master from leaves12138:ai_blob_ref

leaves12138 (Contributor) commented Apr 7, 2026

Purpose

This PR introduces BLOB_REF for sharing blob data across tables without duplicating payloads in Paimon-managed storage.

Changes

  • add the BLOB_REF type and wire it through API, format, Arrow, Flink, Spark and Hive type conversions
  • serialize BLOB_REF values as BlobReference metadata instead of inline blob payloads
  • resolve blob references lazily on read, preferring direct URI reads and falling back to metadata lookup by table/row/field
  • keep the fallback path streaming instead of buffering the whole blob into memory
  • add fieldId to blob references for better schema evolution compatibility during fallback lookup
  • avoid dereferencing blob payloads in InternalRowToSizeVisitor
  • explicitly reject nested BLOB_REF in schema validation, since read-time resolution currently only supports top-level BLOB_REF
  • add unit tests for blob reference serialization, fallback streaming, size estimation, schema validation and fallback lookup
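
To illustrate the lazy resolution described above (prefer a direct URI read, otherwise fall back to a lookup by table/row/field, and hand back a stream rather than a buffered array), here is a minimal sketch. It is not the code in this PR: `BlobRefSketch`, `UriReader`, `MetadataLookup`, `LazyBlob`, and the field layout of `BlobReference` are all hypothetical names and shapes chosen for illustration only.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/** Hypothetical sketch; none of these types are Paimon's actual API. */
class BlobRefSketch {

    /** Assumed reference shape: a URI plus table/row/field coordinates for fallback. */
    static final class BlobReference {
        final String uri;
        final String table;
        final long rowId;
        final int fieldId;

        BlobReference(String uri, String table, long rowId, int fieldId) {
            this.uri = uri;
            this.table = table;
            this.rowId = rowId;
            this.fieldId = fieldId;
        }
    }

    /** Opens a stream directly from a URI; returns empty if the URI is not readable. */
    interface UriReader {
        Optional<InputStream> open(String uri) throws IOException;
    }

    /** Resolves the blob through table metadata and returns a stream, not a byte[]. */
    interface MetadataLookup {
        InputStream open(String table, long rowId, int fieldId) throws IOException;
    }

    /** Lazy handle: no I/O happens until newInputStream() is called. */
    static final class LazyBlob {
        private final BlobReference ref;
        private final UriReader uriReader;
        private final MetadataLookup lookup;

        LazyBlob(BlobReference ref, UriReader uriReader, MetadataLookup lookup) {
            this.ref = ref;
            this.uriReader = uriReader;
            this.lookup = lookup;
        }

        InputStream newInputStream() throws IOException {
            // Preferred path: read straight from the referenced URI.
            Optional<InputStream> direct = uriReader.open(ref.uri);
            if (direct.isPresent()) {
                return direct.get();
            }
            // Fallback path: look up by table/row/field and keep streaming.
            return lookup.open(ref.table, ref.rowId, ref.fieldId);
        }
    }

    /** Drains a stream through a fixed-size buffer and returns the byte count. */
    static long countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        BlobReference ref = new BlobReference("file:///moved/blob", "source_table", 42L, 3);
        UriReader unreachable = uri -> Optional.empty();
        MetadataLookup lookup = (table, rowId, fieldId) ->
                new ByteArrayInputStream("payload".getBytes(StandardCharsets.UTF_8));

        LazyBlob blob = new LazyBlob(ref, unreachable, lookup);
        try (InputStream in = blob.newInputStream()) {
            System.out.println(countBytes(in) + " bytes read via fallback");
        }
    }
}
```

In this sketch the fallback returns an `InputStream` so the caller decides how much to buffer, which mirrors the stated goal of keeping the fallback path streaming. Keying the fallback on a field id rather than a field name is assumed here to be what makes the lookup robust to column renames.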

Testing

Passed:

  • mvn -pl paimon-common -am -DfailIfNoTests=false -Dcheckstyle.skip -Dspotless.check.skip -Denforcer.skip -Dtest=BlobReferenceTest,BlobReferenceBlobTest,InternalRowToSizeVisitorTest test

leaves12138 (Contributor, Author) left a comment

I found a few runtime gaps where BLOB_REF support is still incomplete.

leaves12138 changed the title from "[core] Introduce BLOB_REF for shared blob data" to "[WIP] [core] Introduce BLOB_REF for shared blob data" on Apr 7, 2026