Skip to content

feat(materials): add binary-safe inline material support#3030

Merged
migmartri merged 5 commits intochainloop-dev:mainfrom
migmartri:worktree-enchanted-launching-newt
Apr 14, 2026
Merged

feat(materials): add binary-safe inline material support#3030
migmartri merged 5 commits intochainloop-dev:mainfrom
migmartri:worktree-enchanted-launching-newt

Conversation

@migmartri
Copy link
Copy Markdown
Member

@migmartri migmartri commented Apr 14, 2026

Problem

When binary data is stored inline (no CAS backend configured), wf run describe crashes with:

grpc: error while marshaling: string field contains invalid UTF-8

The root cause: binary file content flows through NormalizedMaterial.Value (Go string) and into AttestationItem.Material.value (proto string). Protobuf 3 requires string fields to be valid UTF-8, so gRPC rejects the response. The write path is correct — Artifact.content in the crafting state proto is bytes, and the in-toto ResourceDescriptor.content is also bytes. The corruption only happens on the read/response path when normalizeMaterial() casts []byte to string.

Approach

Add a new bytes raw_value field to the API response proto alongside the existing (now deprecated) string value. Internally, change NormalizedMaterial.Value from string to []byte so binary content flows cleanly through the pipeline. Dual-populate at the API boundary for backward compatibility.

Changes

  • Add bytes raw_value field to AttestationItem.Material proto, mark string value as deprecated
  • Change internal NormalizedMaterial.Value from string to []byte
  • Dual-populate at API boundary: raw_value always set, deprecated value set only for valid UTF-8
  • CLI prefers raw_value with fallback to value for backward compat with older servers
  • Plugin fanout proto keeps string value with UTF-8 guard — binary content available via Content bytes field

Testing

  • Binary round-trip test (TestBinaryContentStructRoundTrip) verifies data survives the structpb.Struct serialization path (json.Marshal -> protojson.Unmarshal -> protojson.Marshal -> json.Unmarshal) with null bytes, ELF headers, all 256 byte values, and 64KB payloads
  • Unit test for extractMaterials confirms binary content populates raw_value while value stays empty
  • Manual verification with real attestation: a 293KB Helm chart .tgz (gzip binary) round-trips intact via raw_value, while a 348KB CycloneDX JSON SBOM populates both fields correctly

Compatibility

Scenario Behavior
Old CLI + New Server Reads value (deprecated) — works for UTF-8 content, empty for binary (no crash)
New CLI + New Server Reads raw_value — works for all content
New CLI + Old Server Falls back to value — same as today
Existing attestations Read path fixed, no migration needed

Closes #2065

Add a new `bytes raw_value` field to the AttestationItem.Material proto
message to support binary content in inline materials. Previously,
binary data stored inline caused gRPC marshaling errors because
`string value` requires valid UTF-8.

The change flows through the internal NormalizedMaterial.Value type
(now []byte) and dual-populates at the API boundary: raw_value always
set, deprecated value set only when content is valid UTF-8.

Closes chainloop-dev#2065

Signed-off-by: Miguel Martinez Trivino <miguel@chainloop.dev>
Copy link
Copy Markdown

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No issues found across 16 files

The FIELD_SAME_TYPE breaking change rule is expected for the
NormalizedMaterial.value field change from string to bytes.
This is wire-compatible and the fanout proto is an internal
plugin SDK, not a public API.

Signed-off-by: Miguel Martinez Trivino <miguel@chainloop.dev>
@migmartri migmartri requested a review from a team April 14, 2026 14:27
// older control plane versions that don't populate raw_value.
var value string
if len(in.GetRawValue()) > 0 {
value = string(in.GetRawValue())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if RawValue doesn't contain UTF8 content? This line will panic. I think you'll need to validate here as well with utf8.Valid

Signed-off-by: Miguel Martinez Trivino <miguel@chainloop.dev>
Check UTF-8 validity before allocating string to avoid wasted allocation
for binary content. Add round-trip test verifying binary data survives
the structpb.Struct serialization path.

Signed-off-by: Miguel Martinez Trivino <miguel@chainloop.dev>
Revert fanout proto value field to string since plugins expect text.
Guard the assignment with UTF-8 validation so binary inline content
is only available via the Content bytes field.

Signed-off-by: Miguel Martinez Trivino <miguel@chainloop.dev>
@migmartri migmartri merged commit f76b9ac into chainloop-dev:main Apr 14, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error decoding attestations with INLINE binary data

2 participants