docs: clarify limitations of weighted-average embedding for long inputs #2570
Open
alobroke wants to merge 2 commits into openai:main
Conversation
…for long inputs. OpenAI embeddings are unit-normalized (L2 norm = 1), so weighting chunks by token count does not recover the original embedding magnitudes. Added an explicit warning callout, updated the docstrings for both functions, and added a use-case comparison table recommending truncation for classification tasks. Fixes openai#2549
Summary
Addresses #2549 by clarifying the mathematical limitations of the
weighted-average approach for long-input embeddings.
Problem
OpenAI embedding models return unit-normalized vectors (L2 norm = 1),
so the original embedding magnitude is discarded before the user ever
receives it. The notebook previously implied that weighting chunks by
token count produces a sound representation of the full text, but
mathematically this is a heuristic, not a reconstruction: since every
chunk embedding has unit norm, the weights only tilt the averaged
direction toward longer chunks, and the result must itself be
re-normalized, losing magnitude information a second time. The sketch
below illustrates this.
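A minimal numpy sketch of the issue; the vectors and token counts here are illustrative, not taken from the notebook:

```python
import numpy as np

# Two hypothetical chunk embeddings, already unit-normalized by the API.
# Their original magnitudes were discarded before we ever saw them.
a = np.array([0.6, 0.8])   # ||a|| == 1
b = np.array([1.0, 0.0])   # ||b|| == 1

weights = np.array([900, 100])  # token counts per chunk (made up)

# Token-count weighting blends *directions* only; it cannot restore the
# discarded magnitudes of the underlying chunk embeddings.
avg = np.average(np.stack([a, b]), axis=0, weights=weights)

print(np.linalg.norm(avg))   # ~0.963: the weighted average is not unit-norm
avg /= np.linalg.norm(avg)   # so it gets re-normalized, discarding
                             # magnitude information a second time
```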
Changes
- `len_safe_get_embedding` docstring updated with explicit caveats
- `truncate_text_tokens` docstring updated, recommending it as the preferred approach for classification tasks (a sketch follows below)
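To make the second change concrete, here is a sketch of what the updated `truncate_text_tokens` might look like. The body mirrors the notebook's simple encode-and-slice approach, but the constant values and docstring wording here are assumptions, not the PR's actual diff:

```python
import tiktoken

EMBEDDING_ENCODING = "cl100k_base"  # assumed encoding name, per notebook convention
EMBEDDING_CTX_LENGTH = 8191         # assumed max context length for the model

def truncate_text_tokens(text, encoding_name=EMBEDDING_ENCODING,
                         max_tokens=EMBEDDING_CTX_LENGTH):
    """Truncate a string to at most `max_tokens` tokens.

    Preferred for classification tasks: the resulting embedding reflects
    a single contiguous passage rather than a weighted blend of chunk
    directions, which is only a heuristic for the full text.
    """
    encoding = tiktoken.get_encoding(encoding_name)
    # Returns token ids, which the embeddings endpoint accepts directly.
    return encoding.encode(text)[:max_tokens]
```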
References
Fixes #2549