
Add Ulysses attention #376

Open
csgoogle wants to merge 4 commits into main from ulysses-attention-benchmark

Conversation

csgoogle (Collaborator) commented Apr 13, 2026

Summary

This PR adds Ulysses attention support for WAN TPU inference in MaxDiffusion and documents how to enable it.

Design Doc: https://docs.google.com/document/d/1_hrPGaIwj84iF8vFJrcdKdmwfKJPvW6O2Sy5ftLVn60/edit?usp=sharing&resourcekey=0-p0zkvHa_NJDwHPqLwNxNCg

What Changed

  • Added a TPU Ulysses attention path for WAN that performs a sequence-to-head all_to_all before local splash attention and restores the original layout afterward.
  • Refactored the TPU flash/Ulysses block-size resolution logic so both paths share the same helper.
  • Added a fail-fast ValueError raised when the attention head count is not divisible by the context shard count.
  • Added tests.
  • Updated the README to document Ulysses support for WAN inference, including the required `attention="ulysses"` and `ici_context_parallelism > 1` override pattern.
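The sequence-to-head exchange described above can be sketched with `jax.lax.all_to_all`. This is a hedged illustration, not the PR's actual code: the shard count, head count, and dimensions below are made up for the demo, and an identity function stands in for the local splash attention. The divisibility check mirrors the fail-fast ValueError added in this PR.

```python
import os
# Force 4 virtual CPU devices so the sketch runs anywhere (assumed cp=4).
os.environ["XLA_FLAGS"] = "--xla_force_host_platform_device_count=4"

import jax
import jax.numpy as jnp

CP = 4      # context-parallel shard count (ici_context_parallelism)
HEADS = 8   # attention head count; must be divisible by CP
SEQ = 16    # full sequence length
DIM = 4     # head dimension

# Fail fast, mirroring the check added in this PR.
if HEADS % CP != 0:
    raise ValueError(
        f"head count {HEADS} is not divisible by context shards {CP}")

def ulysses_attention(x):
    # x per shard: [batch, seq/CP, heads, dim]
    # all_to_all: gather the full sequence, scatter heads across shards
    # -> [batch, seq, heads/CP, dim]
    x = jax.lax.all_to_all(x, "cp", split_axis=2, concat_axis=1, tiled=True)
    # Stand-in for the local splash attention over the full sequence;
    # identity keeps the demo's round trip checkable.
    out = x
    # Inverse all_to_all restores the original sequence-sharded layout.
    out = jax.lax.all_to_all(out, "cp", split_axis=1, concat_axis=2, tiled=True)
    return out

# Leading axis = devices; per-device shape [1, SEQ//CP, HEADS, DIM].
x = jnp.arange(CP * (SEQ // CP) * HEADS * DIM, dtype=jnp.float32).reshape(
    CP, 1, SEQ // CP, HEADS, DIM)
y = jax.pmap(ulysses_attention, axis_name="cp")(x)
print(bool(jnp.allclose(x, y)))  # the layout round trip is lossless
```

With a real attention kernel in place of the identity, each shard attends over the full sequence but only over its `HEADS // CP` local heads, which is what makes the head-divisibility requirement necessary.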

Performance

TPU v6e

Wan2.2 I2V

Setup:

  • model: Wan-AI/Wan2.2-I2V-A14B-Diffusers
  • hardware: 8x TPU v6e
  • parallelism: dp=2, cp=4, fsdp=1, tp=1
  • timing config: 40 inference steps, 81 frames, 720x1280
| Global Batch Size | Flash | Ulysses | Delta |
|---|---|---|---|
| 1 | 285.56s | 251.45s | -11.9% |
| 2 | 533.67s | 491.22s | -8.0% |

Wan2.2 T2V

Setup:

  • model: Wan-AI/Wan2.2-T2V-A14B-Diffusers
  • hardware: 8x TPU v6e
  • parallelism: dp=2, cp=4, fsdp=1, tp=1
  • timing config: 40 inference steps, 81 frames, 720x1280
| Global Batch Size | Flash | Ulysses | Delta |
|---|---|---|---|
| 1 | 275.54s | 246.90s | -10.39% |
| 2 | 535.40s | 480.24s | -10.30% |

TPU v7x

Wan2.2 I2V

Setup:

  • model: Wan-AI/Wan2.2-I2V-A14B-Diffusers
  • hardware: TPU v7-8 (8 chips)
  • parallelism: ici_context_parallelism=4, ici_data_parallelism=2
  • timing config: 40 inference steps, 81 frames, 720x1280
  • flash block sizes: block_q=2048, block_kv=2048, block_kv_compute=1024
| Global Batch Size | Flash | Ulysses | Delta |
|---|---|---|---|
| 1 | 209s | 199s | -5% |
| 2 | 414s | 394s | -5% |
| 4 | 829s | 780s | -6% |
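The Delta columns above can be reproduced from the raw timings as the relative change of Ulysses versus Flash wall time (the v7x table rounds to whole percents):

```python
# Sanity-check the Delta columns: (ulysses - flash) / flash, as a percentage.
# Timings in seconds, copied from the tables above.
runs = [
    ("v6e I2V, bs=1", 285.56, 251.45),
    ("v6e I2V, bs=2", 533.67, 491.22),
    ("v6e T2V, bs=1", 275.54, 246.90),
    ("v6e T2V, bs=2", 535.40, 480.24),
    ("v7x I2V, bs=1", 209.0, 199.0),
    ("v7x I2V, bs=2", 414.0, 394.0),
    ("v7x I2V, bs=4", 829.0, 780.0),
]
for name, flash, ulysses in runs:
    delta = (ulysses - flash) / flash * 100.0
    print(f"{name}: {delta:+.1f}%")
```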


@csgoogle csgoogle changed the title working code Add Ulysses attention Apr 15, 2026
@csgoogle csgoogle marked this pull request as ready for review April 15, 2026 09:18
@csgoogle csgoogle requested a review from entrpn as a code owner April 15, 2026 09:18
Comment thread: src/maxdiffusion/models/attention_flax.py