Summary
Depthwise backward-weights on AArch64 SVE-256 produces incorrect results for strided, padded cases (e.g., C=24, Kh=3, Sh=2, Ph=1). The PyTorch test TestConvolutionNN.test_Conv2d_OneDNN fails, while benchdnn does not flag the issue. A regression test (gtest) comparing the legacy blocked-oh path against a new per-row path exposes the defect.
Fix merged in PR #4081.
cc @Sqvid
Version
- oneDNN v3.9.1 (commit 80a3a8e745d2f0186e674b0af9332fd6e074c94f)
- Also reproduced with oneDNN v3.7.1
Environment
- CPU: AArch64 SVE (256-bit) (Neoverse V1)
- oneDNN runtime: OpenMP, nthr=32
- PyTorch (arm/aarch64 build) using oneDNN backend
- Python 3.10
Steps to reproduce
1) PyTorch unit test (fails)
# ONEDNN_VERBOSE=all to capture impl & commit
export ONEDNN_VERBOSE=all
python pytorch/test/nn/test_convolution.py TestConvolutionNN.test_Conv2d_OneDNN
Typical verbose snippet at failure:
onednn_verbose,v1,info,oneDNN v3.9.1 (commit 80a3a8e...)
onednn_verbose,v1,primitive,exec,cpu,convolution,jit_dw:sve_256,forward_training,...
onednn_verbose,v1,primitive,exec,cpu,convolution,jit_dw:sve_256,backward_weights,...
g24mb1_ic24oc24_ih6oh3kh3sh2ph1_iw6ow3kw3sw2pw1
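The spatial sizes in this descriptor (and in the gtest descriptor further below) follow the standard convolution output-size formula; a quick sanity check as a sketch:

```python
def conv_out_dim(i, k, s, p):
    # Standard convolution output size: floor((i + 2p - k) / s) + 1
    return (i + 2 * p - k) // s + 1

# PyTorch failure case from the verbose line: ih6 kh3 sh2 ph1 -> oh3
print(conv_out_dim(6, 3, 2, 1))  # 3
# gtest descriptor: ih8 kh3 sh2 ph1 -> oh4
print(conv_out_dim(8, 3, 2, 1))  # 4
```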
2) Detailed C++ gtest reproduction steps
Start from oneDNN v3.9.1 (commit 80a3a8e745d2f0186e674b0af9332fd6e074c94f) on AArch64 SVE-256 (Neoverse V1).
Prerequisites
- Replace tests/gtests/test_convolution_backward_weights_dw_compare.cpp and src/cpu/aarch64/jit_uni_dw_convolution.cpp with the supplied versions (attachments)
- File: tests/gtests/test_convolution_backward_weights_dw_compare.cpp (attachment)
  - Compares legacy vs new AArch64 DW BWD_W (env-switchable):
    - ONEDNN_AARCH64_DW_BWDW_USE_OLD=1 → legacy path
    - unset → new per-row path
- Descriptor used: g24mb1_ic24ih8iw8_oc24oh4ow4_kh3kw3_sh2sw2_ph1pw1
Build configuration
# Configure with tests enabled
cmake -S . -B build -DDNNL_BUILD_TESTS=ON
# Rebuild so both the kernel and gtest pick up changes
cmake --build build --target all -- -j$(nproc)
Run regression test
cd build && ctest -V -R test_convolution_backward_weights_dw_compare
Optional: benchdnn verification
ONEDNN_VERBOSE=all ./build/tests/benchdnn/benchdnn --conv --dir=BWD_W --fast-ref=false g24mb1_ic24ih8iw8_oc24oh4ow4_kh3kw3_sh2sw2_ph1pw1
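For trying nearby shapes, the benchdnn problem descriptor can be generated programmatically. `dw_desc` below is a hypothetical convenience helper (not part of benchdnn) that emits the same string format used above:

```python
def dw_desc(g, mb, ih, iw, oh, ow, kh, kw, sh, sw, ph, pw):
    """Build a benchdnn-style depthwise conv descriptor (groups == ic == oc)."""
    return (f"g{g}mb{mb}_ic{g}ih{ih}iw{iw}_oc{g}oh{oh}ow{ow}"
            f"_kh{kh}kw{kw}_sh{sh}sw{sw}_ph{ph}pw{pw}")

# Reproduces the descriptor used above:
print(dw_desc(24, 1, 8, 8, 4, 4, 3, 3, 2, 2, 1, 1))
# g24mb1_ic24ih8iw8_oc24oh4ow4_kh3kw3_sh2sw2_ph1pw1
```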
Logs & diff evidence
- Each run writes depthwise_bwdw_compare.log next to the binary (build/tests/gtests/depthwise_bwdw_compare.log)
- Header shows both impl IDs, benchdnn descriptor, and replay command (see tests/gtests/test_convolution_backward_weights_dw_compare.cpp:186-201)
Observed behavior
- PyTorch test failure:
AssertionError: Tensor-likes are not close!
Mismatched elements: 72 / 216 (33.3%)
Greatest absolute difference: 3.0
- OneDNN chooses jit_dw:sve_256 for both FWD and BWD_W on the above config.
- gtest A/B comparison shows the legacy path accumulates extra bottom-row contributions in strided, padded cases (duplicate accumulation at tile boundaries); the new per-row path matches a naïve reference.
- benchdnn did not reproduce the mismatch (even with --fast-ref=false and buffer replay).
Workaround validated: removing the AArch64 jit BWD_W (SVE-256) path from the CPU convolution implementation list avoids the failure; the fallback path passes, as it already does on Neoverse N1 and Neoverse V2.
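The naïve reference that the per-row path is compared against can be sketched as follows. This is a NumPy sketch of a plain depthwise backward-weights reduction, not the actual gtest code; layouts and names are illustrative:

```python
import numpy as np

def dw_bwd_weights_ref(src, diff_dst, kh, kw, sh, sw, ph, pw):
    """Plain per-row depthwise backward-weights: for each channel and
    kernel tap, accumulate src * diff_dst products, skipping taps that
    fall into the zero padding. Tensors are (C, H, W), groups == C."""
    c, ih, iw = src.shape
    _, oh, ow = diff_dst.shape
    dw = np.zeros((c, kh, kw), dtype=np.float64)
    for ch in range(c):
        for ki in range(kh):
            for kj in range(kw):
                for oi in range(oh):
                    ii = oi * sh - ph + ki
                    if ii < 0 or ii >= ih:
                        continue  # padded input row: contributes nothing
                    for oj in range(ow):
                        jj = oj * sw - pw + kj
                        if jj < 0 or jj >= iw:
                            continue  # padded input column
                        dw[ch, ki, kj] += src[ch, ii, jj] * diff_dst[ch, oi, oj]
    return dw

# Per-channel shape from the failing PyTorch case: ih=iw=6, kh=kw=3, stride 2, pad 1 -> oh=ow=3
src = np.ones((1, 6, 6))
diff_dst = np.ones((1, 3, 3))
dw = dw_bwd_weights_ref(src, diff_dst, 3, 3, 2, 2, 1, 1)
# Border taps see fewer valid rows/cols than interior taps:
print(dw[0])
```

In terms of this sketch, the defect described above corresponds to certain input rows being visited again when output rows are processed in oh blocks, so border kernel taps get accumulated more than once; a per-row traversal never revisits a row across tile boundaries.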
Expected behavior
Backward-weights results should match the naïve reference (and mkldnn-disabled PyTorch path) with zero elementwise diffs for these configs.
Additional notes
Attachments
- src/cpu/aarch64/jit_uni_dw_convolution.cpp: kernel version with legacy/new path toggle
- tests/gtests/test_convolution_backward_weights_dw_compare.cpp: gtest repro; includes env flag to toggle old/new
Related PR
- #4081