Summary
Depthwise backward-weights on AArch64 SVE-256 produces incorrect results for strided, padded cases (e.g., C=24, Kh=3, Sh=2, Ph=1). The PyTorch test TestConvolutionNN.test_Conv2d_OneDNN fails, while benchdnn does not flag the issue. A regression test (gtest) comparing the legacy blocked-oh path against a new per-row path exposes the defect.
Fix merged in PR #4081.
cc @Sqvid
Version
- oneDNN v3.9.1 (commit 80a3a8e745d2f0186e674b0af9332fd6e074c94f)
- Also reproduced with oneDNN v3.7.1
Environment
- CPU: AArch64 SVE (256-bit) (Neoverse V1)
- oneDNN runtime: OpenMP, nthr=32
- PyTorch (arm/aarch64 build) using oneDNN backend
- Python 3.10
Steps to reproduce
1) PyTorch unit test (fails)
# ONEDNN_VERBOSE=all to capture impl & commit
export ONEDNN_VERBOSE=all
python pytorch/test/nn/test_convolution.py TestConvolutionNN.test_Conv2d_OneDNN
Typical verbose snippet at failure:
onednn_verbose,v1,info,oneDNN v3.9.1 (commit 80a3a8e...)
onednn_verbose,v1,primitive,exec,cpu,convolution,jit_dw:sve_256,forward_training,...
onednn_verbose,v1,primitive,exec,cpu,convolution,jit_dw:sve_256,backward_weights,...
g24mb1_ic24oc24_ih6oh3kh3sh2ph1_iw6ow3kw3sw2pw1
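The spatial sizes in this descriptor (and in the gtest descriptor further below) follow the standard convolution output-size formula; a quick sanity check as a sketch:

```python
def conv_out_dim(i, k, s, p):
    # Standard convolution output size: floor((i + 2p - k) / s) + 1
    return (i + 2 * p - k) // s + 1

# PyTorch failure case from the verbose line: ih6 kh3 sh2 ph1 -> oh3
print(conv_out_dim(6, 3, 2, 1))  # 3
# gtest descriptor: ih8 kh3 sh2 ph1 -> oh4
print(conv_out_dim(8, 3, 2, 1))  # 4
```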
2) Detailed C++ gtest reproduction steps
Start from oneDNN v3.9.1 (commit 80a3a8e745d2f0186e674b0af9332fd6e074c94f) on AArch64 SVE-256 (Neoverse V1).
Prerequisites
- Replace tests/gtests/test_convolution_backward_weights_dw_compare.cpp and src/cpu/aarch64/jit_uni_dw_convolution.cpp with the supplied versions (attachments)
- File: tests/gtests/test_convolution_backward_weights_dw_compare.cpp (attachment)
  - Compares legacy vs new AArch64 DW BWD_W (env-switchable):
    - ONEDNN_AARCH64_DW_BWDW_USE_OLD=1 → legacy path
    - unset → new per-row path
- Descriptor used: g24mb1_ic24ih8iw8_oc24oh4ow4_kh3kw3_sh2sw2_ph1pw1
Build configuration
# Configure with tests enabled
cmake -S . -B build -DDNNL_BUILD_TESTS=ON
# Rebuild so both the kernel and gtest pick up changes
cmake --build build --target all -- -j$(nproc)
Run regression test
cd build && ctest -V -R test_convolution_backward_weights_dw_compare
Optional: benchdnn verification
ONEDNN_VERBOSE=all ./build/tests/benchdnn/benchdnn --conv --dir=BWD_W --fast-ref=false g24mb1_ic24ih8iw8_oc24oh4ow4_kh3kw3_sh2sw2_ph1pw1
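For trying nearby shapes, the benchdnn problem descriptor can be generated programmatically. `dw_desc` below is a hypothetical convenience helper (not part of benchdnn) that emits the same string format used above:

```python
def dw_desc(g, mb, ih, iw, oh, ow, kh, kw, sh, sw, ph, pw):
    """Build a benchdnn-style depthwise conv descriptor (groups == ic == oc)."""
    return (f"g{g}mb{mb}_ic{g}ih{ih}iw{iw}_oc{g}oh{oh}ow{ow}"
            f"_kh{kh}kw{kw}_sh{sh}sw{sw}_ph{ph}pw{pw}")

# Reproduces the descriptor used above:
print(dw_desc(24, 1, 8, 8, 4, 4, 3, 3, 2, 2, 1, 1))
# g24mb1_ic24ih8iw8_oc24oh4ow4_kh3kw3_sh2sw2_ph1pw1
```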
Logs & diff evidence
- Each run writes depthwise_bwdw_compare.log next to the binary (build/tests/gtests/depthwise_bwdw_compare.log)
- Header shows both impl IDs, benchdnn descriptor, and replay command (see tests/gtests/test_convolution_backward_weights_dw_compare.cpp:186-201)
Observed behavior
- PyTorch test failure:
AssertionError: Tensor-likes are not close!
Mismatched elements: 72 / 216 (33.3%)
Greatest absolute difference: 3.0
- OneDNN chooses jit_dw:sve_256 for both FWD and BWD_W on the above config.
- gtest A/B comparison shows the legacy path accumulates extra bottom-row contributions in strided, padded cases (duplicate accumulation at tile boundaries); the new per-row path matches a naïve reference.
- benchdnn did not reproduce the mismatch (even with --fast-ref=false and buffer replay).
Workaround validated: removing the AArch64 jit BWD_W (SVE-256) path from the CPU convolution implementation list avoids the failure; the fallback path passes, as it already does on Neoverse N1 and Neoverse V2.
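The naïve reference that the per-row path is compared against can be sketched as follows. This is a NumPy sketch of a plain depthwise backward-weights reduction, not the actual gtest code; layouts and names are illustrative:

```python
import numpy as np

def dw_bwd_weights_ref(src, diff_dst, kh, kw, sh, sw, ph, pw):
    """Plain per-row depthwise backward-weights: for each channel and
    kernel tap, accumulate src * diff_dst products, skipping taps that
    fall into the zero padding. Tensors are (C, H, W), groups == C."""
    c, ih, iw = src.shape
    _, oh, ow = diff_dst.shape
    dw = np.zeros((c, kh, kw), dtype=np.float64)
    for ch in range(c):
        for ki in range(kh):
            for kj in range(kw):
                for oi in range(oh):
                    ii = oi * sh - ph + ki
                    if ii < 0 or ii >= ih:
                        continue  # padded input row: contributes nothing
                    for oj in range(ow):
                        jj = oj * sw - pw + kj
                        if jj < 0 or jj >= iw:
                            continue  # padded input column
                        dw[ch, ki, kj] += src[ch, ii, jj] * diff_dst[ch, oi, oj]
    return dw

# Per-channel shape from the failing PyTorch case: ih=iw=6, kh=kw=3, stride 2, pad 1 -> oh=ow=3
src = np.ones((1, 6, 6))
diff_dst = np.ones((1, 3, 3))
dw = dw_bwd_weights_ref(src, diff_dst, 3, 3, 2, 2, 1, 1)
# Border taps see fewer valid rows/cols than interior taps:
print(dw[0])
```

In terms of this sketch, the defect described above corresponds to certain input rows being visited again when output rows are processed in oh blocks, so border kernel taps get accumulated more than once; a per-row traversal never revisits a row across tile boundaries.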
Expected behavior
Backward-weights results should match the naïve reference (and mkldnn-disabled PyTorch path) with zero elementwise diffs for these configs.
Additional notes
Attachments
- src/cpu/aarch64/jit_uni_dw_convolution.cpp: kernel version with legacy/new path toggle
- tests/gtests/test_convolution_backward_weights_dw_compare.cpp: gtest repro; includes env flag to toggle old/new
Related PR
- #4081