Summary
`bnorm` tests run via `benchdnn` do not gracefully skip cases where the required scratchpad memory is too large; instead they fail. This causes failures in the AArch64 Nightly pipeline in `test_benchdnn_modeC_bnorm_regressions_cpu`.
Version
8d4aa8b
Environment
Reproduced on x64 and AArch64.
hash: 8d4aa8b (introduced the failing case, but the issue already existed).
Steps to reproduce
```
$ ./build/tests/benchdnn/benchdnn --bnorm --dt=bf16 --inplace=true mb1ic512ih65536
Error: Function 'create_primitive' at (oneDNN/tests/benchdnn/dnnl_common.hpp:469) returned 'out_of_memory'
Error: Function 'init_prim' at (oneDNN/tests/benchdnn/dnnl_common.hpp:523) returned '1'
Error: Function 'createit' at (oneDNN/tests/benchdnn/bnorm/bnorm.cpp:710) returned '1'
Error: Function 'create' at (oneDNN/tests/benchdnn/utils/task.hpp:57) returned '1'
0:UNTESTED_FAILED (0 ms) __REPRO: --bnorm --dt=bf16 --inplace=true mb1ic512ih65536
===========================================================
= Failed cases summary (--summary=no-failures to disable) =
===========================================================
0:UNTESTED_FAILED (0 ms) __REPRO: --bnorm --dt=bf16 --inplace=true mb1ic512ih65536
============================
tests:1 passed:0 skipped:0 mistrusted:0 unimplemented:0 invalid_arguments:0 failed:1 listed:0
total: 0.00s; create_pd: 0.00s (3%); create_prim: 0.00s (0%); fill: 0.00s (0%); execute: 0.00s (0%); compute_ref: 0.00s (0%); compare: 0.00s (0%);
```
Observed behavior
Fails with status `UNTESTED_FAILED`.
Expected behavior
Expected status `SKIPPED`. The GPU backend seems to have some handling for skipping large cases, though I have not tested it:
```cpp
// The library scratchpad is allocated at create_primitive stage. The memory
// check is moved after the creation stage. It's necessary to check the
// library scratchpad size against gpu_max_alloc, otherwise, out_of_memory
// would be issued by the library.
if (res->mem_size_args.scratchpad_size > 0 && is_gpu()
        && query_scratchpad_mode(query_attr(pdw))
                == dnnl_scratchpad_mode_library) {
    static size_t gpu_device_capacity = 0;
    static size_t gpu_max_alloc_capacity = 0;
    SAFE(get_gpu_ram_sizes(gpu_device_capacity, gpu_max_alloc_capacity),
            WARN);
    const bool fit
            = res->mem_size_args.scratchpad_size < gpu_max_alloc_capacity;
    if (!fit) {
        BENCHDNN_PRINT(1,
                "[CHECK_MEM]: Size of the scratchpad %s "
                "doesn't fit the allocation limit of %s.\n",
                smart_bytes(res->mem_size_args.scratchpad_size).c_str(),
                smart_bytes(gpu_max_alloc_capacity).c_str());
        res->state = SKIPPED;
        res->reason = skip_reason::not_enough_ram;
        return OK;
    }
}
```
(Snippet above: oneDNN/tests/benchdnn/dnnl_common.hpp, lines 444 to 467 at db17ac9.)