Summary
There should be easier ways for JIT code generators to create blocked memory layouts without relying on the tag system.
Problem statement
The RISC-V vector extension uses a vector-length-agnostic software programming model, which gives hardware providers the freedom to choose the vector length of their processor implementations.
Current hardware has vector lengths ranging from 128 to 1024 bits, but the ISA permits much longer vectors (the RVV 1.0 specification only requires VLEN to be a power of two no greater than 65536 bits).
In my research group, we experiment with classic vector processor designs, with implementations where vlen can reach up to 16384 bits (512 fp32 elements).
We want to support our processors by adding vector-length-agnostic (VLA) solutions that can target both short-vector and long-vector systems and still benefit from pre-packed memory layouts.
However, the memory descriptor system currently relies heavily on layout tags, and supporting RISC-V systems would require a large number of new memory tags once we account for VLA and the vector register grouping feature (LMUL).
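To give a rough sense of scale, the sketch below enumerates the fp32 block sizes that would each need their own tag if one block spans exactly one vector register group. It is only an illustration under stated assumptions: vlen takes power-of-two values from 128 to 16384 bits, LMUL takes the integer values 1, 2, 4 and 8 (fractional LMUL is ignored), and the helper name fp32_block_sizes is hypothetical, not a oneDNN function.

```cpp
#include <cassert>
#include <set>

// Hypothetical helper (not part of oneDNN): collects the distinct fp32 block
// sizes obtained when one block spans one vector register group, that is
// block = (vlen / 32) * LMUL.
// Assumptions: vlen is a power of two in [min_vlen, max_vlen] (in bits) and
// LMUL is one of the integer grouping factors 1, 2, 4, 8.
std::set<int> fp32_block_sizes(int min_vlen, int max_vlen) {
    assert(min_vlen >= 32 && min_vlen <= max_vlen);
    std::set<int> blocks;
    for (int vlen = min_vlen; vlen <= max_vlen; vlen *= 2)
        for (int lmul = 1; lmul <= 8; lmul *= 2)
            blocks.insert(vlen / 32 * lmul);
    return blocks;
}

// Usage: fp32_block_sizes(128, 16384) yields the powers of two from 4 to 4096.
```

Under these assumptions there are 11 distinct block sizes for fp32 alone, and each of them would have to be combined with every blocked layout pattern and repeated for other data types.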
Preferred solution
I propose turning the fill_blocked function, currently defined in memory_desc_wrapper.cpp, into a static method of the memory_desc_wrapper class.
This way, JIT code generators could access this method to create blocked memory layouts without interacting with tags.
For instance, once the JIT generator has chosen the block size as a function of the vector length, the call below could create convolution weights with a blocked output-channel dimension for any vector length (from 128 to 16384 bits).
memory_desc_wrapper::fill_blocked(weights_md, {0, 1, 2, 3}, {jcp.oc_block}, {0});
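To make the intent of this call concrete, here is a minimal, self-contained sketch of the arithmetic behind such a layout. It is a simplified model and not oneDNN's implementation: blocked_layout_t, make_blocked_layout and blocked_offset are hypothetical names, and the sketch assumes a 4D tensor, the outer dimension order 0, 1, 2, 3, and a single inner block.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Simplified stand-in for a blocking descriptor; NOT the oneDNN type.
struct blocked_layout_t {
    std::array<int64_t, 4> padded_dims; // dims with the blocked one rounded up
    std::array<int64_t, 4> strides;     // strides of the outer dims, in elements
    int64_t inner_blk;                  // size of the single inner block
    int inner_idx;                      // dimension the inner block belongs to
};

// Hypothetical helper: dense layout with outer order 0,1,2,3 and one inner
// block of size `blk` taken from dimension `blk_dim`.
blocked_layout_t make_blocked_layout(
        const std::array<int64_t, 4> &dims, int64_t blk, int blk_dim) {
    assert(blk > 0 && blk_dim >= 0 && blk_dim < 4);
    blocked_layout_t l {};
    l.inner_blk = blk;
    l.inner_idx = blk_dim;
    l.padded_dims = dims;
    // Pad the blocked dimension up to a multiple of the block size.
    l.padded_dims[blk_dim] = (dims[blk_dim] + blk - 1) / blk * blk;
    // Elements inside a block are contiguous, so the innermost outer
    // dimension starts with a stride equal to the block size.
    int64_t stride = blk;
    for (int d = 3; d >= 0; --d) {
        l.strides[d] = stride;
        // The blocked dimension contributes its number of blocks.
        stride *= l.padded_dims[d] / (d == blk_dim ? blk : 1);
    }
    return l;
}

// Physical offset (in elements) of the logical position `pos`.
int64_t blocked_offset(
        const blocked_layout_t &l, const std::array<int64_t, 4> &pos) {
    int64_t off = 0;
    for (int d = 0; d < 4; ++d) {
        if (d == l.inner_idx)
            off += pos[d] / l.inner_blk * l.strides[d] + pos[d] % l.inner_blk;
        else
            off += pos[d] * l.strides[d];
    }
    return off;
}
```

For example, with weights of shape {100, 3, 3, 3} and a block size of 64 (one fp32 register at vlen = 2048), the output-channel dimension is padded to 128 and consecutive output channels within a block are adjacent in memory. The same code handles any block size the generator derives from the vector length, which is the point of bypassing tags.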
Edit: at present, fill_blocked is only used by the static function memory_desc_wrapper::compute_blocking to fill the blocking descriptor from tags.