Object-Oriented CPU JIT API and Prototype Implementation
This MR contributes the structure, APIs, and test suite for the object-oriented CPU JIT-compiler. The design is threefold:
- Information specific to the underlying host compiler shall be provided by a CompilerInfo object.
- The extension module builder component shall be responsible for constructing the source code of the dynamically compiled and loaded Python extension module that contains the kernel. It will also provide the function wrapper object around the extension module.
- The CpuJit driver gets both the compiler info and the module builder as arguments and orchestrates the compilation by invoking the module builder, storing the resulting code in the filesystem, invoking the host compiler, and loading the compiled extension module.
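As a rough illustration of how these three parts fit together, here is a minimal sketch. Only the class names CompilerInfo and CpuJit come from this MR; every method name, signature, the GccInfo class, and the compiler flags are assumptions made for the example and do not reflect the actual API.

```python
from __future__ import annotations

import importlib.util
import subprocess
import sysconfig
from abc import ABC, abstractmethod
from pathlib import Path


class CompilerInfo(ABC):
    """Host-compiler specifics (hypothetical interface)."""

    @abstractmethod
    def command(self, source: Path, output: Path) -> list[str]:
        """Return the command line that builds `output` from `source`."""


class GccInfo(CompilerInfo):
    """Illustrative GCC variant; the flags are assumptions, not the MR's defaults."""

    def command(self, source: Path, output: Path) -> list[str]:
        include = sysconfig.get_path("include")  # location of Python.h
        return ["g++", "-O2", "-shared", "-fPIC", f"-I{include}",
                str(source), "-o", str(output)]


class ExtensionModuleBuilder(ABC):
    """Builds the extension module source and the wrapper (hypothetical interface)."""

    @abstractmethod
    def render_module(self, kernel_source: str, module_name: str) -> str:
        """Return the complete C++ source of the extension module."""

    @abstractmethod
    def get_wrapper(self, module):
        """Return a callable wrapper around the loaded extension module."""


class CpuJit:
    """Driver: render the source, store it, invoke the compiler, load the module."""

    def __init__(self, compiler_info: CompilerInfo,
                 module_builder: ExtensionModuleBuilder, cache_dir: Path):
        self._compiler = compiler_info
        self._builder = module_builder
        self._cache_dir = Path(cache_dir)

    def compile(self, kernel_source: str, module_name: str):
        # 1. Let the module builder generate the extension module code.
        code = self._builder.render_module(kernel_source, module_name)
        # 2. Store the generated code in the filesystem.
        src = self._cache_dir / f"{module_name}.cpp"
        src.write_text(code)
        # 3. Invoke the host compiler.
        lib = self._cache_dir / (module_name + sysconfig.get_config_var("EXT_SUFFIX"))
        subprocess.run(self._compiler.command(src, lib), check=True)
        # 4. Load the compiled extension module and wrap it.
        spec = importlib.util.spec_from_file_location(module_name, lib)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        return self._builder.get_wrapper(module)
```

Keeping the compiler details and the code generation behind separate interfaces means the driver itself never needs to know which compiler or binding library is in use.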
Of this, the MR provides:
- The CpuJit driver
- An abstract base class for the compiler infos, and implementations for GCC and Clang
- An abstract base class for the extension module builders
- A prototype implementation of an extension module builder using pybind11
- A test suite to check the JIT compiler's expected functionality
The CPU JIT compiler introduced here works, but is very slow (see below), and is therefore considered experimental.
While we work on improving its internals to accelerate compilation, it can already be used to develop new features.
To support this, this MR adds the --experimental-cpu-jit flag to the pytest suite to enable the new JIT compiler in the gen_config fixture.
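For readers unfamiliar with custom pytest flags, the following conftest.py sketch shows the general mechanism. The flag name and the fixture name gen_config are taken from this MR; the helper select_jit and the dictionary returned by the fixture are placeholders invented for the example, since the real fixture's return value is not described here.

```python
import pytest


def pytest_addoption(parser):
    # Register the command-line flag with pytest.
    parser.addoption(
        "--experimental-cpu-jit",
        action="store_true",
        default=False,
        help="Use the experimental object-oriented CPU JIT in tests",
    )


def select_jit(experimental: bool) -> str:
    """Hypothetical helper: name the JIT backend the fixture should configure."""
    return "experimental-cpu-jit" if experimental else "default"


@pytest.fixture
def gen_config(request):
    # Read the flag and configure the JIT accordingly.
    # The dict is a stand-in for whatever configuration object the suite uses.
    experimental = request.config.getoption("--experimental-cpu-jit")
    return {"jit": select_jit(experimental)}
```

With a setup like this, running `pytest --experimental-cpu-jit` switches every test that uses the fixture to the new JIT, while a plain `pytest` run keeps the default.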
Also, this MR extends the documentation and contribution guide with info about the new features.
On pybind11
Pybind11 provides C++ wrappers around various Python standard-library APIs as well as the NumPy C API, which makes it very easy to write extension modules in C++. However, due to its heavy use of C++ templates, compiling code that uses pybind11 takes much longer than compiling code that uses the Python C API directly. Currently, we see compile times of more than 2 seconds even for the most trivial kernels, which is far too slow for JIT compilation, in my opinion. I therefore see a working JIT based on pybind11 as a good starting point from which we can set up useful abstractions ourselves; after all, we only need a small part of pybind11's vast feature set.
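To reproduce compile-time figures like the one above, it is enough to time the host compiler invocation. The helper below is a minimal sketch; time_command is a hypothetical name, and the command shown is only a placeholder that should be replaced with the actual compiler command line for a generated module.

```python
from __future__ import annotations

import subprocess
import sys
import time


def time_command(cmd: list[str]) -> float:
    """Run `cmd` to completion and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True, capture_output=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Placeholder command: substitute the real compiler invocation here.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"{elapsed:.3f} s")
```

Timing the same kernel once with a pybind11-based module and once with a plain Python C API module would make the template overhead directly visible.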