GPU Indexing Schemes and Launch Configurations
Closes #119
This MR contributes
- Lambdas (#119), which act as wrappers around one-line backend expressions that can be evaluated
- More flexible GPU indexing schemes via modular work item mappings
- Launch Configurations as encapsulated, user-configurable objects
Lambdas
Add codegen.functions.Lambda, a simple wrapper around a backend expression tree that can be exported to the user and evaluated by the runtime system
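To illustrate the idea of a lambda as a wrapper around a single expression tree, here is a minimal, self-contained sketch. It does not use the pystencils backend AST; the names Sym, Const, BinOp and LambdaSketch are hypothetical stand-ins, and the actual codegen.functions.Lambda interface may differ.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Union


# Hypothetical stand-ins for backend expression nodes (not the pystencils AST).
@dataclass(frozen=True)
class Sym:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: Callable[[int, int], int]
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Sym, Const, BinOp]


def free_symbols(expr):
    """Collect the names of all symbols occurring in the expression."""
    if isinstance(expr, Sym):
        return {expr.name}
    if isinstance(expr, BinOp):
        return free_symbols(expr.lhs) | free_symbols(expr.rhs)
    return set()


def evaluate(expr, env):
    """Recursively evaluate the expression with symbol values taken from env."""
    if isinstance(expr, Sym):
        return env[expr.name]
    if isinstance(expr, Const):
        return expr.value
    return expr.op(evaluate(expr.lhs, env), evaluate(expr.rhs, env))


class LambdaSketch:
    """Wraps one expression; its free symbols become the call parameters."""

    def __init__(self, expr):
        self.expr = expr
        self.parameters = tuple(sorted(free_symbols(expr)))

    def __call__(self, **kwargs):
        missing = set(self.parameters) - set(kwargs)
        if missing:
            raise TypeError(f"missing arguments: {sorted(missing)}")
        return evaluate(self.expr, kwargs)


# Example: a grid size computed as ceil(n / block) = (n + block - 1) // block
n, block = Sym("n"), Sym("block")
grid_size = LambdaSketch(
    BinOp(
        operator.floordiv,
        BinOp(operator.sub, BinOp(operator.add, n, block), Const(1)),
        block,
    )
)
print(grid_size.parameters)        # ('block', 'n')
print(grid_size(n=100, block=32))  # 4
```

A runtime system could hold such an object alongside the compiled kernel and evaluate it with the actual kernel arguments at launch time.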
Indexing Schemes in Config
- Make the two indexing schemes Linear3D and Blockwise4D available through the config (they correspond to the pystencils 1.3.x block and line indexing options)
- Permit users to set a default block size for Linear3D
GPU Thread Index Mapping
- Remove GpuThreadsRange; the thread range is no longer computed by the CUDA and SYCL platforms
- Extend the CUDA platform to receive its mapping from thread indices to iteration space points via a callback object
- Implement thread indexing for Linear3D and Blockwise4D
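The two mappings from thread indices to iteration space points can be sketched in plain Python as follows. This is an illustration under assumptions, not the generated code: it assumes Linear3D assigns one point per thread using the usual blockIdx * blockDim + threadIdx formula per dimension, and that Blockwise4D (like the 1.3.x line indexing) takes the fastest coordinate from threadIdx.x and the remaining three from blockIdx.x, blockIdx.y and blockIdx.z. All function names are hypothetical.

```python
from typing import Tuple

Dim3 = Tuple[int, int, int]


def linear3d_point(block_idx: Dim3, block_dim: Dim3, thread_idx: Dim3) -> Dim3:
    """Assumed Linear3D mapping: one global thread index per dimension."""
    return tuple(b * d + t for b, d, t in zip(block_idx, block_dim, thread_idx))


def blockwise4d_point(block_idx: Dim3, thread_idx_x: int) -> Tuple[int, int, int, int]:
    """Assumed Blockwise4D mapping: fastest coordinate from threadIdx.x,
    the three slower coordinates from blockIdx.x, blockIdx.y, blockIdx.z."""
    return (thread_idx_x, *block_idx)


def in_iteration_space(point: Tuple[int, ...], shape: Tuple[int, ...]) -> bool:
    """Guard for surplus threads when the launch grid overshoots the iteration space."""
    return all(0 <= p < s for p, s in zip(point, shape))


point = linear3d_point(block_idx=(2, 0, 1), block_dim=(32, 4, 1), thread_idx=(5, 3, 0))
print(point)                                      # (69, 3, 1)
print(in_iteration_space(point, (64, 16, 4)))     # False, since 69 >= 64
print(blockwise4d_point((7, 8, 9), 5))            # (5, 7, 8, 9)
```

Passing a mapping like these into the platform as a callback keeps the platform independent of any particular indexing scheme.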
Launch Configurations
- Encapsulate GPU launch configurations into objects of the base class codegen.gpu_indexing.GpuLaunchConfiguration, which are evaluated by the JIT in the context of the kernel launch.
- Add subclasses for fully manual and fully automatic launch configurations, as well as a launch config with dynamic block size for the Linear3D indexing scheme. Enable the user to set launch config parameters on the compiled kernel object.
- Update JIT to adhere to the new launch config interface.
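The three kinds of launch configuration listed above could be modeled roughly as below. This is a sketch under assumptions: the class names, the evaluate method and its work_items argument are hypothetical and do not reproduce the GpuLaunchConfiguration interface; in particular, the real objects are evaluated by the JIT from the kernel arguments rather than from an explicit extent tuple.

```python
from abc import ABC, abstractmethod
from typing import Optional, Tuple

Dim3 = Tuple[int, int, int]


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


class LaunchConfigSketch(ABC):
    """Hypothetical base class: yields (block_size, grid_size) at launch time."""

    @abstractmethod
    def evaluate(self, work_items: Dim3) -> Tuple[Dim3, Dim3]:
        ...


class ManualConfig(LaunchConfigSketch):
    """Fully manual: the user must set both block and grid size."""

    def __init__(self) -> None:
        self.block_size: Optional[Dim3] = None
        self.grid_size: Optional[Dim3] = None

    def evaluate(self, work_items: Dim3) -> Tuple[Dim3, Dim3]:
        if self.block_size is None or self.grid_size is None:
            raise ValueError("block_size and grid_size must be set before launch")
        return self.block_size, self.grid_size


class AutomaticConfig(LaunchConfigSketch):
    """Fully automatic: block size fixed at code generation, grid derived from it."""

    def __init__(self, block_size: Dim3) -> None:
        self._block_size = block_size

    def evaluate(self, work_items: Dim3) -> Tuple[Dim3, Dim3]:
        grid = tuple(ceil_div(w, b) for w, b in zip(work_items, self._block_size))
        return self._block_size, grid


class DynamicBlockSizeConfig(LaunchConfigSketch):
    """User-adjustable block size; the grid is recomputed on every evaluation."""

    def __init__(self, default_block_size: Dim3) -> None:
        self.block_size: Dim3 = default_block_size

    def evaluate(self, work_items: Dim3) -> Tuple[Dim3, Dim3]:
        grid = tuple(ceil_div(w, b) for w, b in zip(work_items, self.block_size))
        return self.block_size, grid


config = DynamicBlockSizeConfig(default_block_size=(128, 1, 1))
config.block_size = (32, 8, 1)          # user override before launching
print(config.evaluate((100, 100, 1)))   # ((32, 8, 1), (4, 13, 1))
```

Keeping the configuration as an object on the compiled kernel lets the user change parameters between launches without regenerating code.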
Codegen Driver Updates
- Introduce the GpuIndexing class, which acts as a factory for launch configurations and thread indexing objects depending on the user-provided configuration
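The factory role described here can be pictured with the following sketch, which dispatches on the configured indexing scheme. Everything in it is hypothetical: Scheme, GpuOptionsSketch, GpuIndexingSketch, the method names and the fallback block size are illustrative assumptions, and the Blockwise4D launch shape (one block per line of the fastest coordinate) is assumed rather than taken from the implementation.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Callable, Optional, Tuple

Dim3 = Tuple[int, int, int]


class Scheme(Enum):
    LINEAR3D = auto()
    BLOCKWISE4D = auto()


@dataclass
class GpuOptionsSketch:
    """Hypothetical user-facing options: scheme plus optional default block size."""
    indexing_scheme: Scheme = Scheme.LINEAR3D
    block_size: Optional[Dim3] = None


class GpuIndexingSketch:
    """Produces a thread mapping and a launch shape that match the chosen scheme."""

    FALLBACK_BLOCK_SIZE: Dim3 = (128, 2, 1)  # arbitrary value for this sketch

    def __init__(self, options: GpuOptionsSketch) -> None:
        self._options = options

    def get_thread_mapping(self) -> Callable[[Dim3, Dim3, Dim3], Tuple[int, ...]]:
        if self._options.indexing_scheme is Scheme.LINEAR3D:
            return lambda bidx, bdim, tidx: tuple(
                b * d + t for b, d, t in zip(bidx, bdim, tidx)
            )
        # Blockwise4D: block dimensions are not needed for the mapping
        return lambda bidx, bdim, tidx: (tidx[0], *bidx)

    def get_launch_shape(self, iteration_space: Tuple[int, ...]) -> Tuple[Dim3, Dim3]:
        if self._options.indexing_scheme is Scheme.LINEAR3D:
            block = self._options.block_size or self.FALLBACK_BLOCK_SIZE
            grid = tuple(-(-n // b) for n, b in zip(iteration_space, block))
            return block, grid
        # Blockwise4D: one block per line along the fastest coordinate
        fastest, *slower = iteration_space
        return (fastest, 1, 1), tuple(slower)


linear = GpuIndexingSketch(GpuOptionsSketch(Scheme.LINEAR3D, block_size=(32, 8, 1)))
print(linear.get_launch_shape((100, 100, 1)))       # ((32, 8, 1), (4, 13, 1))

blockwise = GpuIndexingSketch(GpuOptionsSketch(Scheme.BLOCKWISE4D))
print(blockwise.get_launch_shape((64, 10, 20, 3)))  # ((64, 1, 1), (10, 20, 3))
```

Producing both objects from one factory keeps the thread mapping and the launch configuration consistent with each other for whichever scheme the user selects.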
Edited by Frederik Hennig