GPU Indexing Schemes and Launch Configurations (!449) · Merge requests · pycodegen / pystencils

Frederik Hennig requested to merge fhennig/lambdas into v2.0-dev Feb 10, 2025

This MR contributes

Lambdas (#119 (closed)), which act as wrappers around one-line backend expressions that can be evaluated
More flexible GPU indexing schemes via modular work item mappings
Launch Configurations as encapsulated, user-configurable objects

Lambdas

Add codegen.functions.Lambda, a simple wrapper around a backend expression tree that can be exported to the user and evaluated by the runtime system
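To illustrate the idea of a lambda as a callable wrapper around a one-line expression tree, here is a minimal, self-contained sketch. The node classes `Sym`, `Const`, `BinOp` and this `Lambda` class are hypothetical stand-ins written for this example; they are not the pystencils backend AST and do not reproduce the actual interface of codegen.functions.Lambda.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Mapping, Union


# Hypothetical miniature expression tree; NOT the pystencils backend AST.
@dataclass(frozen=True)
class Sym:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class BinOp:
    op: Callable[[int, int], int]
    left: "Expr"
    right: "Expr"


Expr = Union[Sym, Const, BinOp]


def free_symbols(expr: Expr) -> set:
    """Collect the names of all symbols occurring in the expression."""
    if isinstance(expr, Sym):
        return {expr.name}
    if isinstance(expr, BinOp):
        return free_symbols(expr.left) | free_symbols(expr.right)
    return set()


def evaluate(expr: Expr, values: Mapping[str, int]) -> int:
    """Recursively evaluate the tree with concrete values for its symbols."""
    if isinstance(expr, Sym):
        return values[expr.name]
    if isinstance(expr, Const):
        return expr.value
    return expr.op(evaluate(expr.left, values), evaluate(expr.right, values))


class Lambda:
    """Hypothetical wrapper: exposes the expression's symbols as parameters."""

    def __init__(self, expr: Expr):
        self.expr = expr
        self.parameters = tuple(sorted(free_symbols(expr)))

    def __call__(self, **values: int) -> int:
        missing = set(self.parameters) - set(values)
        if missing:
            raise TypeError(f"missing arguments: {sorted(missing)}")
        return evaluate(self.expr, values)


# (n + block - 1) // block, i.e. the number of blocks needed to cover n items
n, block = Sym("n"), Sym("block")
num_blocks = Lambda(
    BinOp(
        operator.floordiv,
        BinOp(operator.sub, BinOp(operator.add, n, block), Const(1)),
        block,
    )
)
```

Calling `num_blocks(n=100, block=32)` evaluates the wrapped expression with concrete values, which is the kind of deferred evaluation a runtime system needs when a quantity such as a grid size depends on kernel arguments.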

Indexing Schemes in Config

Make the two indexing schemes Linear3D and Blockwise4D available through the config (they correspond to the pystencils 1.3.x block and line indexing options)
Permit users to set a default block size for Linear3D

GPU Thread Index Mapping

Remove GpuThreadsRange; the thread range is no longer computed by the CUDA and SYCL platforms
Extend the CUDA platform to receive its mapping from thread indices to iteration space points via a callback object
Implement thread indexing for Linear3D and Blockwise4D
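The two mappings can be sketched in plain Python as functions from CUDA-style thread and block indices to iteration space points. This is an illustration under assumptions, not the code added by this MR: the function names are hypothetical, Linear3D is assumed to use the usual global index `blockIdx * blockDim + threadIdx` per dimension, and Blockwise4D is assumed to mirror the 1.3.x line indexing, with `threadIdx.x` covering the first coordinate and the three block indices covering the remaining ones.

```python
from typing import NamedTuple, Optional, Tuple


class Dim3(NamedTuple):
    x: int
    y: int
    z: int


def linear3d_point(
    block_idx: Dim3, block_dim: Dim3, thread_idx: Dim3, shape: Tuple[int, int, int]
) -> Optional[Tuple[int, int, int]]:
    """Assumed Linear3D mapping: one global thread index per dimension.

    Returns None for threads outside the iteration space (the guard a
    generated kernel would need when the grid overshoots the domain).
    """
    point = tuple(b * d + t for b, d, t in zip(block_idx, block_dim, thread_idx))
    if all(p < s for p, s in zip(point, shape)):
        return point
    return None


def blockwise4d_point(
    block_idx: Dim3, thread_idx: Dim3, shape: Tuple[int, int, int, int]
) -> Optional[Tuple[int, int, int, int]]:
    """Assumed Blockwise4D mapping: threadIdx.x plus the three block indices."""
    point = (thread_idx.x, block_idx.x, block_idx.y, block_idx.z)
    if all(p < s for p, s in zip(point, shape)):
        return point
    return None


# Thread (5, 2, 0) of block (1, 0, 0) with 32x4x1 threads per block
p = linear3d_point(Dim3(1, 0, 0), Dim3(32, 4, 1), Dim3(5, 2, 0), (64, 4, 1))
q = blockwise4d_point(Dim3(3, 1, 0), Dim3(7, 0, 0), (16, 8, 2, 1))
```

Passing such a mapping to the platform as a callback keeps the platform itself independent of any particular indexing scheme.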

Launch Configurations

Encapsulate GPU launch configurations into objects of the base class codegen.gpu_indexing.GpuLaunchConfiguration, which is evaluated by the JIT in the context of the kernel launch.
Add subclasses for fully manual and fully automatic launch configurations, as well as a launch config with dynamic block size for the Linear3D indexing scheme. Enable the user to set launch config parameters on the compiled kernel object.
Update JIT to adhere to the new launch config interface.
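The three kinds of launch configuration listed above can be sketched as a small class hierarchy. All class names, the `evaluate` signature, the default block size, and the policy used by the automatic variant are assumptions made for this example; they are not taken from codegen.gpu_indexing.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Dim3 = Tuple[int, int, int]


def ceil_div(a: int, b: int) -> int:
    """Integer division rounding up."""
    return -(-a // b)


class LaunchConfig:
    """Hypothetical base class: yields (block_size, grid_size) at launch time."""

    def evaluate(self, iteration_shape: Dim3) -> Tuple[Dim3, Dim3]:
        raise NotImplementedError


@dataclass
class ManualLaunch(LaunchConfig):
    """Fully manual: the user must set both sizes before the launch."""

    block_size: Optional[Dim3] = None
    grid_size: Optional[Dim3] = None

    def evaluate(self, iteration_shape: Dim3) -> Tuple[Dim3, Dim3]:
        if self.block_size is None or self.grid_size is None:
            raise ValueError("block_size and grid_size must both be set")
        return self.block_size, self.grid_size


@dataclass
class AutomaticLaunch(LaunchConfig):
    """Fully automatic: assumed policy of one block per line along x."""

    def evaluate(self, iteration_shape: Dim3) -> Tuple[Dim3, Dim3]:
        nx, ny, nz = iteration_shape
        return (nx, 1, 1), (ny, nz, 1)


@dataclass
class DynamicBlockSizeLaunch(LaunchConfig):
    """User-chosen block size; the grid is derived from the iteration space."""

    block_size: Dim3 = (128, 2, 1)  # assumed default, for illustration only

    def evaluate(self, iteration_shape: Dim3) -> Tuple[Dim3, Dim3]:
        grid = tuple(
            ceil_div(n, b) for n, b in zip(iteration_shape, self.block_size)
        )
        return self.block_size, grid


# The user adjusts a parameter on the config object; the sizes are only
# computed when the launch is evaluated against a concrete iteration space.
launch = DynamicBlockSizeLaunch()
launch.block_size = (32, 4, 1)
block, grid = launch.evaluate((100, 10, 1))
```

Keeping the computation inside the config object means the JIT only needs to call one method at launch time, regardless of which variant the kernel carries.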

Codegen Driver Updates

Introduce the GpuIndexing class which acts as a factory for launch configurations and thread indexing objects depending on the user-provided configuration
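A rough sketch of the factory idea follows: user options select both the thread mapping and the kind of launch configuration. The option names, the `manual_launch_grid` flag, the fallback block size, and the returned descriptors are all hypothetical and chosen only to keep the example self-contained; the real GpuIndexing class may be organized differently.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class GpuOptions:
    """Hypothetical user-facing options (names are assumptions)."""

    indexing_scheme: str = "linear3d"
    block_size: Optional[Tuple[int, int, int]] = None
    manual_launch_grid: bool = False


class IndexingFactory:
    """Hypothetical factory choosing a mapping and launch config per scheme."""

    FALLBACK_BLOCK_SIZE = (128, 2, 1)  # assumed value, for illustration only
    SCHEMES = ("linear3d", "blockwise4d")

    def __init__(self, options: GpuOptions):
        scheme = options.indexing_scheme.lower()
        if scheme not in self.SCHEMES:
            raise ValueError(f"unknown indexing scheme: {options.indexing_scheme}")
        self.scheme = scheme
        self.options = options

    def thread_mapping(self) -> str:
        # A real factory would return a mapping object; a label suffices here.
        return {"linear3d": "Linear3D", "blockwise4d": "Blockwise4D"}[self.scheme]

    def launch_config(self) -> dict:
        if self.options.manual_launch_grid:
            return {"kind": "manual"}
        if self.scheme == "linear3d":
            block = self.options.block_size or self.FALLBACK_BLOCK_SIZE
            return {"kind": "dynamic_block_size", "block_size": block}
        return {"kind": "automatic"}


default_cfg = IndexingFactory(GpuOptions()).launch_config()
custom_cfg = IndexingFactory(GpuOptions(block_size=(64, 4, 1))).launch_config()
blockwise_cfg = IndexingFactory(GpuOptions("blockwise4d")).launch_config()
```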

Edited Feb 17, 2025 by Frederik Hennig
