
GPU Indexing Schemes and Launch Configurations

Frederik Hennig requested to merge fhennig/lambdas into v2.0-dev

Closes #119 (closed)

This MR contributes

  • Lambdas (#119 (closed)), which act as wrappers around one-line backend expressions that can be evaluated
  • More flexible GPU indexing schemes via modular work item mappings
  • Launch Configurations as encapsulated, user-configurable objects

Lambdas

Add codegen.functions.Lambda, a simple wrapper around a backend expression tree that can be exported to the user and evaluated by the runtime system
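
To illustrate the idea of wrapping a one-line expression so that it can be handed to the user and evaluated later, here is a minimal sketch in plain Python. It is not the actual codegen.functions.Lambda: the class name SketchLambda, its parameters attribute, and the use of Python's ast module in place of the backend expression tree are all assumptions made for illustration.

```python
import ast


class SketchLambda:
    """Hypothetical stand-in for a lambda wrapping a one-line expression.

    The real implementation wraps a backend expression tree; this sketch
    uses Python's own `ast` module purely for illustration.
    """

    def __init__(self, expression: str):
        # Parse the expression into a tree and compile it once.
        self._tree = ast.parse(expression, mode="eval")
        self._code = compile(self._tree, "<lambda-sketch>", "eval")
        # The free symbols of the expression become the lambda's parameters.
        self.parameters = sorted(
            {node.id for node in ast.walk(self._tree) if isinstance(node, ast.Name)}
        )

    def __call__(self, **values):
        # Evaluate the expression with concrete values for all parameters.
        missing = [p for p in self.parameters if p not in values]
        if missing:
            raise TypeError(f"missing parameters: {missing}")
        return eval(self._code, {"__builtins__": {}}, dict(values))


# Example: a grid-size expression that the runtime evaluates at launch time.
grid_size = SketchLambda("(num_cells + block_size - 1) // block_size")
```

Calling grid_size(num_cells=1000, block_size=128) evaluates the stored expression with those values. The point of the wrapper is that the expression is built once during code generation and only evaluated when the runtime knows the actual arguments.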

Indexing Schemes in Config

  • Make the two indexing schemes Linear3D and Blockwise4D available through the config (they correspond to the pystencils 1.3.x block and line indexing options)
  • Permit users to set a default block size for Linear3D

GPU Thread Index Mapping

  • Remove GpuThreadsRange; the thread range is no longer computed by the CUDA and SYCL platforms
  • Extend the CUDA platform to receive its mapping from thread indices to iteration space points via a callback object
  • Implement thread indexing for Linear3D and Blockwise4D
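
The two mappings can be sketched as plain Python functions that mimic what each GPU thread would compute. This is an illustration only: the function names are hypothetical, and the Blockwise4D layout shown here (fastest coordinate from threadIdx.x, remaining coordinates from the block index) is an assumption based on the line indexing of pystencils 1.3.x, not taken from this MR's code.

```python
def linear3d_point(block_idx, block_dim, thread_idx):
    """Linear3D sketch: each coordinate is blockIdx * blockDim + threadIdx."""
    return tuple(b * d + t for b, d, t in zip(block_idx, block_dim, thread_idx))


def blockwise4d_point(block_idx, thread_idx_x):
    """Blockwise4D sketch (assumed layout): threadIdx.x indexes the fastest
    coordinate, blockIdx.{x,y,z} index the remaining three coordinates."""
    return (thread_idx_x, *block_idx)


# A thread in block (1, 0, 2) with block size (128, 2, 1) and
# thread index (5, 1, 0) handles this iteration-space point:
p_linear = linear3d_point((1, 0, 2), (128, 2, 1), (5, 1, 0))

# Thread 7 of block (3, 4, 5) handles this 4D point:
p_blockwise = blockwise4d_point((3, 4, 5), 7)
```

Passing such a mapping to the platform as a callback keeps the platform independent of the indexing scheme: the platform only emits the code the mapping produces.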

Launch Configurations

  • Encapsulate GPU launch configurations into objects of the base class codegen.gpu_indexing.GpuLaunchConfiguration, which is evaluated by the JIT in the context of the kernel launch.
  • Add subclasses for fully manual and fully automatic launch configurations, as well as a launch config with dynamic block size for the Linear3D indexing scheme. Enable the user to set launch config parameters on the compiled kernel object.
  • Update JIT to adhere to the new launch config interface.
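
The three kinds of launch configuration can be sketched as follows. This is a hedged illustration, not the GpuLaunchConfiguration interface itself: the class names, the evaluate method, and its iteration_shape argument are hypothetical, and the grid size is derived by a simple ceiling division, which is an assumption about how the Linear3D grid is computed.

```python
from abc import ABC, abstractmethod


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


class LaunchConfigSketch(ABC):
    """Hypothetical base class; evaluated by the JIT at kernel launch."""

    @abstractmethod
    def evaluate(self, iteration_shape):
        """Return (block_size, grid_size) for the given iteration space."""


class ManualConfigSketch(LaunchConfigSketch):
    """Fully manual: the user must set both block and grid size."""

    def __init__(self):
        self.block_size = None
        self.grid_size = None

    def evaluate(self, iteration_shape):
        if self.block_size is None or self.grid_size is None:
            raise ValueError("block_size and grid_size must be set by the user")
        return self.block_size, self.grid_size


class AutomaticConfigSketch(LaunchConfigSketch):
    """Fully automatic: block size is fixed, the user sets nothing."""

    def __init__(self, block_size):
        self._block_size = block_size

    def evaluate(self, iteration_shape):
        grid = tuple(ceil_div(n, b) for n, b in zip(iteration_shape, self._block_size))
        return self._block_size, grid


class DynamicBlockSizeConfigSketch(LaunchConfigSketch):
    """Dynamic block size: the user may override block_size, and the grid
    size is recomputed from it at launch time."""

    def __init__(self, default_block_size):
        self.block_size = default_block_size

    def evaluate(self, iteration_shape):
        grid = tuple(ceil_div(n, b) for n, b in zip(iteration_shape, self.block_size))
        return self.block_size, grid


# Usage: override the block size before launch, as one would on a kernel object.
cfg = DynamicBlockSizeConfigSketch(default_block_size=(128, 2, 1))
default_launch = cfg.evaluate((1000, 10, 3))
cfg.block_size = (64, 4, 1)
tuned_launch = cfg.evaluate((1000, 10, 3))
```

Because evaluation is deferred to the launch, a changed block size takes effect on the next call without regenerating the kernel.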

Codegen Driver Updates

  • Introduce the GpuIndexing class which acts as a factory for launch configurations and thread indexing objects depending on the user-provided configuration
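
As a rough sketch of the factory role described above, the following class picks launch dimensions based on a user-provided scheme name. Everything here is hypothetical: the class name GpuIndexingSketch, the scheme strings "linear3d" and "blockwise4d", the default block size, and the assumed Blockwise4D layout (one block per line along the first coordinate) are illustrative choices, not the API added by this MR.

```python
from dataclasses import dataclass


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division."""
    return -(-a // b)


@dataclass(frozen=True)
class GpuIndexingSketch:
    """Hypothetical factory keyed on the user's indexing configuration."""

    scheme: str = "linear3d"
    default_block_size: tuple = (128, 2, 1)

    def launch_dims(self, iteration_shape):
        """Return (block_size, grid_size) for the configured scheme."""
        if self.scheme == "linear3d":
            block = self.default_block_size
            grid = tuple(ceil_div(n, b) for n, b in zip(iteration_shape, block))
            return block, grid
        if self.scheme == "blockwise4d":
            # Assumed layout: one block per line along the first coordinate,
            # remaining coordinates mapped onto the grid.
            block = (iteration_shape[0], 1, 1)
            grid = tuple(iteration_shape[1:4])
            return block, grid
        raise ValueError(f"unknown indexing scheme: {self.scheme}")


linear = GpuIndexingSketch().launch_dims((1000, 10, 3))
blockwise = GpuIndexingSketch(scheme="blockwise4d").launch_dims((64, 10, 8, 19))
```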
Edited by Frederik Hennig
