Optimization for GPU block size determination (!454) · Merge requests · pycodegen / pystencils

This MR optimizes how GPU block sizes are determined so that they are always multiples of the hardware's warp (CUDA) or wavefront (HIP) size.

In summary, this MR:

- removes the BasicOption GpuOptions.omit_range_check
- removes the BasicOption GpuOptions.block_size
- introduces the BasicOption GpuOptions.warp_size and implements a function for determining its default value
- introduces the BasicOption assume_warp_aligned_block_size, which assures the compiler that block sizes are multiples of the warp size
- adds the new GpuOptions to the data flow of GpuIndexing
- adds an algorithm for fitting the block size to the iteration space and the warp size
- adds fit_block_size and trim_block_size member functions to DynamicBlockSizeLaunchConfiguration for computing block sizes from a user-defined initial block size and the iteration space
- for assumed alignment: rounds block sizes to multiples of the warp size when the iteration space is unknown at code generation time
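
The idea behind trimming and fitting a block size can be illustrated with a minimal sketch. This is not the code added by this MR: the free functions ceil_to_multiple, trim_to_iteration_space and fit_to_warp are hypothetical stand-ins, and the strategy of rounding only the x dimension is an assumption made for illustration.

```python
from math import prod


def ceil_to_multiple(value: int, multiple: int) -> int:
    """Round `value` up to the nearest multiple of `multiple`."""
    return -(-value // multiple) * multiple


def trim_to_iteration_space(block_size, iteration_space):
    """Hypothetical helper: clamp each block dimension to the iteration space."""
    return tuple(min(b, n) for b, n in zip(block_size, iteration_space))


def fit_to_warp(block_size, iteration_space, warp_size):
    """Hypothetical helper: trim the block, then make the total number of
    threads a multiple of the warp size by rounding up the x dimension.

    Assumption for illustration only; the MR's actual algorithm may
    redistribute threads across dimensions differently.
    """
    trimmed = trim_to_iteration_space(block_size, iteration_space)
    if prod(trimmed) % warp_size == 0:
        return trimmed  # already warp-aligned
    x, *rest = trimmed
    return (ceil_to_multiple(x, warp_size), *rest)


# Initial block (256, 2, 1) on a 20 x 20 x 1 iteration space, warp size 32:
# trimming gives (20, 2, 1), i.e. 40 threads, which is not a multiple of 32,
# so x is rounded up to 32.
print(fit_to_warp((256, 2, 1), (20, 20, 1), 32))  # (32, 2, 1)
```

Note that rounding up can make the block larger than the iteration space in that dimension, so in this sketch the surplus threads would still have to be masked out by a range check inside the kernel.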

Edited Mar 14, 2025 by Richard Angersbach
