Optimization for GPU block size determination
This MR optimizes how GPU block sizes are determined, so that they are always multiples of the hardware's warp (CUDA) or wavefront (HIP) size.
In summary, this MR:
- removes BasicOption `GpuOptions.omit_range_check`
- removes BasicOption `GpuOptions.block_size`
- introduces BasicOption `GpuOptions.warp_size` and implements a function for determining default values
- introduces BasicOption `assume_warp_aligned_block_size`, which assures the compiler that block sizes are aligned to the warp size
- adds the new `GpuOptions` to the data flow of `GpuIndexing`
- adds an algorithm for fitting the block size according to the iteration space and the warp size
- adds `fit_block_size` and `trim_block_size` member functions to `DynamicBlockSizeLaunchConfiguration` for computing block sizes based on a user-defined initial block size and the iteration space
  - for assumed alignment: rounds to multiples of the warp size when the iteration space is unknown at generation time
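
The idea behind trimming and fitting can be illustrated with a minimal sketch. This is not the code added by this MR: the free functions below only borrow the names `trim_block_size` and `fit_block_size`, and their signatures, the helper `round_up_to_warp`, and the halving strategy are assumptions made for illustration.

```python
from math import prod


def round_up_to_warp(n, warp_size):
    """Hypothetical helper: smallest multiple of warp_size that is >= n,
    and never less than one full warp."""
    return max(1, -(-n // warp_size)) * warp_size


def trim_block_size(block_size, iteration_space):
    """Clamp every block dimension to the extent of the iteration space
    (illustrative version, not the MR's member function)."""
    return tuple(min(b, n) for b, n in zip(block_size, iteration_space))


def fit_block_size(block_size, iteration_space, warp_size):
    """Trim the block, round its first dimension up to a warp multiple,
    then halve the remaining dimensions (last one first) until the thread
    count is back within the budget of the user-defined initial block.

    Assumption of this sketch: if a single warp already exceeds the
    budget, one full warp is used anyway."""
    budget = prod(block_size)
    trimmed = trim_block_size(block_size, iteration_space)
    x = round_up_to_warp(trimmed[0], warp_size)
    rest = list(trimmed[1:])
    for i in reversed(range(len(rest))):
        while x * prod(rest) > budget and rest[i] > 1:
            rest[i] //= 2
    return (x, *rest)


# Iteration space known: trim to it, then restore warp alignment.
print(trim_block_size((128, 2, 1), (100, 1, 1)))      # (100, 1, 1)
print(fit_block_size((16, 16, 1), (20, 20, 1), 32))   # (32, 8, 1)

# Iteration space unknown at generation time: only rounding is possible.
print(round_up_to_warp(100, 32))                      # 128
```

In this sketch the total number of threads per block is always a multiple of the warp size because the first dimension is, which is the property that an option like `assume_warp_aligned_block_size` would let generated code rely on.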
Edited by Richard Angersbach