Race condition in CPU JIT configuration

We use pystencils in concurrent build steps (ninja -j N) to generate different kernels in parallel. Every once in a while, our CI fails with the following error:

Traceback (most recent call last):
  File "/builds/hyteg/hog/generate_all_operators.py", line 33, in <module>
    from hog.cse import CseImplementation
  File "/builds/hyteg/hog/hog/cse.py", line 21, in <module>
    import pystencils as ps
  File "/builds/hyteg/hog/env/lib/python3.10/site-packages/pystencils/__init__.py", line 12, in <module>
    from .kernelcreation import create_kernel, create_staggered_kernel
  File "/builds/hyteg/hog/env/lib/python3.10/site-packages/pystencils/kernelcreation.py", line 10, in <module>
    from pystencils.cpu.vectorization import vectorize
  File "/builds/hyteg/hog/env/lib/python3.10/site-packages/pystencils/cpu/__init__.py", line 1, in <module>
    from pystencils.cpu.cpujit import make_python_function
  File "/builds/hyteg/hog/env/lib/python3.10/site-packages/pystencils/cpu/cpujit.py", line 236, in <module>
    _config = read_config()
  File "/builds/hyteg/hog/env/lib/python3.10/site-packages/pystencils/cpu/cpujit.py", line 197, in read_config
    loaded_config = json.load(json_config_file)
  File "/usr/lib/python3.10/json/__init__.py", line 293, in load
    return loads(fp.read(),
  File "/usr/lib/python3.10/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.10/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.10/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None

My first guess is that this is a race condition between concurrent pystencils invocations. On import, pystencils looks for a config file and creates it if it does not exist. A build job running in parallel can then find a partly written config file and fail to parse it.
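One common way to avoid exposing a partly written file is to write to a temporary file in the same directory and then rename it into place. The following is only a minimal sketch of that technique, not the code pystencils uses; the function name `write_config_atomically` and its parameters are hypothetical.

```python
import json
import os
import tempfile


def write_config_atomically(config: dict, config_path: str) -> None:
    """Write `config` as JSON so that readers never observe a partial file.

    Hypothetical helper for illustration; not part of pystencils.
    """
    directory = os.path.dirname(os.path.abspath(config_path))
    os.makedirs(directory, exist_ok=True)

    # Create the temporary file next to the target so that the final
    # rename stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp_file:
            json.dump(config, tmp_file, indent=4)
            tmp_file.flush()
            os.fsync(tmp_file.fileno())
        # os.replace swaps the complete file into place in one step, so a
        # concurrent reader sees either the old state or the full new file.
        os.replace(tmp_path, config_path)
    except BaseException:
        # Do not leave stray temporary files behind on failure.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

With this approach, two jobs that both find the file missing may each write it, but the last rename wins and every reader gets a complete JSON document.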

IMHO pystencils should not write config files to disk automatically, especially considering that we do not even use the JIT. In any case, it should be robust to concurrent invocations.
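To illustrate what "robust" could mean on the reading side, here is a rough sketch of a loader that falls back to in-memory defaults when the file is missing or cannot be parsed, without writing anything to disk. The name `load_config_or_defaults` and the example default values are assumptions made for this sketch and do not reflect the actual pystencils config layout.

```python
import copy
import json

# Placeholder defaults for illustration only; the real pystencils
# defaults are not reproduced here.
EXAMPLE_DEFAULTS = {"compiler": "g++", "cache_enabled": False}


def load_config_or_defaults(config_path: str, defaults: dict) -> dict:
    """Return the config from `config_path`, or a copy of `defaults`.

    A missing, empty, or truncated file is treated like an absent
    config instead of aborting the import.
    """
    try:
        with open(config_path) as config_file:
            loaded = json.load(config_file)
    except (FileNotFoundError, json.JSONDecodeError):
        return copy.deepcopy(defaults)

    if not isinstance(loaded, dict):
        return copy.deepcopy(defaults)

    # Values from the file override the defaults (top-level keys only).
    merged = copy.deepcopy(defaults)
    merged.update(loaded)
    return merged
```

An empty file, which is what a concurrent reader may see right after another process has opened the file for writing, raises `json.JSONDecodeError` with "Expecting value", matching the traceback above; this loader would return the defaults in that case.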

Edited by Daniel Bauer