ARM SME detection for macOS
This is a follow-up to !385 (merged), which only implemented support for Streaming SVE / SME on Linux. In the meantime, Macs with suitable CPUs have been released, so this adds the detection code and compiler flags for them. Doing so revealed two bugs. First, SME must not be the highest (i.e. default) vector ISA, because that breaks instruction_set="best" as well as test_aligned_and_nt_stores, which is intentionally run on only one ISA. Second, the streaming attribute must not be added to Neon kernels.
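
For illustration only, here is a minimal sketch of what an SME check on macOS could look like. It assumes the `hw.optional.arm.FEAT_SME` sysctl key that Apple exposes for CPU feature queries. The function names `read_sysctl` and `macos_has_sme` are hypothetical and not taken from the code in this merge request.

```python
import platform
import subprocess


def read_sysctl(key: str) -> str:
    """Return the value of a sysctl key as text, or "" if it cannot be read."""
    try:
        result = subprocess.run(
            ["sysctl", "-n", key],
            capture_output=True,
            text=True,
            check=False,
        )
    except OSError:
        # sysctl binary not available (e.g. not running on macOS)
        return ""
    return result.stdout.strip() if result.returncode == 0 else ""


def macos_has_sme(system=None, machine=None, reader=read_sysctl) -> bool:
    """Hypothetical helper: report whether the CPU advertises SME on macOS.

    `system`, `machine` and `reader` can be overridden so the logic can be
    exercised on machines that are not Macs.
    """
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    if system != "Darwin" or machine != "arm64":
        return False
    # Assumption: the key reads "1" when the feature is present.
    return reader("hw.optional.arm.FEAT_SME") == "1"
```

Even when such a check succeeds, SME would be offered as an available ISA rather than picked as the default, in line with the first bug described above.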