[AutoTVM] [MacOS] Failed to tune kernels with Metal/OpenCL/OpenGL backends

Hello,

I’m trying to auto-tune kernels with a backend that supports Intel integrated graphics on a MacBook Pro running macOS Catalina. I followed this tutorial: https://docs.tvm.ai/tutorials/autotvm/tune_relay_x86.html

Auto-tuning works with the llvm backend, but fails with the metal, opencl and opengl backends. The errors differ by backend.

Metal / OpenCL backends

I was able to run the compiled model. However, auto-tuning crashes with many errors like the following:

objc[84514]: +[NSNumber initialize] may have been in progress in another thread when fork() was called.

The error has been discussed in this Github issue, but no solution has been proposed there. Is there any hack or workaround for this error, even if it requires manual single-threaded performance tuning? It looks like a total blocker for running the TVM kernel tuner with the metal backend on macOS.
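To make the "single-threaded" fallback concrete, here is a minimal sketch of what I have in mind. It is plain Python and does not use AutoTVM: `measure_candidate` and `measure_all` are hypothetical names, and I have not verified that the tuner's measurement pipeline can be configured this way. The idea is to run measurements either in the current process or in workers started with the "spawn" method, so that no worker is created by fork() alone.

```python
import multiprocessing as mp
import os


def measure_candidate(config_id):
    """Hypothetical stand-in for one tuning measurement.

    Returns the config id together with the pid of the process that ran it.
    """
    return config_id, os.getpid()


def measure_all(config_ids, n_workers=1):
    """Measure every candidate, serially or in spawned (not forked) workers."""
    if n_workers <= 1:
        # Serial path: everything runs in the current process, no fork() at all.
        return [measure_candidate(c) for c in config_ids]
    # "spawn" starts fresh interpreter processes instead of forking this one.
    ctx = mp.get_context("spawn")
    with ctx.Pool(n_workers) as pool:
        return pool.map(measure_candidate, config_ids)


if __name__ == "__main__":
    # Serial demo; pass n_workers=2 to try the spawned-worker path.
    print(measure_all([0, 1, 2]))
```

The serial path would be slow, but for a one-off tuning run that would be acceptable to me.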

OpenGL backend

Here I could not even run the compiled model. The long error backtrace ends with:

File "/Users/g-korepanov/Desktop/yandex/zoom/intel-gpu/tvm/venv/lib/python3.7/site-packages/topi-0.7.dev1-py3.7.egg/topi/nn/conv2d.py", line 73, in conv2d
    raise ValueError("not support this layout {} yet".format(layout))
ValueError: not support this layout float32 yet

I suspect this is a mistake or bug in my setup, since float32 is a data type and should not be treated as a layout.
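One way such a message could arise is a positional-argument shift between caller and callee. The sketch below is only a guess at the mechanism, not a diagnosis. The `conv2d` here is a toy stand-in with an assumed signature, not the real topi function, and the "older" call order is hypothetical.

```python
def conv2d(data, kernel, strides, padding, dilation, layout="NCHW", out_dtype=None):
    """Toy stand-in with an assumed signature; NOT the real topi.nn.conv2d."""
    if layout not in ("NCHW", "HWCN", "NHWC"):
        raise ValueError("not support this layout {} yet".format(layout))
    return layout, out_dtype


def call_with_shifted_arguments():
    # Hypothetical caller written for a signature without `dilation`:
    # (data, kernel, strides, padding, layout, out_dtype).
    # With the signature above, "NCHW" lands in `dilation`
    # and "float32" lands in `layout`.
    return conv2d(None, None, 1, 0, "NCHW", "float32")


def call_with_keywords():
    # Keyword arguments keep each value bound to the intended parameter.
    return conv2d(None, None, 1, 0, dilation=1, layout="NCHW", out_dtype="float32")


if __name__ == "__main__":
    print(call_with_keywords())
    try:
        call_with_shifted_arguments()
    except ValueError as err:
        print(err)
```

If that is what is happening, it would point to a version mismatch between the code calling conv2d and the installed topi egg rather than to the model itself.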

Summary

To sum up, I could not get auto-tuning to work with any GPU backend on Intel GPU hardware, and inference with the default kernel parameters is much slower than the same model running under Apple's CoreML framework.

I would highly appreciate any help in resolving any of the problems above.

Setup details

I’ve built TVM from source with the OpenGL, OpenCL, Metal, LLVM and MKL-DNN extensions enabled. The model I’m trying to tune is resnet-18 from the relay.testing module. LLVM tuning works without problems, although the final speed is still far from that of other frameworks.

P.S. I also do not understand how to choose a specific hardware device (i.e. context) when auto-tuning kernels. The TVM tutorials only set the target, so I don’t know which device (the Intel GPU or the AMD GPU in my case) will be used for kernel tuning.