I’ve modified the tune_relay_cuda tutorial example for the intel_graphics (OpenCL) target. You can find it on my GitHub.
I expected this to be simple and straightforward. Unfortunately, it’s not. I can run the script, but the results are useless. First of all, tuning stops at progress 1 for every task, no matter how many trials I request. The output looks something like this:
[Task 1/12] Current/Best: 58.42/ 58.42 GFLOPS | Progress: (1/20) | 3.50 s Done.
[Task 2/12] Current/Best: 65.10/ 65.10 GFLOPS | Progress: (1/20) | 3.39 s Done.
[Task 3/12] Current/Best: 54.75/ 54.75 GFLOPS | Progress: (1/20) | 3.81 s Done.
[Task 4/12] Current/Best: 71.56/ 71.56 GFLOPS | Progress: (1/20) | 3.38 s Done.
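To make the symptom easier to check on a longer run, here is a minimal sketch in plain Python (no TVM calls) that parses progress lines of the shape shown above and lists the tasks that ended with fewer trials than planned. The regular expression and the helper name `find_short_tasks` are my own assumptions based only on this output, not part of autotvm.

```python
import re

# Assumed line shape, taken only from the output pasted above:
# "[Task 1/12] Current/Best: 58.42/ 58.42 GFLOPS | Progress: (1/20) | 3.50 s Done."
PROGRESS_RE = re.compile(
    r"\[Task\s+(?P<task>\d+)/\s*\d+\]\s+"
    r"Current/Best:\s*[\d.]+/\s*[\d.]+\s+GFLOPS\s+\|\s+"
    r"Progress:\s+\((?P<done>\d+)/(?P<planned>\d+)\)"
)


def find_short_tasks(lines):
    """Return (task, done, planned) for tasks that ran fewer trials than planned.

    If a task appears on several lines, only its last progress line counts.
    """
    last_seen = {}
    for line in lines:
        match = PROGRESS_RE.search(line)
        if match is None:
            continue  # not a progress line, e.g. an INFO message
        task = int(match["task"])
        last_seen[task] = (int(match["done"]), int(match["planned"]))
    return [
        (task, done, planned)
        for task, (done, planned) in sorted(last_seen.items())
        if done < planned
    ]


sample = [
    "[Task 1/12] Current/Best: 58.42/ 58.42 GFLOPS | Progress: (1/20) | 3.50 s Done.",
    "[Task 2/12] Current/Best: 65.10/ 65.10 GFLOPS | Progress: (1/20) | 3.39 s Done.",
]
for task, done, planned in find_short_tasks(sample):
    print(f"task {task}: {done} of {planned} trials")
```

In my case every task would show up in this list, each with 1 of 20 trials.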
The logs seem to be OK:
INFO:autotvm:Get devices for measurement successfully!
Additionally, performance with and without autotvm seems to be exactly the same. Finally, I don’t get any message saying that autotvm cannot find a config for the target.
Could you please tell me what I am doing wrong, and how such a tuning script for OpenCL should work?