[SOLVED] Autotuning with different Tuners - performance related question


I was tuning conv operations for ResNet-50 from the ONNX model zoo using OpenCL, with a copied CUDA scheduler based on the tune_relay_cuda_example.py example, for 2000 iterations with 4 different tuners on the same machine (nothing else was running at the time).

I noticed that the best performance numbers differ between tuners, and that in most tasks progress stops before reaching this number. RandomTuner gives the lowest numbers and XGBTuner usually (but not for all tasks) the highest. GATuner and GridSearchTuner are always higher than RandomTuner but in most cases lower than XGBTuner.

I thought that a tuner is used to choose batches of configurations from the config space based on a cost function or a genetic algorithm, so if the number of iterations is high enough, all configurations will eventually run and the best performance will be the same no matter which tuner you choose.
Could you explain how the tuners work, then? Is this difference in performance caused by a higher number of timeouts and errors in particular runs, or does the choice of tuner affect the measurements themselves?


2000 iterations is not considerably high. The search space for a single conv2d layer on a GPU contains around 10^8 to 10^9 configurations, so no tuner gets anywhere near exhausting it in 2000 trials; some tuners are simply more efficient at exploring it.
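To see why a guided tuner finds better configurations than random sampling under the same trial budget, here is a minimal pure-Python sketch (not TVM code). The two-knob `cost` function is a hypothetical stand-in for measured kernel latency, and the greedy mutation loop is only a toy analogue of what GATuner or XGBTuner do: use results measured so far to decide where to sample next.

```python
import random

# Hypothetical cost function standing in for measured kernel latency:
# two tuning knobs (e.g. tile sizes), each with 1000 values, lower is better.
def cost(x, y):
    return (x - 613) ** 2 + (y - 287) ** 2

# RandomTuner-style search: sample configurations uniformly at random.
def random_search(trials, seed):
    rng = random.Random(seed)
    return min(cost(rng.randrange(1000), rng.randrange(1000))
               for _ in range(trials))

# Guided search sketch (in the spirit of GATuner): start from a few random
# configs, then repeatedly mutate the best one seen so far, so earlier
# measurements steer later sampling.
def guided_search(trials, seed):
    rng = random.Random(seed)
    pop = [(rng.randrange(1000), rng.randrange(1000)) for _ in range(10)]
    best_cfg = min(pop, key=lambda p: cost(*p))
    best = cost(*best_cfg)
    for _ in range(trials - 10):
        child = tuple(max(0, min(999, v + rng.randint(-20, 20)))
                      for v in best_cfg)
        c = cost(*child)
        if c < best:
            best, best_cfg = c, child
    return best

# Same budget of 200 trials in a space of 10^6 configs: the guided search
# usually ends up with a much lower (better) cost than random sampling.
print(random_search(200, seed=0))
print(guided_search(200, seed=0))
```

With a budget that is a tiny fraction of the space (here 200 out of 10^6; in your case 2000 out of ~10^8), the quality of the best config found depends heavily on how the tuner picks its samples, which is exactly why XGBTuner usually ends up ahead of RandomTuner.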


My problem was related to the different numbers of trials at which a specific task stops.
The answer is simple: I had not noticed the early_stopping parameter. Setting it to None, or increasing it, did the trick.
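For anyone hitting the same thing: early stopping means the tuner gives up on a task once the best measured result has not improved for a given number of consecutive trials, so different tasks stop at different trial counts before reaching n_trial. Here is a minimal pure-Python sketch of that logic (not TVM's actual implementation; the random `latency` is a hypothetical stand-in for a real measurement):

```python
import random

def tune(trials, early_stopping=None, seed=0):
    """Sketch of early stopping: quit once the best result has not
    improved for `early_stopping` consecutive trials."""
    rng = random.Random(seed)
    best = float("inf")
    since_improved = 0
    for i in range(1, trials + 1):
        latency = rng.random()  # stand-in for one measured configuration
        if latency < best:
            best, since_improved = latency, 0
        else:
            since_improved += 1
        if early_stopping is not None and since_improved >= early_stopping:
            return i, best  # stopped early at trial i
    return trials, best

print(tune(2000, early_stopping=100))   # usually stops well before 2000
print(tune(2000, early_stopping=None))  # always runs all 2000 trials
```

In autotvm this is the early_stopping argument of the tuner's tune() call, alongside n_trial; passing None (or a value at least as large as n_trial) makes every task run the full trial count.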