TVM auto-tuning is slow: how can I use multiple local GPUs to tune faster?

I observe that during tuning, GPU utilization is sometimes 100% on only one GPU and sometimes 0%, while CPU utilization alternates between low and high. Could you tell me how to use multiple GPUs to accelerate the auto-tuning process?

Thanks,
Best,
Bin.

Did you solve the problem?

You can have a look at the tutorial: https://tvm.apache.org/docs/tutorials/autotvm/tune_relay_cuda.html

The section "Scale up measurement by using multiple devices" (anchor #scale-up-measurement-by-using-multiple-devices) should solve your problem.
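
For reference, here is a rough sketch of what that section describes, adapted to several GPUs on one machine. Treat it as an illustration rather than the tutorial's exact recipe. The tracker and server modules (`tvm.exec.rpc_tracker`, `tvm.exec.rpc_server`) and `autotvm.RPCRunner` come from TVM. The device key `local-gpu`, the helper function names, the two-second wait, and the idea of pinning one RPC server per GPU with `CUDA_VISIBLE_DEVICES` are my own assumptions for this example.

```python
import os
import sys

TRACKER_HOST = "127.0.0.1"  # assumption: tracker runs on the same machine
TRACKER_PORT = 9190         # port used in the tutorial's example
DEVICE_KEY = "local-gpu"    # hypothetical key; any string works if both sides agree


def tracker_command(port=TRACKER_PORT):
    """Command line that starts the RPC tracker."""
    return [sys.executable, "-m", "tvm.exec.rpc_tracker",
            "--host=0.0.0.0", f"--port={port}"]


def server_command(host=TRACKER_HOST, port=TRACKER_PORT, key=DEVICE_KEY):
    """Command line that starts one RPC server and registers it with the tracker."""
    return [sys.executable, "-m", "tvm.exec.rpc_server",
            f"--tracker={host}:{port}", f"--key={key}"]


def server_env(gpu_index):
    """Environment that makes only one GPU visible to a server (assumed setup)."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env


def launch(num_gpus):
    """Start the tracker, then one RPC server per GPU. Returns the processes."""
    import subprocess
    import time
    procs = [subprocess.Popen(tracker_command())]
    time.sleep(2)  # assumption: give the tracker a moment before servers connect
    for gpu in range(num_gpus):
        procs.append(subprocess.Popen(server_command(), env=server_env(gpu)))
    return procs


def make_measure_option():
    """Measure option that sends measurements to every device behind the key."""
    from tvm import autotvm  # imported lazily so the helpers above work without TVM
    return autotvm.measure_option(
        builder=autotvm.LocalBuilder(timeout=10),
        runner=autotvm.RPCRunner(
            DEVICE_KEY, TRACKER_HOST, TRACKER_PORT,
            number=20, repeat=3, timeout=4, min_repeat_ms=150,
        ),
    )
```

You would call `launch(num_gpus)` once, check that all servers show up with `python -m tvm.exec.query_rpc_tracker --host=0.0.0.0 --port=9190`, and then pass `make_measure_option()` as the `measure_option` in your tuning options instead of a `LocalRunner`. Note that this only parallelizes the measurement step; the candidate-building step still runs on the CPU.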