[auto tuning] Auto-tuning is really slow

I see. May I ask how long the tuning would take if I use the RPC tracker as mentioned in the tutorial?

Thanks a lot.

It depends on how you do job parallelization. AFAIK, it is not easy to utilize multiple GPU cards inside a single host machine with current upstream TVM. It takes several days to complete tuning for ResNet-50 on a single GPU. We do have a plan to develop a new autotvm infrastructure to accelerate this process.
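If you do register multiple devices with an RPC tracker, measurements can run in parallel across them. A minimal sketch of the measure options, assuming a tracker on `127.0.0.1:9190` and devices registered under a hypothetical key `"1080ti"` (key, host, and counts are assumptions, not values from this thread):

```python
from tvm import autotvm

# Assumes an RPC tracker is running on 127.0.0.1:9190 and one or more
# GPUs are registered under the (hypothetical) device key "1080ti".
measure_option = autotvm.measure_option(
    # Compile candidate kernels locally
    builder=autotvm.LocalBuilder(timeout=10),
    # Measure up to n_parallel candidates at once, one per registered device
    runner=autotvm.RPCRunner(
        "1080ti", host="127.0.0.1", port=9190,
        n_parallel=4, number=20, repeat=3, timeout=4,
    ),
)
```

With several devices registered under the same key, raising `n_parallel` is what actually spreads trials across cards.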

On a 10-core Intel CPU with a GTX 1080 or a Tesla T4, I see around 30-60 minutes per op tuned with CUDA. This is with an early stopping of 600 trials.
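For a rough sense of what that means end to end: total wall-clock time scales linearly with the number of tuning tasks. A back-of-the-envelope sketch (the task count for ResNet-50 is an illustrative assumption, not a measurement):

```python
def estimate_tuning_hours(num_tasks, minutes_per_task):
    """Back-of-the-envelope total tuning time, assuming tasks run sequentially."""
    return num_tasks * minutes_per_task / 60.0

# A ResNet-50-sized network extracts on the order of a few dozen conv2d
# tasks (assumed here); at 30-60 minutes per op that spans a wide range.
optimistic = estimate_tuning_hours(20, 30)   # 20 tasks at 30 min each
pessimistic = estimate_tuning_hours(30, 60)  # 30 tasks at 60 min each
print(optimistic, pessimistic)
```

This is consistent with the "several days on a single GPU" estimate above once larger trial budgets are used.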

A CPU with more cores will be MUCH faster if you use the XGBoost tuner, whose cost-model training in my experience takes up the bulk of the time.

Thank you for your reply. I use the Docker image nvidia/cuda:10.2-cudnn7-devel-ubuntu18.04 and built TVM from source. I tried to run the "Tuning High Performance Convolution on NVIDIA GPUs" tutorial with GridSearchTuner, setting TVM_NUM_THREADS=12 and TVM_BIND_THREADS=0. I did not use the RPC tracker. I find that the tuning uses only one CPU core (about 85% CPU). Could you give me some advice on this situation? How can I use all CPU cores?

IIRC, GridSearchTuner is not multicore, and it is not a good choice for tuning GPUs because of the large search space. You will want to use the XGBoost tuner. It uses all your cores by default; you will see this every 64 trials (the default batch size), when it melts all your cores for up to a few minutes and then goes back to running the trials.
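Swapping the tuner is a one-line change in the tutorial code. A sketch, assuming `task` and `measure_option` are already defined as in the tutorial (the log filename is an assumption):

```python
from tvm import autotvm

# Replace GridSearchTuner with the model-based XGBTuner, which fits an
# XGBoost cost model across all CPU cores every measurement batch.
tuner = autotvm.tuner.XGBTuner(task)
tuner.tune(
    n_trial=1000,
    early_stopping=600,  # stop early if no improvement for 600 trials
    measure_option=measure_option,
    callbacks=[autotvm.callback.log_to_file("conv2d.log")],
)
```

The periodic bursts of full-core CPU usage you will see correspond to the cost-model refits between measurement batches.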

Also, IIRC, TVM_NUM_THREADS and TVM_BIND_THREADS apply to inference, so they come into play when the target is a CPU and you are running your network.

Thank you very much.