[auto-tune] Does auto-tuning become slower as it proceeds?

Edwardmark · November 7, 2019, 3:49am

Hi, I have two questions:

1. I noticed that each task takes much longer as auto-tuning proceeds. For example, the first task took about 2000 s, but the 26th task took more than 10000 s. My network has 106 tasks, so will I have to wait more than 10 days to auto-tune it? The log is as follows:

[Screenshots of the tuning log]

2. I am tuning the model on a 4-GPU server and noticed that CPU utilization is not very high. Is that normal? Why aren't all the CPU cores busy?
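A quick back-of-envelope check of the time estimate in question 1. It assumes every one of the 106 tasks takes somewhere between the ~2000 s of the first task and the >10000 s of task 26. Real per-task times will vary, and 10000 s is only a lower bound for task 26.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def estimate_total_days(num_tasks: int, seconds_per_task: float) -> float:
    """Rough total tuning time, assuming every task takes seconds_per_task."""
    return num_tasks * seconds_per_task / SECONDS_PER_DAY


# Bounds from the per-task times reported above: ~2000 s (task 1), >10000 s (task 26).
low = estimate_total_days(106, 2000)
high = estimate_total_days(106, 10000)
print(f"{low:.1f} to {high:.1f} days")
```

By the same arithmetic, 10 days corresponds to an average of about 8150 s per task, so "more than 10 days" is plausible only if most tasks take roughly as long as task 26.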

comaniac · November 7, 2019, 6:35am

For 1, this might be due to latency differences between tasks. Perhaps the workload in the 26th task has a longer average latency, so every trial takes longer. And yes, tuning a deep network may take a very long time.
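To make the per-trial argument concrete, here is a deliberately simplified cost model. The class, parameter names, and numbers are hypothetical; the model ignores parallel compilation and timeouts and is not how AutoTVM actually accounts for time. It only shows that, with the same number of trials, a task whose candidate kernels run longer takes longer overall.

```python
from dataclasses import dataclass


@dataclass
class TrialCost:
    """Hypothetical cost components of one tuning trial, in seconds."""
    build_s: float           # compiling one candidate configuration
    kernel_latency_s: float  # a single run of the compiled candidate kernel
    runs_per_trial: int      # how many times the candidate is run to time it

    def seconds(self) -> float:
        # One build, then repeated runs of the candidate kernel.
        return self.build_s + self.runs_per_trial * self.kernel_latency_s


def task_seconds(n_trial: int, cost: TrialCost) -> float:
    """Wall time of one task if its trials are measured one after another."""
    return n_trial * cost.seconds()


# Same build time and trial count; only the average kernel latency differs.
fast_task = TrialCost(build_s=1.0, kernel_latency_s=0.001, runs_per_trial=100)
slow_task = TrialCost(build_s=1.0, kernel_latency_s=0.05, runs_per_trial=100)
print(f"fast task: {task_seconds(1000, fast_task):.0f} s")
print(f"slow task: {task_seconds(1000, slow_task):.0f} s")
```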

Edwardmark · November 7, 2019, 6:39am

@comaniac Thank you very much. Another question: while tuning, I see many log messages saying:
Timeout in RPC session, kill…

What does that mean, and how can I speed up the tuning process?
Thank you very much.

Best, Edward
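On the RPC timeout: the message reads as if a measurement session ran past its time limit and was killed, so that trial produced no timing result. The sketch below only illustrates that generic "kill on timeout" pattern with Python's standard `subprocess` module. `measure_trial` and the sleep/timeout values are hypothetical stand-ins, not TVM's code or API.

```python
import subprocess
import sys


def measure_trial(run_seconds: float, timeout_s: float) -> str:
    """Hypothetical stand-in for measuring one candidate.

    The 'candidate' is a child Python process that just sleeps for
    run_seconds. If it is still running after timeout_s, subprocess.run
    kills it and raises TimeoutExpired, and the trial is reported as a
    timeout instead of a measurement.
    """
    cmd = [sys.executable, "-c", f"import time; time.sleep({run_seconds})"]
    try:
        subprocess.run(cmd, timeout=timeout_s, check=True)
        return "ok"
    except subprocess.TimeoutExpired:
        return "timeout"


print(measure_trial(0.1, timeout_s=30.0))  # finishes within the limit
print(measure_trial(5.0, timeout_s=0.5))   # killed after about 0.5 s
```

In this sketch, every timed-out trial still costs up to the full timeout while yielding no result, which is one way a large number of timeouts can add to total tuning time.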