Some questions about the auto-tuning and its performance

dolphintear · June 19, 2019, 2:27pm

If I run auto-tuning on an Intel i3 CPU and get the best result along with its .so, .json, and .params files, and then use just those three files from the i3 run to do inference on an i5 or i7 CPU, what will the performance be like? Will it run faster than on the i3, or is there no big difference between the i3 and the i5?

Thank you very much!

kevinthesun · June 20, 2019, 12:19am

This depends on the actual CPU type. For example, the i3-8121U has AVX512. This means that if you have this CPU and turn on AVX512 when tuning, the output log won’t work for i5/i7 models that don’t support this feature (Core i5-7640X and Core i7-7740X). If all of your devices support AVX512 (or AVX2), you can turn on AVX while tuning and directly reuse the output log file.
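To make the point about instruction sets concrete, here is a minimal sketch of picking one target string that every deployment machine can run. The helper names (`pick_llvm_target`, `common_target`, `read_cpu_flags`) are made up for illustration and are not part of TVM; the `-mcpu` values are the usual LLVM CPU names for AVX-512 and AVX2 targets, and the flag names assume the Linux `/proc/cpuinfo` spelling.

```python
def pick_llvm_target(cpu_flags):
    """Return a target string for the widest vector extension in cpu_flags."""
    flags = set(cpu_flags)
    if "avx512f" in flags:
        return "llvm -mcpu=skylake-avx512"
    if "avx2" in flags:
        return "llvm -mcpu=core-avx2"
    return "llvm"  # generic fallback, no AVX assumptions


def common_target(flag_sets):
    """Pick a target using only the features shared by all machines."""
    flag_sets = [set(flags) for flags in flag_sets]
    if not flag_sets:
        return "llvm"
    shared = set.intersection(*flag_sets)
    return pick_llvm_target(shared)


def read_cpu_flags(path="/proc/cpuinfo"):
    """Linux only: return the feature flags listed for the first CPU."""
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                return line.split(":", 1)[1].split()
    return []
```

For example, `common_target([i3_flags, i7_flags])` falls back to the AVX2 target when only one of the two machines reports `avx512f`, which is the target you would then tune with if the log has to be reused on both.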

dolphintear · June 20, 2019, 1:02am

thanks, @kevinthesun
I used Baidu to look up which instruction sets the Intel CPUs support and found that both the i5-7640 and the i7-7740X use AVX2 and AVX-512. Does this mean that I can reuse the output file from the i3?

Another question: suppose the network is fixed, but the parameters change frequently. For example, the dataset grows, training is rerun, and we get a better result with new parameters. In this case, do we need to run auto-tuning again to get the best result?

Thanks a lot!

kevinthesun · June 20, 2019, 1:18am

As long as the conv2d workloads stay the same, we don’t need to autotune again. For example, resnet50_v1 and resnet152_v1 share the same workloads. Weights don’t affect autotuning.
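A rough way to picture why retraining does not change the tuning tasks: a conv2d workload can be thought of as a key built from shapes and layer attributes, never from weight values. The sketch below is only an illustration in plain Python. The `conv2d_workload` and `unique_workloads` helpers are hypothetical, and the real AutoTVM workload key carries more fields than shown here.

```python
def conv2d_workload(data_shape, kernel_shape, strides, padding, dtype="float32"):
    """Build a simplified workload key from shapes and attributes only."""
    return ("conv2d", tuple(data_shape), tuple(kernel_shape),
            tuple(strides), tuple(padding), dtype)


def unique_workloads(layers):
    """Collect the distinct workload keys used by a list of conv2d layers."""
    return {conv2d_workload(**layer) for layer in layers}


# One 3x3 convolution configuration, repeated a different number of times.
layer = dict(data_shape=(1, 64, 56, 56), kernel_shape=(64, 64, 3, 3),
             strides=(1, 1), padding=(1, 1))
shallow_net = [layer] * 2   # stand-in for a shallower network
deep_net = [layer] * 8      # stand-in for a deeper network with the same layer type

print(unique_workloads(shallow_net) == unique_workloads(deep_net))  # True
```

Since the key never looks at weight values, new parameters with the same shapes map to the same set of workloads, so an existing tuning log still applies.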

dolphintear · June 20, 2019, 1:25am

Thanks again, @kevinthesun!
I am a little confused about `tasks = autotvm.task.extract_from_program(net, target=target, params=params, ops=(relay.op.nn.conv2d,))`. The tasks fed to auto-tuning are extracted with the params. Why does the extraction stage need the params?

Thank you very much!

dolphintear · June 20, 2019, 4:00am

Hi, @kevinthesun

Sorry to bother you with another question. If I use two cores for auto-tuning, for example `num_threads = 2` and `os.environ["TVM_NUM_THREADS"] = str(num_threads)`, how should I set the device id when deploying in C++?
```
int device_id = ?  // used in:
TVMArrayAlloc(in_shape, in_ndim, dtype_code, dtype_bits, dtype_lanes, device_type, device_id, &x);
```

Thanks a lot!