I tried AutoTVM to speed up MobileNet v1 on an NVIDIA Tesla V100. It seems that TVM has a pre-defined configuration for MobileNet on the V100.
When I build without auto-tuning, I get 700+ fps on inference. After auto-tuning for 3 days, the result is worse, only 520 fps. How can I get my model to run as fast as it does with the pre-defined configuration?
I also noticed that I get nearly the same speedup with n_trial = 4, n_trial = 2000, and n_trial = 4000, so a longer search doesn't seem to give a higher speedup. That is confusing.
Here is my code. Any advice?
tuning_opt = {
    'log_filename': log_file,
    'tuner': 'xgb',
    'n_trial': 4000,
    'early_stopping': 1200,
    'measure_option': autotvm.measure_option(
        builder=autotvm.LocalBuilder(timeout=10),
        runner=autotvm.RPCRunner(
            'V100',
            '0.0.0.0', 9190,
            number=20, repeat=3, timeout=4, min_repeat_ms=150)
    ),
}
tasks = autotvm.task.extract_from_program(sym['main'], target=target,
                                          params=params, ops=(relay.op.nn.conv2d,))
print("Tuning...")
tune_tasks(tasks, **tuning_opt)
print("Applying Schedule Params From Log File", log_file)
# compile kernels with history best records
with autotvm.apply_history_best(log_file):
# if True:
    print("Compile...")
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(sym, target=target, target_host=target_host, params=params)
    save_tvm_module(graph, lib, params)
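
One check that might help explain why n_trial = 4 and n_trial = 4000 give nearly the same result is to count how many records in the tuning log were measured without error. The following is only a minimal sketch: it assumes each non-empty line of the log is a JSON object whose "r" field has the layout [costs, error_no, all_cost, timestamp], with error_no == 0 meaning a successful measurement. The helper name summarize_autotvm_log is hypothetical and not part of TVM.

```python
import json
from collections import Counter


def summarize_autotvm_log(lines):
    """Count successful and failed measurements in an AutoTVM-style log.

    Assumption: every non-empty line is a JSON object with an "r" field
    laid out as [costs, error_no, all_cost, timestamp], and error_no == 0
    marks a measurement that completed without error.
    """
    error_counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        error_no = json.loads(line)["r"][1]
        error_counts[error_no] += 1

    total = sum(error_counts.values())
    ok = error_counts[0]
    return {
        "total": total,
        "ok": ok,
        # Fraction of records that were measured successfully.
        "ok_fraction": ok / total if total else 0.0,
        # Failed records grouped by their nonzero error code.
        "errors": {code: n for code, n in error_counts.items() if code != 0},
    }
```

Calling it with an open file handle for the log, for example summarize_autotvm_log(open(log_file)), returns the counts. If most records have a nonzero error code, the tuner has very few valid measurements to learn from, which would be consistent with a longer search not helping.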