Performance after auto-tuning is worse than without tuning

I tried AutoTVM to speed up MobileNet v1 on an NVIDIA Tesla V100. It seems that TVM already has a pre-defined configuration for MobileNet on the V100.

When I build without auto-tuning, I get 700+ fps on inference. After auto-tuning for 3 days, the result is worse: only 520 fps. How can I get my model to run as fast as it does with the pre-defined configuration?

I also noticed that I get nearly the same speedup with n_trial = 4, n_trial = 2000, and n_trial = 4000. It seems the longer search did not lead to a higher speedup, which is confusing.

Here is my code. Any advice?

tuning_opt = {
    'log_filename': log_file,
    'tuner': 'xgb',
    'n_trial': 4000,
    'early_stopping': 1200,

    'measure_option': autotvm.measure_option(
        builder=autotvm.LocalBuilder(timeout=10),
        runner=autotvm.RPCRunner(
            'V100',
            '0.0.0.0', 9190,
            number=20, repeat=3, timeout=4, min_repeat_ms=150)
    ),
}
tasks = autotvm.task.extract_from_program(sym['main'], target=target,
                                          params=params, ops=(relay.op.nn.conv2d,))
print("Tuning...")
tune_tasks(tasks, **tuning_opt)

print("Applying Schedule Params From Log File", log_file)
# Compile kernels with the best records found in the tuning log
with autotvm.apply_history_best(log_file):
    print("Compile...")
    with relay.build_config(opt_level=3):
        graph, lib, params = relay.build(
            sym, target=target, target_host=target_host, params=params)
    save_tvm_module(graph, lib, params)
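
To check whether the longer searches actually found better schedules, the tuning log can be summarized per task. Below is a minimal sketch, not part of TVM: `summarize_records` and `print_summary` are hypothetical helper names, and the code assumes records shaped like the (input, result) pairs that `autotvm.record.load_from_file` yields, where `inp.task.workload` identifies the task, `res.error_no` is 0 on success, and `res.costs` holds run times in seconds.

```python
from collections import defaultdict
from statistics import mean


def summarize_records(records):
    """Group tuning records by workload and track the best valid cost.

    `records` is any iterable of (inp, res) pairs. The attribute names
    used here (inp.task.workload, res.error_no, res.costs) are assumed
    to match what autotvm.record.load_from_file yields.
    """
    stats = defaultdict(lambda: {"trials": 0, "valid": 0, "best_ms": None})
    for inp, res in records:
        entry = stats[str(inp.task.workload)]
        entry["trials"] += 1
        if res.error_no != 0:
            continue  # failed build or run: costs are not timings
        entry["valid"] += 1
        cost_ms = mean(res.costs) * 1e3  # seconds -> milliseconds
        if entry["best_ms"] is None or cost_ms < entry["best_ms"]:
            entry["best_ms"] = cost_ms
    return dict(stats)


def print_summary(stats):
    """Print one line per workload: trial count, valid count, best cost."""
    for workload, s in stats.items():
        best = "n/a" if s["best_ms"] is None else f"{s['best_ms']:.4f} ms"
        print(f"{s['trials']:5d} trials, {s['valid']:5d} valid, "
              f"best {best}  {workload[:60]}")


# With TVM installed, the helpers could be fed from the tuning log:
#   from tvm import autotvm
#   print_summary(summarize_records(autotvm.record.load_from_file(log_file)))
```

If a task shows few or no valid records, or the same best cost across the n_trial = 2000 and n_trial = 4000 logs, that would suggest the extra trials are failing or timing out rather than exploring better configurations.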