[AutoTVM] Tuning fails for an NCHW network on x86 CPU: KeyError: 'topi_op'

xxxx
2020-02-06 21:58:51,702 INFO Start to benchmark layout transformation...
Traceback (most recent call last):

  File "inceptionv3.py", line 258, in test
    tune_graph(mod["main"], data_shape, log_file, graph_opt_sch_file)

  File "inceptionv3.py", line 221, in tune_graph
    executor.benchmark_layout_transform(min_exec_num=2000)

  File "/tvm/python/tvm/autotvm/graph_tuner/base_graph_tuner.py", line 432, in benchmark_layout_transform
    self._iterate_layout_transform(_fetch_args_callback)

  File "/tvm/python/tvm/autotvm/graph_tuner/base_graph_tuner.py", line 269, in _iterate_layout_transform
    i_topi_op = in_node_entry["topi_op"][0]

KeyError: 'topi_op'

I followed the official x86 tutorial (https://docs.tvm.ai/tutorials/autotvm/tune_relay_x86.html#sphx-glr-tutorials-autotvm-tune-relay-x86-py) and it works fine.

But when I replace the model with InceptionV3, it reports the error above (KeyError: 'topi_op').

InceptionV3 model: https://github.com/dmlc/web-data/tree/master/tensorflow/models/InceptionV3

The InceptionV3 model is NHWC; I use ConvertLayout to convert it to NCHW.

What is the reason for this error?

Can you print in_node_entry to see what it looks like?
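
For example, a minimal debugging sketch (assumes you can edit your local TVM checkout; the guard and message are just for illustration), placed right above the failing line in _iterate_layout_transform:

# in python/tvm/autotvm/graph_tuner/base_graph_tuner.py, before the KeyError line
if "topi_op" not in in_node_entry:
    # dump the whole entry so we can see which node is missing "topi_op"
    print("in_node_entry without 'topi_op':", in_node_entry)
i_topi_op = in_node_entry["topi_op"][0]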

Thanks for your reply. I've been running it all day and haven't gotten any results yet; it's very slow.

I've got inception_v3.log, but how do I use it directly to get inception_v3_graph_opt.log?

inception_v3.log

{"i": ["llvm -mcpu=core-avx2", "topi_nn_conv2d", [["TENSOR", [1, 8, 8, 1280], "float32"], ["TENSOR", [1, 1, 1280, 448], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 8, 8, 1280, "float32"], [1, 1, 1280, 448, "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NHWC", "float32"], {"i": 0, "t": "direct", "c": null, "e": []}], "r": [[1000000000.0], 4, 0.5200421810150146, 1580808485.334225], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_nn_conv2d", [["TENSOR", [1, 8, 8, 448], "float32"], ["TENSOR", [3, 3, 448, 384], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 8, 8, 448, "float32"], [3, 3, 448, 384, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {"i": 0, "t": "direct", "c": null, "e": []}], "r": [[1000000000.0], 4, 1.1804931163787842, 1580808487.266453], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_nn_conv2d", [["TENSOR", [1, 8, 8, 1280], "float32"], ["TENSOR", [1, 1, 1280, 192], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 8, 8, 1280, "float32"], [1, 1, 1280, 192, "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NHWC", "float32"], {"i": 0, "t": "direct", "c": null, "e": []}], "r": [[1000000000.0], 4, 0.3534369468688965, 1580808488.421916], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_nn_conv2d", [["TENSOR", [1, 8, 8, 2048], "float32"], ["TENSOR", [1, 1, 2048, 320], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 8, 8, 2048, "float32"], [1, 1, 2048, 320, "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NHWC", "float32"], {"i": 0, "t": "direct", "c": null, "e": []}], "r": [[1000000000.0], 4, 0.36614990234375, 1580808489.606716], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_nn_conv2d", [["TENSOR", [1, 8, 8, 2048], "float32"], ["TENSOR", [1, 1, 2048, 384], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 8, 8, 2048, "float32"], [1, 1, 2048, 384, "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NHWC", "float32"], {"i": 0, "t": "direct", "c": null, "e": []}], "r": [[1000000000.0], 4, 0.451977014541626, 1580808490.87119], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_nn_conv2d", [["TENSOR", [1, 8, 8, 2048], "float32"], ["TENSOR", [1, 1, 2048, 448], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 8, 8, 2048, "float32"], [1, 1, 2048, 448, "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NHWC", "float32"], {"i": 0, "t": "direct", "c": null, "e": []}], "r": [[1000000000.0], 4, 0.4711000919342041, 1580808492.074238], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_nn_conv2d", [["TENSOR", [1, 8, 8, 2048], "float32"], ["TENSOR", [1, 1, 2048, 192], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 8, 8, 2048, "float32"], [1, 1, 2048, 192, "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NHWC", "float32"], {"i": 0, "t": "direct", "c": null, "e": []}], "r": [[1000000000.0], 4, 0.36439990997314453, 1580808493.2370281], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_nn_conv2d", [["TENSOR", [1, 1, 1, 2048], "float32"], ["TENSOR", [1, 1, 2048, 1001], "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 1, 1, 2048, "float32"], [1, 1, 2048, 1001, "float32"], [1, 1], [0, 0, 0, 0], [1, 1], "NHWC", "float32"], {"i": 0, "t": "direct", "c": null, "e": []}], "r": [[1000000000.0], 4, 0.846611738204956, 1580808494.912351], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 299, 299, 3], "float32"], ["TENSOR", [3, 3, 3, 32], "float32"], [2, 2], [0, 0], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 299, 299, 3, "float32"], [3, 3, 3, 32, "float32"], [2, 2], [0, 0], [1, 1], "NHWC", "float32"], {"i": 7, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 3]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", false]]}], "r": [[1000000000.0], 4, 0.36194419860839844, 1580808775.390246], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 299, 299, 3], "float32"], ["TENSOR", [3, 3, 3, 32], "float32"], [2, 2], [0, 0], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 299, 299, 3, "float32"], [3, 3, 3, 32, "float32"], [2, 2], [0, 0], [1, 1], "NHWC", "float32"], {"i": 5, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 1]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", false]]}], "r": [[1000000000.0], 4, 0.2373192310333252, 1580808775.6204748], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 299, 299, 3], "float32"], ["TENSOR", [3, 3, 3, 32], "float32"], [2, 2], [0, 0], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 299, 299, 3, "float32"], [3, 3, 3, 32, "float32"], [2, 2], [0, 0], [1, 1], "NHWC", "float32"], {"i": 6, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 3]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", false]]}], "r": [[1000000000.0], 4, 0.2645392417907715, 1580808775.877562], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 299, 299, 3], "float32"], ["TENSOR", [3, 3, 3, 32], "float32"], [2, 2], [0, 0], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 299, 299, 3, "float32"], [3, 3, 3, 32, "float32"], [2, 2], [0, 0], [1, 1], "NHWC", "float32"], {"i": 1, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 1]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", true]]}], "r": [[1000000000.0], 4, 0.2521169185638428, 1580808776.1024058], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 299, 299, 3], "float32"], ["TENSOR", [3, 3, 3, 32], "float32"], [2, 2], [0, 0], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 299, 299, 3, "float32"], [3, 3, 3, 32, "float32"], [2, 2], [0, 0], [1, 1], "NHWC", "float32"], {"i": 4, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 1]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", false]]}], "r": [[1000000000.0], 4, 0.22719740867614746, 1580808776.3249972], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 299, 299, 3], "float32"], ["TENSOR", [3, 3, 3, 32], "float32"], [2, 2], [0, 0], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 299, 299, 3, "float32"], [3, 3, 3, 32, "float32"], [2, 2], [0, 0], [1, 1], "NHWC", "float32"], {"i": 3, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 3]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", true]]}], "r": [[1000000000.0], 4, 0.26436567306518555, 1580808776.5760279], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 299, 299, 3], "float32"], ["TENSOR", [3, 3, 3, 32], "float32"], [2, 2], [0, 0], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 299, 299, 3, "float32"], [3, 3, 3, 32, "float32"], [2, 2], [0, 0], [1, 1], "NHWC", "float32"], {"i": 2, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 3]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", true]]}], "r": [[1000000000.0], 4, 0.25417470932006836, 1580808776.8196049], "v": 0.1}
.......

My code (based on the official tutorial tune_relay_x86.py):

model_name = "inception_v3"
log_file = "%s.log" % model_name
graph_opt_sch_file = "%s_graph_opt.log" % model_name

def tune_kernels(tasks,
                 measure_option,
                 tuner='gridsearch',
                 early_stopping=None,
                 log_filename='tuning.log'):

    for i, tsk in enumerate(tasks):
        prefix = "[Task %2d/%2d] " % (i+1, len(tasks))

        ........
        
        # do tuning
        n_trial = len(task.config_space)
        print("n_trial:", n_trial)
        tuner_obj.tune(n_trial=n_trial,
                       early_stopping=early_stopping,
                       measure_option=measure_option,
                       callbacks=[
                           autotvm.callback.progress_bar(n_trial, prefix=prefix),
                           autotvm.callback.log_to_file(log_filename)])

def tune_graph(graph, dshape, records, opt_sch_file, use_DP=True):
    target_op = [relay.nn.conv2d]
    Tuner = DPTuner if use_DP else PBQPTuner
    executor = Tuner(graph, {input_name: dshape}, records, target_op, target)
    executor.benchmark_layout_transform(min_exec_num=2000)
    executor.run()
    executor.write_opt_sch2record_file(opt_sch_file)

def tune_and_evaluate(tuning_opt):
    # extract workloads from relay program
    print("Extract tasks...")
    mod, params, data_shape, out_shape = get_network(xxx)
    tasks = autotvm.task.extract_from_program(mod["main"], target=target,
                                              params=params, ops=(relay.op.nn.conv2d,))

    # run tuning tasks
    print("Tuning...")
    tune_kernels(tasks, **tuning_opt)
    tune_graph(mod["main"], data_shape, log_file, graph_opt_sch_file)

    # compile kernels with graph-level best records
    with autotvm.apply_graph_best(graph_opt_sch_file):
        ......

I tried commenting out the tune_kernels call and running only tune_graph, but I got this error:

.tvm/tophub/llvm_v0.03.log
-----------------
Cannot find config for target=llvm -device=tracing, workload=('conv2d', (1, 2048, 1, 1, 'float32'), (1001, 2048, 1, 1, 'float32'), (1, 1), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Traceback (most recent call last):

  File "inceptionv3.py", line 303, in <module>
    test()

  File "inceptionv3.py", line 257, in test
    tune_graph(mod["main"], data_shape, log_file, graph_opt_sch_file)

  File "inceptionv3.py", line 219, in tune_graph
    executor = Tuner(graph, {input_name: dshape}, records, target_op, target)

  File "/Users/heliqi/learn/tvm/tvm/python/tvm/autotvm/graph_tuner/dynamic_programming_tuner.py", line 43, in __init__
    super(DPTuner, self).__init__(*args, **kwargs)

  File "/Users/heliqi/learn/tvm/tvm/python/tvm/autotvm/graph_tuner/base_graph_tuner.py", line 157, in __init__
    self._fetch_cfg()

  File "/Users/heliqi/learn/tvm/tvm/python/tvm/autotvm/graph_tuner/base_graph_tuner.py", line 217, in _fetch_cfg
    for record in cfg_dict[workload]:

KeyError: ('conv2d', (1, 3, 299, 299, 'float32'), (32, 3, 3, 3, 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'NCHW', 'float32')

The configs shown in the log you posted all failed. Could you check whether all configs failed? You will see the measure result look like "r": [[1000000000.0], 4, ..., where 4 is the error code. It's fine if the error code is 0 or 1, but an error code of 2 (compile error) or 4 (runtime error) usually indicates problems with your environment.
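
For reference, a quick way to tally the error codes in the log (a sketch, assuming the log file name used above):

import json
from collections import Counter

# "r" is [costs, error_no, all_cost, timestamp]; error_no 0 means success
counts = Counter()
with open("inception_v3.log") as f:
    for line in f:
        counts[json.loads(line)["r"][1]] += 1
print(counts)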

@kevinthesun I think this is the case where graph tuning fails because we need to run FoldConstant. Because of ConvertLayout, the weight has a transpose before it. Current graph tuning assumes that a conv weight is either a Var or a Const, but in this case it is a CallNode.
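
Something like this sketch could express that workaround (assuming relay.build_module.bind_params_by_name is available in your TVM version, and that mod/params come from the TensorFlow frontend):

from tvm import relay

# bind the weights as constants, then fold away the layout_transform that
# ConvertLayout inserted before each weight, so conv weights are Const again
mod["main"] = relay.build_module.bind_params_by_name(mod["main"], params)
seq = relay.transform.Sequential([relay.transform.ConvertLayout('NCHW'),
                                  relay.transform.FoldConstant()])
with relay.transform.PassContext(opt_level=3):
    mod = seq(mod)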

I checked inception_v3.log; it shows the following. The front part has error codes 2 or 4, and everything toward the end is 0.

{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 147, 147, 32], "float32"], ["TENSOR", [3, 3, 32, 64], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 147, 147, 32, "float32"], [3, 3, 32, 64, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {"i": 265, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 2]], ["tile_oc", "sp", [-1, 4]], ["tile_ow", "sp", [-1, 7]], ["unroll_kw", "ot", false]]}], "r": [[1000000000.0], 2, 0.021444082260131836, 1580809441.6598809], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 147, 147, 32], "float32"], ["TENSOR", [3, 3, 32, 64], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 147, 147, 32, "float32"], [3, 3, 32, 64, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {"i": 33, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 8]], ["tile_oc", "sp", [-1, 32]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", true]]}], "r": [[1000000000.0], 2, 0.005236148834228516, 1580809441.660099], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 147, 147, 32], "float32"], ["TENSOR", [3, 3, 32, 64], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 147, 147, 32, "float32"], [3, 3, 32, 64, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {"i": 283, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 2]], ["tile_oc", "sp", [-1, 32]], ["tile_ow", "sp", [-1, 7]], ["unroll_kw", "ot", false]]}], "r": [[1000000000.0], 2, 0.022201061248779297, 1580809441.6650748], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 147, 147, 32], "float32"], ["TENSOR", [3, 3, 32, 64], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 147, 147, 32, "float32"], [3, 3, 32, 64, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {"i": 27, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 8]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", true]]}], "r": [[1000000000.0], 2, 0.007702827453613281, 1580809441.665184], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 147, 147, 32], "float32"], ["TENSOR", [3, 3, 32, 64], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 147, 147, 32, "float32"], [3, 3, 32, 64, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {"i": 238, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 16]], ["tile_oc", "sp", [-1, 8]], ["tile_ow", "sp", [-1, 3]], ["unroll_kw", "ot", false]]}], "r": [[1000000000.0], 2, 0.007006168365478516, 1580809441.6652648], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 147, 147, 32], "float32"], ["TENSOR", [3, 3, 32, 64], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 147, 147, 32, "float32"], [3, 3, 32, 64, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {"i": 138, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 32]], ["tile_ow", "sp", [-1, 21]], ["unroll_kw", "ot", true]]}], "r": [[1000000000.0], 2, 0.02295207977294922, 1580809441.673057], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 147, 147, 32], "float32"], ["TENSOR", [3, 3, 32, 64], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 147, 147, 32, "float32"], [3, 3, 32, 64, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {"i": 202, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 16]], ["tile_oc", "sp", [-1, 8]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", false]]}], "r": [[1000000000.0], 2, 0.008283138275146484, 1580809441.673165], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 147, 147, 32], "float32"], ["TENSOR", [3, 3, 32, 64], "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {}, ["conv2d", [1, 147, 147, 32, "float32"], [3, 3, 32, 64, "float32"], [1, 1], [1, 1, 1, 1], [1, 1], "NHWC", "float32"], {"i": 52, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 16]], ["tile_oc", "sp", [-1, 4]], ["tile_ow", "sp", [-1, 3]], ["unroll_kw", "ot", true]]}], "r": [[1000000000.0], 2, 0.007550954818725586, 1580809441.673218], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 3, 299, 299], "float32"], ["TENSOR", [32, 3, 3, 3], "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 299, 299, "float32"], [32, 3, 3, 3, "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {"i": 20, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", false]]}], "r": [[0.0002464204190845934], 0, 4.562601804733276, 1580900077.765447], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 3, 299, 299], "float32"], ["TENSOR", [32, 3, 3, 3], "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 299, 299, "float32"], [32, 3, 3, 3, "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {"i": 11, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 32]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", true]]}], "r": [[0.00017641161392919328], 0, 2.7835729122161865, 1580900080.42416], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 3, 299, 299], "float32"], ["TENSOR", [32, 3, 3, 3], "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 299, 299, "float32"], [32, 3, 3, 3, "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {"i": 8, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", true]]}], "r": [[0.0002172476943910872], 0, 1.373981237411499, 1580900081.74618], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 3, 299, 299], "float32"], ["TENSOR", [32, 3, 3, 3], "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 299, 299, "float32"], [32, 3, 3, 3, "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {"i": 13, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 1]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", false]]}], "r": [[0.002153247206695779], 0, 2.63421368598938, 1580900084.3336918], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 3, 299, 299], "float32"], ["TENSOR", [32, 3, 3, 3], "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 299, 299, "float32"], [32, 3, 3, 3, "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {"i": 18, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 8]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", false]]}], "r": [[0.0002868389631361761], 0, 1.2750489711761475, 1580900085.5677419], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 3, 299, 299], "float32"], ["TENSOR", [32, 3, 3, 3], "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 299, 299, "float32"], [32, 3, 3, 3, "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {"i": 0, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 1]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", true]]}], "r": [[0.0005383609621149042], 0, 2.302680015563965, 1580900087.798537], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 3, 299, 299], "float32"], ["TENSOR", [32, 3, 3, 3], "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 299, 299, "float32"], [32, 3, 3, 3, "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {"i": 5, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 4]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", true]]}], "r": [[0.0005338464408759124], 0, 2.663329839706421, 1580900090.42861], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 3, 299, 299], "float32"], ["TENSOR", [32, 3, 3, 3], "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 299, 299, "float32"], [32, 3, 3, 3, "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {"i": 9, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 16]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", true]]}], "r": [[0.0002036475430579965], 0, 1.2794818878173828, 1580900091.66368], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 3, 299, 299], "float32"], ["TENSOR", [32, 3, 3, 3], "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 299, 299, "float32"], [32, 3, 3, 3, "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {"i": 7, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 8]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", true]]}], "r": [[0.0003061685941409826], 0, 1.2445158958435059, 1580900093.151464], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 3, 299, 299], "float32"], ["TENSOR", [32, 3, 3, 3], "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 299, 299, "float32"], [32, 3, 3, 3, "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {"i": 10, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 32]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", true]]}], "r": [[0.00016905061628664497], 0, 1.368588924407959, 1580900094.379401], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 3, 299, 299], "float32"], ["TENSOR", [32, 3, 3, 3], "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 299, 299, "float32"], [32, 3, 3, 3, "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {"i": 22, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 32]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", false]]}], "r": [[0.0001787599203123206], 0, 2.8403818607330322, 1580900097.080885], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 3, 299, 299], "float32"], ["TENSOR", [32, 3, 3, 3], "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 299, 299, "float32"], [32, 3, 3, 3, "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {"i": 12, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 1]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", false]]}], "r": [[0.0005430750533188248], 0, 2.7509076595306396, 1580900099.753608], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 3, 299, 299], "float32"], ["TENSOR", [32, 3, 3, 3], "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 299, 299, "float32"], [32, 3, 3, 3, "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {"i": 15, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 2]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", false]]}], "r": [[0.0010717476211901306], 0, 2.686265707015991, 1580900102.382468], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 3, 299, 299], "float32"], ["TENSOR", [32, 3, 3, 3], "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 299, 299, "float32"], [32, 3, 3, 3, "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {"i": 14, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 2]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", false]]}], "r": [[0.0010392188580040187], 0, 2.7776200771331787, 1580900105.1073961], "v": 0.1}
{"i": ["llvm -mcpu=core-avx2", "topi_x86_conv2d_NCHWc", [["TENSOR", [1, 3, 299, 299], "float32"], ["TENSOR", [32, 3, 3, 3], "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {}, ["conv2d", [1, 3, 299, 299, "float32"], [32, 3, 3, 3, "float32"], [2, 2], [0, 0], [1, 1], "NCHW", "float32"], {"i": 16, "t": "direct", "c": null, "e": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 4]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", false]]}], "r": [[0.0005323639027921406], 0, 1.2741262912750244, 1580900106.331873], "v": 0.1}

What’s wrong with my environment?

The inception_v3 TensorFlow model is NHWC, and I use ConvertLayout to transform it from NHWC to NCHW. Is this the cause of the error? In the log, entries whose tensor layout is 'NHWC' have error code 2 or 4, while entries with 'NCHW' have error code 0.
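
To double-check that pattern, the error codes can be split by the workload layout (the seventh field of the workload tuple in each record); a sketch:

import json
from collections import Counter

# count (layout, error_no) pairs across the log
by_layout = Counter()
with open("inception_v3.log") as f:
    for line in f:
        rec = json.loads(line)
        by_layout[(rec["i"][4][6], rec["r"][1])] += 1
print(by_layout)  # e.g. mostly ('NHWC', 2/4) and ('NCHW', 0)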

I think ConvertLayout still has problems, although the conversion error was resolved ([transform.ConvertLayout] transforming the inception_v1 model from from_tensorflow.py failed).

I found some 'NHWC' entries in inception_v3.log. Is the model not completely converted, or is there another problem?

@janimesh I tried it a few times and sometimes got the following error. It seems to be an error when reading the model after ConvertLayout.

The fn(…) signature is 'NHWC', but from %0 to the end the layout is 'NCHW'. Is this a mistake?

......
File "/tvm/python/tvm/relay/module.py", line 233, in from_expr
    return _module.Module_FromExpr(expr, funcs, defs)

  File "/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 213, in __call__
    raise get_last_ffi_error()

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) 9   libtvm.dylib                        0x000000011c29a868 TVMFuncCall + 72
  [bt] (7) 8   libtvm.dylib                        0x000000011bb04b46 std::__1::__function::__func<void tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::RelayExpr, tvm::Map<tvm::GlobalVar, tvm::BaseFunc, void, void>, tvm::Map<tvm::GlobalTypeVar, tvm::TypeData, void, void>)>::AssignTypedLambda<tvm::$_8>(tvm::$_8)::'lambda'(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*), std::__1::allocator<void tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::RelayExpr, tvm::Map<tvm::GlobalVar, tvm::BaseFunc, void, void>, tvm::Map<tvm::GlobalTypeVar, tvm::TypeData, void, void>)>::AssignTypedLambda<tvm::$_8>(tvm::$_8)::'lambda'(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)>, void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&) + 134
  [bt] (6) 7   libtvm.dylib                        0x000000011baf8b81 tvm::IRModule::FromExpr(tvm::RelayExpr const&, tvm::Map<tvm::GlobalVar, tvm::BaseFunc, void, void> const&, tvm::Map<tvm::GlobalTypeVar, tvm::TypeData, void, void> const&) + 801
  [bt] (5) 6   libtvm.dylib                        0x000000011baf6250 tvm::IRModuleNode::Add(tvm::GlobalVar const&, tvm::BaseFunc const&, bool) + 320
  [bt] (4) 5   libtvm.dylib                        0x000000011baf56e7 tvm::RunTypeCheck(tvm::IRModule const&, tvm::GlobalVar const&, tvm::relay::Function) + 1431
  [bt] (3) 4   libtvm.dylib                        0x000000011c118695 tvm::relay::InferType(tvm::relay::Function const&, tvm::IRModule const&, tvm::GlobalVar const&) + 565
  [bt] (2) 3   libtvm.dylib                        0x000000011c117838 tvm::relay::TypeInferencer::Infer(tvm::RelayExpr) + 136
  [bt] (1) 2   libtvm.dylib                        0x000000011baec5db tvm::ErrorReporter::RenderErrors(tvm::IRModule const&, bool) + 5499
  [bt] (0) 1   libtvm.dylib                        0x000000011b99dc89 dmlc::LogMessageFatal::~LogMessageFatal() + 57
In `main`: 
v0.0.4
fn (%input: Tensor[(1, 299, 299, 3), float32],%InceptionV3/Conv2d_1a....
%0 = layout_transform(%input, src_layout="NHWC", dst_layout="NCHW"); 
%1 = layout_transform(%InceptionV3/Conv2d_1a_3x3/weights, src_layout="HWIO", dst_layout="OIHW");
......

I will try again tomorrow.

No need, thanks!

I changed the layout from 'None' to 'NCHW' and set shape_size = (1, 299, 299, 3). Then it works:

# import the TensorFlow graph with an explicit layout ('NCHW') and input shape
mod, params = relay.frontend.from_tensorflow(graph_def,
                                             layout=layout,
                                             shape={input_name: shape_size})
# convert the remaining NHWC ops to NCHW
seq = relay.transform.Sequential([relay.transform.RemoveUnusedFunctions(),
                                  relay.transform.ConvertLayout('NCHW')])
with relay.transform.PassContext(opt_level=3):
    mod = seq(mod)

print("Tuning...")
tune_kernels(tasks, **tuning_option)
tune_graph(mod["main"], input_shape, log_file, graph_opt_sch_file)
......

OK, yes, both are necessary.

  • Do you still see NHWC convs in the graph?
  • Do you still see topi_op error?
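
One way to check the first question (a sketch; conv_layouts is a hypothetical helper, and mod is assumed to hold the converted module):

from tvm import relay

# collect the data_layout of every conv2d call in the function
def conv_layouts(func):
    layouts = []
    def visit(expr):
        if isinstance(expr, relay.Call) and hasattr(expr.op, "name") \
                and expr.op.name == "nn.conv2d":
            layouts.append(str(expr.attrs.data_layout))
    relay.analysis.post_order_visit(func, visit)
    return layouts

print(set(conv_layouts(mod["main"])))  # expect {'NCHW'} after conversion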

I didn't print the graph; I need to confirm that.

The error codes are all 0, and there is no topi_op error in the log.
