Different output values when setting opt_level=3 in nnvm.compiler.build_config


#1

Hi, I was doing the CNN auto-tuning experiments with TVM recently. After the tuning process, I tried to compile the model like below:

with autotvm.apply_history_best(tuning_opt['log_filename']):
    with nnvm.compiler.build_config(opt_level=3):
        graph, lib, params = nnvm.compiler.build(
            net,
            target=target,
            shape=input_shapes,
            params=params,
            dtype=dtype
        )

It turns out that setting opt_level to 3 makes the compiled model produce different output values from the original MXNet model. In fact, the accuracy dropped from 93+% to 62+% in this case.
However, I get exactly the same outputs when opt_level is set to 1 or 2.
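For reference, this is roughly how I quantified the mismatch. A minimal numpy sketch, assuming `mxnet_out` and `tvm_out` are logits arrays collected from each model (synthetic arrays stand in for real model outputs here):

```python
import numpy as np

def top1_agreement(out_a, out_b):
    # Fraction of samples where both models pick the same class.
    return float(np.mean(np.argmax(out_a, axis=1) == np.argmax(out_b, axis=1)))

def max_abs_diff(out_a, out_b):
    # Largest element-wise deviation between the two outputs.
    return float(np.max(np.abs(out_a - out_b)))

# Synthetic logits standing in for real model outputs.
rng = np.random.default_rng(0)
mxnet_out = rng.normal(size=(8, 1000))
tvm_out = mxnet_out + rng.normal(scale=1e-6, size=(8, 1000))  # near-identical model

print(top1_agreement(mxnet_out, tvm_out))  # close to 1.0 when outputs match
print(max_abs_diff(mxnet_out, tvm_out))
```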

BTW, I was doing the network auto-tuning on a server with 8 Titan V GPUs.


#2

You might have found a bug in one of the optimization passes enabled at opt_level=3. Can you check whether lowering the opt level of one of the current opt_level=3 passes fixes your problem? By the way, depending on your network, going from opt_level=2 to opt_level=3 may not improve performance anyway. I believe OpFusion is the most important pass in your case.


#3
  1. Changing OPT_PASS_LEVEL as below and setting opt_level=2 still produces different network outputs:
OPT_PASS_LEVEL = {
    "SimplifyInference": 0,
    "PrecomputePrune": 2,
    "OpFusion": 1,
    "FoldScaleAxis": 2,
    "AlterOpLayout": 3,
}
  2. Changing OPT_PASS_LEVEL as below and setting opt_level=2 produces exactly the same result as the original MXNet model:
OPT_PASS_LEVEL = {
    "SimplifyInference": 0,
    "PrecomputePrune": 2,
    "OpFusion": 1,
    "FoldScaleAxis": 3,
    "AlterOpLayout": 2,
}

Thus, I guess the bug comes from “FoldScaleAxis”.


#4

That is possible: NNVM v1's FoldScaleAxis folds not only positive but also negative scales, and that part of the logic could become problematic when folding negative weights. This problem has already been fixed in Relay.
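To illustrate why the sign of the scale is hazardous in general (a minimal numpy sketch of the underlying hazard, not the actual pass): moving a multiplier across a ReLU is only valid for non-negative scales.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

x = np.array([-2.0, -0.5, 1.0, 3.0])

# Folding moves a multiplier across the non-linearity: s * relu(x) -> relu(s * x).
# This identity only holds when s >= 0.
s_pos, s_neg = 2.0, -2.0

assert np.allclose(s_pos * relu(x), relu(s_pos * x))      # safe to fold
assert not np.allclose(s_neg * relu(x), relu(s_neg * x))  # folding changes the result
```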


#5
relay_sym, relay_params = relay.frontend.from_mxnet(
    mx_sym,
    shape={'data': (1, 3, 224, 224)},
    dtype='float32',
    arg_params=arg_params,
    aux_params=aux_params,
)
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(
        relay_sym,
        target,
        params=relay_params
    )

The MXNet model can now be compiled with the above code. But I am a little confused by the return values of relay.build:

  1. The graph returned from nnvm.compiler.build is an nnvm.graph.Graph object, while relay.build returns a JSON string. Why is there such a design difference?
  2. The params returned from nnvm.compiler.build can be saved to a file like below:
    with open('model.params', 'wb') as fd:
        fd.write(nnvm.compiler.save_param_dict(params))
    
    Should I use the same code to serialize the params returned from relay.build?