[AutoTVM][NNVM] error when extract_task_from_graph with opt_level=3

An error occurs when I run tune_nnvm_cuda.py with the whole script under opt_level=3:

Extract tasks...
Traceback (most recent call last):
  File "tune_nnvm_cuda.py", line 248, in <module>
    tune_and_evaluate(tuning_option)
  File "tune_nnvm_cuda.py", line 212, in tune_and_evaluate
    symbols=(nnvm.sym.conv2d,))
  File "/home/wuweilin/tvm/python/tvm/autotvm/task/nnvm_integration.py", line 249, in extract_from_graph
    nnvm.compiler.build(graph, target=tracing_target, shape=shape, dtype=dtype)
  File "/home/wuweilin/tvm/nnvm/python/nnvm/compiler/build_module.py", line 305, in build
    graph = graph.apply("GraphCompile")
  File "/home/wuweilin/tvm/nnvm/python/nnvm/graph.py", line 234, in apply
    check_call(_LIB.NNGraphApplyPasses(self.handle, npass, cpass, ctypes.byref(ghandle)))
  File "/home/wuweilin/tvm/nnvm/python/nnvm/_base.py", line 75, in check_call
    raise NNVMError(py_str(_LIB.NNGetLastError()))
nnvm._base.NNVMError: [15:49:40] /home/wuweilin/tvm/nnvm/include/nnvm/op.h:530: Check failed: op != nullptr 

Stack trace returned 10 entries:
[bt] (0) /home/wuweilin/tvm/build/libtvm.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7f6efcd8892b]
[bt] (1) /home/wuweilin/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f6efcd891d8]
[bt] (2) /home/wuweilin/tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(nnvm::compiler::CompileEngine::GetScheduleArgs(nnvm::Graph, tvm::Array<tvm::Tensor, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, tvm::Array<tvm::Tensor, void>*)+0x2cf4) [0x7f6ecc5c11b4]
[bt] (3) /home/wuweilin/tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(nnvm::compiler::CompileEngine::DoLower(nnvm::Graph, tvm::Array<tvm::Tensor, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x108) [0x7f6ecc5c19a8]
[bt] (4) /home/wuweilin/tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(nnvm::compiler::CompileEngine::Lower(nnvm::Graph, tvm::Array<tvm::Tensor, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x1d5) [0x7f6ecc5c2e25]
[bt] (5) /home/wuweilin/tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(nnvm::compiler::GraphLower(nnvm::Graph, tvm::Array<tvm::Tensor, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x5f) [0x7f6ecc5b7faf]
[bt] (6) /home/wuweilin/tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(nnvm::compiler::GraphCompile(nnvm::Graph const&)+0xc65) [0x7f6ecc5d0d45]
[bt] (7) /home/wuweilin/tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(std::_Function_handler<nnvm::Graph (nnvm::Graph), nnvm::Graph (*)(nnvm::Graph const&)>::_M_invoke(std::_Any_data const&, nnvm::Graph&&)+0x20) [0x7f6ecc58b2f0]
[bt] (8) /home/wuweilin/tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(nnvm::ApplyPasses(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)+0x32b) [0x7f6ecc54ee5b]
[bt] (9) /home/wuweilin/tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(NNGraphApplyPasses+0x348) [0x7f6ecc52f878]

I am not sure if this is an actual bug, just FYI.

cc @merrymercy

This seems to be the same error as this one. Can you give me more details so I can reproduce it? Which model?

Steps to reproduce:
Modify tutorials/autotvm/tune_nnvm_cuda.py by replacing this line https://github.com/dmlc/tvm/blob/01ec533e7cfb603f6114a763f03dad4c564f589d/tutorials/autotvm/tune_nnvm_cuda.py#L210 with

with nnvm.compiler.build_config(opt_level=3):
    tasks = autotvm.task.extract_from_graph(net, target=target,
                                            shape={'data': input_shape},
                                            dtype=dtype,
                                            symbols=(nnvm.sym.conv2d,))

Then uncomment this line https://github.com/dmlc/tvm/blob/01ec533e7cfb603f6114a763f03dad4c564f589d/tutorials/autotvm/tune_nnvm_cuda.py#L248 to run the script.

I didn't change any other config. The default model in the script is an fp32 resnet-18.

ok, I’ll have a look.

Why do you need to run extract_from_graph under opt_level=3?
Currently it is not expected to run under opt_level=3.

The current way to construct a custom task (e.g. for a different layout) is to extract normal tasks from a normal graph and modify them with something like

# do_some_modification is a placeholder for your own rewriting logic
new_name = do_some_modification(old_task.name)
new_args = do_some_modification(old_task.args)
# re-create the task with the modified name/args and the 'direct' template
new_task = autotvm.task.create(new_name, new_args,
                               old_task.target, old_task.target_host, 'direct')
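As a concrete instance of this pattern (an illustrative sketch, assuming tasks came from extract_from_graph as above; the 'winograd' template key is just an example and does not exist for every workload):

for i in range(len(tasks)):
    try:
        # re-create the extracted task with the same workload args but a
        # different template key
        tsk = autotvm.task.create(tasks[i].name, tasks[i].args,
                                  tasks[i].target, tasks[i].target_host,
                                  'winograd')
        tasks.append(tsk)
    except Exception:
        # this workload/target has no such template; keep the original task
        pass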

I have a similar issue when compiling a model with certain dispatchers and schedules. I'm trying to apply autotvm to x86 and intel_graphics. For some schedules, but not all, this issue arises. I'll give a reproducible example when my PR is ready.

The error is happening when trying to compile a fused op named

fuse___layout_transform___negative_elemwise_mul___layout_transform___elemwise_add_expand_dims_broadcast_add_relu

The strange thing is that this fusion never happens when invoking nnvm.compiler.build with opt_level=3 for standard targets ("llvm", "cuda"). AutoTVM invokes nnvm.compiler.build with the target "llvm -device=tracing", which is causing the weird fusion behavior.

The error itself is a consequence of my change in https://github.com/dmlc/tvm/pull/1608. But without that PR a different error occurs: it tries to fuse a layout transform with max pool. Fusing an injective op such as layout transform with another non-broadcast op is not supported, and my PR was meant to prevent that from happening.

The fused op in that case is named

fuse_max_pool2d_expand_dims_broadcast_mul___layout_transform___negative_elemwise_mul___layout_transform___elemwise_add_expand_dims_broadcast_add_relu

and it doesn’t occur in the standard setting either.

I run it under opt_level=3 because otherwise, in extract_from_graph, the autotvm template I wrote runs with the actual target when creating the task, which causes an error because AlterOpLayout is not applied.

But I finally managed to figure out what I should do; this was pretty tricky for me.
Previously I checked dtype in conv2d and did the dispatch there, so running without AlterOpLayout caused an error. I guess I need to use a different template instead.

@masahi I have a similar issue when compiling x86 and Intel graphics models. Some schedule combinations trigger this problem. If I roll back your PR, Intel graphics returns a "Direct host side access to device memory" error. Any idea how to resolve this issue?

I think your issue is the same as this one.

The problem is that some injective ops are being fused into the conv op (or another non-broadcast op), but in the conv schedule only broadcast ops are scheduled to be fused with the conv op. This is due to the condition

if tag.is_broadcast(OP.tag):

that I think you are familiar with.

Injective ops that are not scheduled this way are just given the default schedule (the output of create_schedule()). For the CPU backend this is fine (they will be single-threaded), but for the GPU backend, since all ops need to be bound to GPU threads, you get the error that says "Direct host side access detected …".

I don't know which schedule in intel_graphics is causing your error, but if you replace is_broadcast with is_injective, it should work. However, I was told in this PR that fusing injective ops with conv or other non-broadcast ops is not encouraged. The PR of mine that you rolled back was meant to prevent fusing injective ops with other ops, but apparently there are other edge cases that I failed to handle.
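To make that concrete, the traversal in a typical TOPI-style conv2d schedule looks roughly like this (a schematic sketch in the spirit of the existing schedules, with the conv scheduling itself omitted; not code copied from the repo):

import tvm
from topi import tag

def schedule_conv2d(outs):
    # schematic conv2d schedule; fused ops are handled in traverse()
    s = tvm.create_schedule([x.op for x in outs])

    def traverse(op):
        # only broadcast ops fused behind the conv pass this check and get
        # inlined; an injective op (e.g. layout_transform) fused here keeps
        # the default schedule instead, which breaks on GPU backends
        if tag.is_broadcast(op.tag):
            if op not in s.outputs:
                s[op].compute_inline()
            for tensor in op.input_tensors:
                if isinstance(tensor.op, tvm.tensor.ComputeOp):
                    traverse(tensor.op)
        elif 'conv2d' in op.tag:
            pass  # schedule the conv compute itself here

    traverse(outs[0].op)
    return s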

@masahi Thank you! It works now.

@masahi This issue still exists for x86. Even after I changed if tag.is_broadcast(OP.tag) to if tag.is_injective(OP.tag), it returned the same error. The fix does work for Intel graphics, though.

You mean, after you reverted my PR and replaced tag.is_broadcast with is_injective, you are still getting this error?

nnvm/include/nnvm/op.h:530: Check failed: op != nullptr

And you are running auto-tuning under opt_level=3 for x86? I think that's reasonable for x86 if you want to tune for NCHWc-layout convolution.

I have an updated version of autotvm.task.extract_from_graph. Unlike the current approach, which creates a "llvm -device=tracing" target, the new method traces topi calls by monkey patching them, so task extraction goes through exactly the same path as normal compilation.
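Roughly, the idea is something like this (an illustrative sketch only; the names here are made up and this is not the actual implementation):

import topi

recorded_calls = []  # (op name, args, kwargs) for every traced topi call

def _trace(name, func):
    # wrap a topi compute function so each call records its workload,
    # then falls through to the real compute
    def wrapper(*args, **kwargs):
        recorded_calls.append((name, args, kwargs))
        return func(*args, **kwargs)
    return wrapper

# patch the ops of interest, run a normal nnvm.compiler.build(...),
# convert recorded_calls into autotvm tasks, then restore the originals
topi.nn.conv2d = _trace('conv2d', topi.nn.conv2d)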

Can this help the current case?

Yes, that should solve this issue. If extraction goes through the same path as normal compilation, no such error should happen.

I'll post another thread for the x86 compilation issue. I think it is not related to autotvm but to graph fusion. Tuning is fine; I can successfully tune all conv2d_NCHWc workloads. The problem is that when loading the schedules and compiling, it gives an error, and it only happens when the schedule combination introduces layout transformations.

The PR https://github.com/dmlc/tvm/pull/1760 should fix this issue as well.