[AutoTVM][NNVM] error when extract_task_from_graph with opt_level=3

An error occurs when I run tune_nnvm_cuda.py with the whole script under opt_level=3:

Extract tasks...
Traceback (most recent call last):
  File "tune_nnvm_cuda.py", line 248, in <module>
    tune_and_evaluate(tuning_option)
  File "tune_nnvm_cuda.py", line 212, in tune_and_evaluate
    symbols=(nnvm.sym.conv2d,))
  File "/home/wuweilin/tvm/python/tvm/autotvm/task/nnvm_integration.py", line 249, in extract_from_graph
    nnvm.compiler.build(graph, target=tracing_target, shape=shape, dtype=dtype)
  File "/home/wuweilin/tvm/nnvm/python/nnvm/compiler/build_module.py", line 305, in build
    graph = graph.apply("GraphCompile")
  File "/home/wuweilin/tvm/nnvm/python/nnvm/graph.py", line 234, in apply
    check_call(_LIB.NNGraphApplyPasses(self.handle, npass, cpass, ctypes.byref(ghandle)))
  File "/home/wuweilin/tvm/nnvm/python/nnvm/_base.py", line 75, in check_call
    raise NNVMError(py_str(_LIB.NNGetLastError()))
nnvm._base.NNVMError: [15:49:40] /home/wuweilin/tvm/nnvm/include/nnvm/op.h:530: Check failed: op != nullptr 

Stack trace returned 10 entries:
[bt] (0) /home/wuweilin/tvm/build/libtvm.so(dmlc::StackTrace[abi:cxx11]()+0x5b) [0x7f6efcd8892b]
[bt] (1) /home/wuweilin/tvm/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x28) [0x7f6efcd891d8]
[bt] (2) /home/wuweilin/tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(nnvm::compiler::CompileEngine::GetScheduleArgs(nnvm::Graph, tvm::Array<tvm::Tensor, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >*, tvm::Array<tvm::Tensor, void>*)+0x2cf4) [0x7f6ecc5c11b4]
[bt] (3) /home/wuweilin/tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(nnvm::compiler::CompileEngine::DoLower(nnvm::Graph, tvm::Array<tvm::Tensor, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x108) [0x7f6ecc5c19a8]
[bt] (4) /home/wuweilin/tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(nnvm::compiler::CompileEngine::Lower(nnvm::Graph, tvm::Array<tvm::Tensor, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x1d5) [0x7f6ecc5c2e25]
[bt] (5) /home/wuweilin/tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(nnvm::compiler::GraphLower(nnvm::Graph, tvm::Array<tvm::Tensor, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, int)+0x5f) [0x7f6ecc5b7faf]
[bt] (6) /home/wuweilin/tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(nnvm::compiler::GraphCompile(nnvm::Graph const&)+0xc65) [0x7f6ecc5d0d45]
[bt] (7) /home/wuweilin/tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(std::_Function_handler<nnvm::Graph (nnvm::Graph), nnvm::Graph (*)(nnvm::Graph const&)>::_M_invoke(std::_Any_data const&, nnvm::Graph&&)+0x20) [0x7f6ecc58b2f0]
[bt] (8) /home/wuweilin/tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(nnvm::ApplyPasses(nnvm::Graph, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&)+0x32b) [0x7f6ecc54ee5b]
[bt] (9) /home/wuweilin/tvm/nnvm/python/nnvm/../../../build/libnnvm_compiler.so(NNGraphApplyPasses+0x348) [0x7f6ecc52f878]

I am not sure if this is an actual bug, just FYI.

cc @merrymercy

This seems to be the same error as this one. Can you give me more details so I can reproduce it? Which model?

Steps to reproduce:
Modify tutorials/autotvm/tune_nnvm_cuda.py by replacing this line https://github.com/dmlc/tvm/blob/01ec533e7cfb603f6114a763f03dad4c564f589d/tutorials/autotvm/tune_nnvm_cuda.py#L210 with

with nnvm.compiler.build_config(opt_level=3):
    tasks = autotvm.task.extract_from_graph(net, target=target,
                                            shape={'data': input_shape},
                                            dtype=dtype,
                                            symbols=(nnvm.sym.conv2d,))

Then uncomment this line https://github.com/dmlc/tvm/blob/01ec533e7cfb603f6114a763f03dad4c564f589d/tutorials/autotvm/tune_nnvm_cuda.py#L248 to run the script.

I didn't change any other config. The default model in the script is an fp32 resnet-18.

ok, I’ll have a look.

Why do you need to run extract_from_graph under opt_level=3?
Currently it is not expected to run under opt_level=3.

The current way to construct a custom task (e.g. for a different layout) is to extract normal tasks from a normal graph and modify them with something like

# do_some_modification is a placeholder for your own rewriting logic
new_name = do_some_modification(old_task.name)
new_args = do_some_modification(old_task.args)
# re-create the task with the modified name/args and the 'direct' template
new_task = autotvm.task.create(new_name, new_args,
                               old_task.target, old_task.target_host, 'direct')
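As a concrete instance of this pattern (an illustrative sketch, assuming tasks came from extract_from_graph as above; the 'winograd' template key is just an example and does not exist for every workload):

for i in range(len(tasks)):
    try:
        # re-create the extracted task with the same workload args but a
        # different template key
        tsk = autotvm.task.create(tasks[i].name, tasks[i].args,
                                  tasks[i].target, tasks[i].target_host,
                                  'winograd')
        tasks.append(tsk)
    except Exception:
        # this workload/target has no such template; keep the original task
        pass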

I have a similar issue when compiling a model with certain dispatchers and schedules. I'm trying to apply autotvm to x86 and intel_graphics. For some schedules, but not all, this issue arises. I'll give a reproducible example when my PR is ready.

The error is happening when trying to compile a fused op named

fuse___layout_transform___negative_elemwise_mul___layout_transform___elemwise_add_expand_dims_broadcast_add_relu

The strange thing is that this fusion never happens when invoking nnvm.compiler.build with opt_level=3 for standard targets ("llvm", "cuda"). AutoTVM invokes nnvm.compiler.build with the target "llvm -device=tracing", which is causing the weird fusion behavior.

The error itself is a consequence of my change in https://github.com/dmlc/tvm/pull/1608. But without that PR a different error occurs: it tries to fuse a layout transform with max pool. Fusing an injective op such as layout transform with another non-broadcast op is not supported, and my PR was meant to prevent that from happening.

The fused op in that case is named

fuse_max_pool2d_expand_dims_broadcast_mul___layout_transform___negative_elemwise_mul___layout_transform___elemwise_add_expand_dims_broadcast_add_relu

and it doesn’t occur in the standard setting either.

I run it under opt_level=3 because otherwise, in extract_from_graph, the autotvm template I wrote runs with the actual target when creating the task, which causes an error because AlterOpLayout is not applied.

But I finally managed to figure out what I should do; this was pretty tricky for me.
Previously I checked dtype in conv2d and did the dispatch there, so running without AlterOpLayout caused an error. I guess I need to use a different template instead.

@masahi I have a similar issue when compiling x86 and Intel graphics models. Some schedule combinations trigger this problem. If I roll back your PR, Intel graphics returns a "Direct host side access to device memory" error. Any idea how to resolve this issue?

I think your issue is the same as this one.

The problem is that some injective ops are being fused into the conv op (or another non-broadcast op), but in the conv schedule only broadcast ops are scheduled to be fused with the conv op. This is due to the condition

if tag.is_broadcast(OP.tag):

that I think you are familiar with.

Injective ops that are not scheduled this way are just given the default schedule (the output of create_schedule()). For the CPU backend this is fine (they will be single-threaded), but for the GPU backend, since all ops need to be bound to GPU threads, you get the error that says "Direct host side access detected …".

I don't know which schedule in intel_graphics is causing your error, but if you replace is_broadcast with is_injective, it should work. However, I was told in this PR that fusing injective ops with conv or other non-broadcast ops is not encouraged. The PR of mine that you rolled back was meant to prevent fusing injective ops with other ops, but apparently there are other edge cases that I failed to handle.
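To make that concrete, the traversal in a typical TOPI-style conv2d schedule looks roughly like this (a schematic sketch in the spirit of the existing schedules, with the conv scheduling itself omitted; not code copied from the repo):

import tvm
from topi import tag

def schedule_conv2d(outs):
    # schematic conv2d schedule; fused ops are handled in traverse()
    s = tvm.create_schedule([x.op for x in outs])

    def traverse(op):
        # only broadcast ops fused behind the conv pass this check and get
        # inlined; an injective op (e.g. layout_transform) fused here keeps
        # the default schedule instead, which breaks on GPU backends
        if tag.is_broadcast(op.tag):
            if op not in s.outputs:
                s[op].compute_inline()
            for tensor in op.input_tensors:
                if isinstance(tensor.op, tvm.tensor.ComputeOp):
                    traverse(tensor.op)
        elif 'conv2d' in op.tag:
            pass  # schedule the conv compute itself here

    traverse(outs[0].op)
    return s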

@masahi Thank you! It works now.

@masahi This issue still exists for x86. Even after I changed if tag.is_broadcast(OP.tag) to if tag.is_injective(OP.tag), it returned the same error. The fix does work for Intel graphics, though.

You mean, after you reverted my PR and replaced tag.is_broadcast with is_injective, you are still getting this error?

nnvm/include/nnvm/op.h:530: Check failed: op != nullptr

And you are running auto-tuning under opt_level=3 for x86? I think that's reasonable for x86 if you want to tune for NCHWc-layout convolution.

I have an updated version of autotvm.task.extract_from_graph. Unlike the current approach, which creates a "llvm -device=tracing" target, the new method traces topi calls by monkey patching them, so task extraction goes through exactly the same path as normal compilation.
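Roughly, the idea is something like this (an illustrative sketch only; the names here are made up and this is not the actual implementation):

import topi

recorded_calls = []  # (op name, args, kwargs) for every traced topi call

def _trace(name, func):
    # wrap a topi compute function so each call records its workload,
    # then falls through to the real compute
    def wrapper(*args, **kwargs):
        recorded_calls.append((name, args, kwargs))
        return func(*args, **kwargs)
    return wrapper

# patch the ops of interest, run a normal nnvm.compiler.build(...),
# convert recorded_calls into autotvm tasks, then restore the originals
topi.nn.conv2d = _trace('conv2d', topi.nn.conv2d)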

Can this help the current case?

Yes, that should solve this issue. If extraction goes through the same path as normal compilation, no such error should happen.

I'll post another thread for the x86 compilation issue. I think it is not related to autotvm but to graph fusion. Tuning is fine; I can successfully tune all conv2d_NCHWc workloads. The problem is that when loading the schedules and compiling, it gives an error, and it only happens when the schedule combination introduces layout transformations.

The PR https://github.com/dmlc/tvm/pull/1760 should fix this issue as well.