Relay: compile_engine.cc:220: Check failed: [...] Two complicated op in a primitive function


#1

I’m trying to compile inception v3 using the relay compilation engine, and I’m running into this issue:

compile_engine.cc:220: Check failed: !master_op_.defined() || master_op_pattern_ < kCommReduce: Two complicated op in a primitive function  master=Op(nn.conv2d) current=Op(nn.conv2d)

The printed module looks fine, but it contains calls to nn.conv2d, nn.relu, and a few other seemingly "non-primitive" operators:

def @main(%data: Tensor[(1, 3, 299, 299), float32], ...) {
  %0 = nn.conv2d(%data, %conv_conv1_weight, strides=[2, 2], channels=32, kernel_size=[3, 3]) /* ty=Tensor[(1, 32, 149, 149), float32] */;
  %1 = add(%conv_bn_moving_var, 2e-05f /* ty=float32 */) /* ty=Tensor[(32), float32] */;
  %2 = sqrt(%1) /* ty=Tensor[(32), float32] */;
  %3 = divide(1f /* ty=float32 */, %2) /* ty=Tensor[(32), float32] */;
  ...

What is going wrong here? How should the nn.conv2d calls (and the other nn.* ops) be handled? Should their implementations have been picked up from somewhere by this point?

This is my code (based on some relay test):

import tvm
from tvm import relay
import tvm.relay.testing


v3, params = tvm.relay.testing.inception_v3.get_workload(batch_size=1)

# Apply SimplifyInference to get rid of nn.batch_norm.
seq = tvm.relay.transform.Sequential([tvm.relay.transform.SimplifyInference()])
x = seq(v3)
print(x)

engine = relay.backend.compile_engine.get()
engine.lower(x["main"], "llvm")

#2

Try the latest TVM; I could not reproduce this problem with it.


#3

I just updated the master branch and reran the testcase (I commented out the call to print):

$ env PYTHONPATH=/w/src/dmlc/tvm/python:/w/src/dmlc/tvm/topi/python:/w/src/dmlc/tvm/nnvm/python python3 test_relay.py

Cannot find config for target=llvm, workload=('conv2d', (1, 3, 299, 299, 'float32'), (32, 3, 3, 3, 'float32'), (2, 2), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=llvm, workload=('conv2d', (1, 32, 149, 149, 'float32'), (32, 32, 3, 3, 'float32'), (1, 1), (0, 0), (1, 1), 'NCHW', 'float32'). A fallback configuration is used, which may bring great performance regression.
Traceback (most recent call last):

  File "/w/src/dmlc/tvm/python/tvm/relay/backend/compile_engine.py", line 92, in lower
    return _backend._CompileEngineLower(self, key)

  File "/w/src/dmlc/tvm/python/tvm/_ffi/_ctypes/function.py", line 210, in __call__
    raise get_last_ffi_error()

tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /w/src/dmlc/tvm/build.x86/libtvm.so(tvm::IRFunctor<tvm::Array<tvm::Tensor, void> (tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*)>::operator()(tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*) const+0x115) [0x7f63f98010a5]
  [bt] (7) /w/src/dmlc/tvm/build.x86/libtvm.so(std::__1::__function::__func<tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>::InitVTable()::'lambda4'(tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*), std::__1::allocator<tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>::InitVTable()::'lambda4'(tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*)>, tvm::Array<tvm::Tensor, void> (tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*)>::operator()(tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*&&)+0x1b) [0x7f63f9803b6b]
  [bt] (6) /w/src/dmlc/tvm/build.x86/libtvm.so(tvm::relay::ScheduleGetter::VisitExpr_(tvm::relay::CallNode const*)+0x1e0) [0x7f63f97feed0]
  [bt] (5) /w/src/dmlc/tvm/build.x86/libtvm.so(tvm::relay::ScheduleGetter::VisitExpr(tvm::relay::Expr const&)+0x60) [0x7f63f97fe010]
  [bt] (4) /w/src/dmlc/tvm/build.x86/libtvm.so(tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>::VisitExpr(tvm::relay::Expr const&)+0x86) [0x7f63f9800846]
  [bt] (3) /w/src/dmlc/tvm/build.x86/libtvm.so(tvm::IRFunctor<tvm::Array<tvm::Tensor, void> (tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*)>::operator()(tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*) const+0x115) [0x7f63f98010a5]
  [bt] (2) /w/src/dmlc/tvm/build.x86/libtvm.so(std::__1::__function::__func<tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>::InitVTable()::'lambda4'(tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*), std::__1::allocator<tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>::InitVTable()::'lambda4'(tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*)>, tvm::Array<tvm::Tensor, void> (tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*)>::operator()(tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*&&)+0x1b) [0x7f63f9803b6b]
  [bt] (1) /w/src/dmlc/tvm/build.x86/libtvm.so(tvm::relay::ScheduleGetter::VisitExpr_(tvm::relay::CallNode const*)+0x923) [0x7f63f97ff613]
  [bt] (0) /w/src/dmlc/tvm/build.x86/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x37) [0x7f63f941fd17]
  File "/w/src/dmlc/tvm/src/relay/backend/compile_engine.cc", line 217
TVMError: Check failed: !master_op_.defined() || master_op_pattern_ < kCommReduce: Two complicated op in a primitive function  master=Op(nn.conv2d) current=Op(nn.conv2d)


During handling of the above exception, another exception occurred:


Traceback (most recent call last):

  File "test_relay.py", line 14, in <module>
    engine.lower(x["main"], "llvm")

  File "/w/src/dmlc/tvm/python/tvm/relay/backend/compile_engine.py", line 100, in lower
    raise RuntimeError(msg)

RuntimeError: Traceback (most recent call last):
  File "/w/src/dmlc/tvm/python/tvm/relay/backend/compile_engine.py", line 92, in lower
    return _backend._CompileEngineLower(self, key)
  File "/w/src/dmlc/tvm/python/tvm/_ffi/_ctypes/function.py", line 210, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /w/src/dmlc/tvm/build.x86/libtvm.so(tvm::IRFunctor<tvm::Array<tvm::Tensor, void> (tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*)>::operator()(tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*) const+0x115) [0x7f63f98010a5]
  [bt] (7) /w/src/dmlc/tvm/build.x86/libtvm.so(std::__1::__function::__func<tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>::InitVTable()::'lambda4'(tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*), std::__1::allocator<tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>::InitVTable()::'lambda4'(tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*)>, tvm::Array<tvm::Tensor, void> (tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*)>::operator()(tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*&&)+0x1b) [0x7f63f9803b6b]
  [bt] (6) /w/src/dmlc/tvm/build.x86/libtvm.so(tvm::relay::ScheduleGetter::VisitExpr_(tvm::relay::CallNode const*)+0x1e0) [0x7f63f97feed0]
  [bt] (5) /w/src/dmlc/tvm/build.x86/libtvm.so(tvm::relay::ScheduleGetter::VisitExpr(tvm::relay::Expr const&)+0x60) [0x7f63f97fe010]
  [bt] (4) /w/src/dmlc/tvm/build.x86/libtvm.so(tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>::VisitExpr(tvm::relay::Expr const&)+0x86) [0x7f63f9800846]
  [bt] (3) /w/src/dmlc/tvm/build.x86/libtvm.so(tvm::IRFunctor<tvm::Array<tvm::Tensor, void> (tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*)>::operator()(tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*) const+0x115) [0x7f63f98010a5]
  [bt] (2) /w/src/dmlc/tvm/build.x86/libtvm.so(std::__1::__function::__func<tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>::InitVTable()::'lambda4'(tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*), std::__1::allocator<tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>::InitVTable()::'lambda4'(tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*)>, tvm::Array<tvm::Tensor, void> (tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*)>::operator()(tvm::NodeRef const&, tvm::relay::ExprFunctor<tvm::Array<tvm::Tensor, void> (tvm::relay::Expr const&)>*&&)+0x1b) [0x7f63f9803b6b]
  [bt] (1) /w/src/dmlc/tvm/build.x86/libtvm.so(tvm::relay::ScheduleGetter::VisitExpr_(tvm::relay::CallNode const*)+0x923) [0x7f63f97ff613]
  [bt] (0) /w/src/dmlc/tvm/build.x86/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x37) [0x7f63f941fd17]
  File "/w/src/dmlc/tvm/src/relay/backend/compile_engine.cc", line 217
TVMError: Check failed: !master_op_.defined() || master_op_pattern_ < kCommReduce: Two complicated op in a primitive function  master=Op(nn.conv2d) current=Op(nn.conv2d)
Error during compile func
[...]

#4

Have you invoked the FuseOps pass before calling the compile engine?


#5

I tested it on macOS and Linux and could not reproduce it on either. @vinx13 could you find time to double-check?


#6

I can reproduce it on Linux with master


#7

I cleaned the repo, pulled again, and can reproduce it now.


#8

A minimal reproducing test case:

import tvm
from tvm import relay

n, c, h, w = 1, 10, 224, 224
x = relay.var("x", relay.ty.TensorType((n, c, h, w), "float32"))
w = relay.var("w")
w1 = relay.var("w1")
y = relay.nn.conv2d(x, w,
                    kernel_size=(3, 3),
                    padding=(0, 0),
                    channels=2)
y = relay.nn.conv2d(y, w1,
                    kernel_size=(3, 3),
                    padding=(0, 0),
                    channels=2)
y = relay.Function([x, w, w1], y)
y = relay.Module.from_expr(y)

engine = relay.backend.compile_engine.get()
engine.lower(y["main"], "llvm")

With two conv2d ops, we hit the same error.


#9

Let me summarize the results of this afternoon's investigation. Sorry, it is a little long.

Here is my minimal reproducing test case:

import tvm
from tvm import relay

n, c, h, w = 1, 10, 224, 224
x = relay.var("x", relay.ty.TensorType((n, c, h, w), "float32"))
w = relay.var("w")
w1 = relay.var("w1")
y = relay.nn.conv2d(x, w,
                    kernel_size=(3, 3),
                    padding=(0, 0),
                    channels=2)
y = relay.nn.conv2d(y, w1,
                    kernel_size=(3, 3),
                    padding=(0, 0),
                    channels=2)
y = relay.Function([x, w, w1], y)
y = relay.Module.from_expr(y)
seq = tvm.relay.transform.Sequential([tvm.relay.transform.SimplifyInference(), tvm.relay.transform.FuseOps()])
y = seq(y)

engine = relay.backend.compile_engine.get()
engine.lower(y["main"], "llvm")

I added the FuseOps pass, because without it we cannot make the comparison with relay.build below; graph_runtime_codegen.cc requires the operators to already be fused:

    if (op->op.as<OpNode>()) {
      LOG(FATAL) << "Operators should be transformed away; try applying"
                 << "the fuse_ops transformation to the expression.";
    }

If we print the function body, we get something like this:

CallNode(FunctionNode([Var(p0, ty=TensorType([1, 2, 222, 222], float32)), Var(p1, ty=TensorType([2, 2, 3, 3], float32))], TensorType([1, 2, 220, 220], float32), CallNode(Op(nn.conv2d), [Var(p0, ty=TensorType([1, 2, 222, 222], float32)), Var(p1, ty=TensorType([2, 2, 3, 3], float32))], relay.attrs.Conv2DAttrs(0x2501c70), [TensorType([1, 2, 222, 222], float32), TensorType([2, 2, 3, 3], float32)]), [], {"Primitive": 1}), [CallNode(FunctionNode([Var(p0, ty=TensorType([1, 10, 224, 224], float32)), Var(p1, ty=TensorType([2, 10, 3, 3], float32))], TensorType([1, 2, 222, 222], float32), CallNode(Op(nn.conv2d), [Var(p0, ty=TensorType([1, 10, 224, 224], float32)), Var(p1, ty=TensorType([2, 10, 3, 3], float32))], relay.attrs.Conv2DAttrs(0x25015b0), [TensorType([1, 10, 224, 224], float32), TensorType([2, 10, 3, 3], float32)]), [], {"Primitive": 1}), [Var(x, ty=TensorType([1, 10, 224, 224], float32)), Var(w, ty=TensorType([2, 10, 3, 3], float32))], (nullptr), []), Var(w1, ty=TensorType([2, 2, 3, 3], float32))], (nullptr), [])

We can see that our function contains two convolution CallNodes.

So what happens when we call engine.lower? It parses this function and finds two conv2d ops in it:

    if (op_pattern >= kCommReduce) {
      CHECK(!master_op_.defined() || master_op_pattern_ < kCommReduce)
          << "Two complicated op in a primitive function "
          << " master=" << master_op_ << " current=" << op;
    }
    if (op_pattern >= master_op_pattern_) {
      master_op_ = op;
      master_attrs_ = call_node->attrs;
      master_op_pattern_ = op_pattern;
    }

The first time through, master_op_pattern_ is its initial value 0 and is then assigned 4 (kOutEWiseFusable). When we reach the second conv2d in the same function, the check fails:

master_op_pattern_ < kCommReduce

Because 4 is not less than 3 (kCommReduce).
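To make the failure concrete, here is a toy Python re-implementation of the check above. The constant names mirror TVM's OpPatternKind enum and the CHECK in compile_engine.cc; the function itself is hypothetical, written just to trace the control flow:

```python
# Pattern codes mirroring TVM's OpPatternKind enum (assumed values).
K_ELEM_WISE = 0
K_BROADCAST = 1
K_INJECTIVE = 2
K_COMM_REDUCE = 3
K_OUT_EWISE_FUSABLE = 4  # nn.conv2d is registered with this pattern

def check_primitive_function(op_patterns):
    """Toy sketch: walk the ops of one fused (primitive) function,
    tracking the 'master' op pattern, and raise on a second
    'complicated' op, like the CHECK in compile_engine.cc."""
    master_pattern = -1  # no master op defined yet
    for pattern in op_patterns:
        if pattern >= K_COMM_REDUCE:
            # CHECK(!master_op_.defined() || master_op_pattern_ < kCommReduce)
            if master_pattern >= K_COMM_REDUCE:
                raise RuntimeError("Two complicated op in a primitive function")
        if pattern >= master_pattern:
            master_pattern = pattern
    return master_pattern

# One conv2d (plus elementwise ops) per primitive function is fine:
check_primitive_function([K_OUT_EWISE_FUSABLE, K_BROADCAST])

# Two conv2d in the same primitive function triggers the error:
try:
    check_primitive_function([K_OUT_EWISE_FUSABLE, K_OUT_EWISE_FUSABLE])
except RuntimeError as e:
    print(e)
```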

However, if we use

graph, lib, params = relay.build(y, 'llvm')

We will find that we do not hit this error. Why? Because the mechanism is different. During execution, relay.build calls this function:

LoweredOutput Codegen(relay::Function func)

And will call

heads_ = VisitExpr(func->body);

It then also calls _CompileEngineLower, but the function being lowered is not the same as before, because the codegen does this:

    Function func;
    if (op->op.as<OpNode>()) {
      LOG(FATAL) << "Operators should be transformed away; try applying"
                 << "the fuse_ops transformation to the expression.";
    } else if (op->op.as<GlobalVarNode>()) {
      LOG(FATAL) << "Not implemented";
    } else if (op->op.as<FunctionNode>()) {
      func = GetRef<Function>(op->op.as<FunctionNode>());
    } else {
      LOG(FATAL) << "TVM runtime does not support calls to " << op->op->type_key();
    }
    if (!func->IsPrimitive()) {
      LOG(FATAL) << "TVM only support calls to primitive functions "
                 << "(i.e functions composed of fusable operator invocations)";
    }

If we print func, we get the following (one printout per fused function):

op FunctionNode([Var(p0, ty=TensorType([1, 2, 222, 222], float32)), Var(p1, ty=TensorType([2, 2, 3, 3], float32))], TensorType([1, 2, 220, 220], float32), CallNode(Op(nn.conv2d), [Var(p0, ty=TensorType([1, 2, 222, 222], float32)), Var(p1, ty=TensorType([2, 2, 3, 3], float32))], relay.attrs.Conv2DAttrs(0x357b270), [TensorType([1, 2, 222, 222], float32), TensorType([2, 2, 3, 3], float32)]), [], {"Primitive": 1})

op FunctionNode([Var(p0, ty=TensorType([1, 10, 224, 224], float32)), Var(p1, ty=TensorType([2, 10, 3, 3], float32))], TensorType([1, 2, 222, 222], float32), CallNode(Op(nn.conv2d), [Var(p0, ty=TensorType([1, 10, 224, 224], float32)), Var(p1, ty=TensorType([2, 10, 3, 3], float32))], relay.attrs.Conv2DAttrs(0x2655e50), [TensorType([1, 10, 224, 224], float32), TensorType([2, 10, 3, 3], float32)]), [], {"Primitive": 1})

That is to say, when we enter _CompileEngineLower we lower one single-convolution function at a time, not one big function containing two convolutions.
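The contrast between the two paths can be shown with the same toy check (pattern value 4 stands in for conv2d's kOutEWiseFusable; the helper is hypothetical, not TVM code):

```python
K_COMM_REDUCE = 3
K_CONV2D = 4  # conv2d's pattern, kOutEWiseFusable

def lower_primitive_fn(op_patterns):
    """Toy sketch of lowering one primitive function: raise if a
    second 'complicated' op appears after the master op is set."""
    master = -1
    for p in op_patterns:
        if p >= K_COMM_REDUCE and master >= K_COMM_REDUCE:
            raise RuntimeError("Two complicated op in a primitive function")
        master = max(master, p)

# engine.lower on the whole module: one function holding both conv2d ops.
try:
    lower_primitive_fn([K_CONV2D, K_CONV2D])
except RuntimeError as e:
    print("engine.lower path:", e)

# relay.build path: GraphRuntimeCodegen visits the body and lowers each
# fused function separately, so each call sees only one conv2d.
for fn in ([K_CONV2D], [K_CONV2D]):
    lower_primitive_fn(fn)
print("relay.build path: ok")
```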

I hope my investigation helps. I cannot be sure whether the engine.lower(y["main"], "llvm") mechanism is designed this way, with relay.build as the encouraged path, so I cannot say whether this is a bug.


#10

Thank you! This helps me a lot! I’m getting started with relay and I just tried to compile something based on a testcase I saw. Using relay.build is perfectly fine for me.