Let me summarize the results of this afternoon's investigation. Sorry, it is a little long.
Here is my minimal reproduction test case:
```python
import tvm
from tvm import relay

n, c, h, w = 1, 10, 224, 224
x = relay.var("x", relay.ty.TensorType((n, c, h, w), "float32"))
w = relay.var("w")   # note: reuses the name `w`, shadowing the width above
w1 = relay.var("w1")
y = relay.nn.conv2d(x, w,
                    kernel_size=(3, 3),
                    padding=(0, 0),
                    channels=2)
y = relay.nn.conv2d(y, w1,
                    kernel_size=(3, 3),
                    padding=(0, 0),
                    channels=2)
y = relay.Function([x, w, w1], y)
y = relay.Module.from_expr(y)
seq = tvm.relay.transform.Sequential([tvm.relay.transform.SimplifyInference(),
                                      tvm.relay.transform.FuseOps()])
y = seq(y)
engine = relay.backend.compile_engine.get()
engine.lower(y["main"], "llvm")
```
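Running this hits a fatal check failure inside engine.lower. Based on the CHECK quoted later in this post, the message reads roughly like this (exact formatting and the printed master/current values may differ by version):

```
Check failed: !master_op_.defined() || master_op_pattern_ < kCommReduce:
Two complicated op in a primitive function  master=Op(nn.conv2d) current=Op(nn.conv2d)
```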
I add the FuseOps pass because without it we cannot compare against relay.build later: relay.build requires fused operators, as this check in graph_runtime_codegen.cc shows:
```cpp
if (op->op.as<OpNode>()) {
  LOG(FATAL) << "Operators should be transformed away; try applying"
             << "the fuse_ops transformation to the expression.";
}
```
If we print the body of the fused function, it looks like this:
```
CallNode(FunctionNode([Var(p0, ty=TensorType([1, 2, 222, 222], float32)), Var(p1, ty=TensorType([2, 2, 3, 3], float32))], TensorType([1, 2, 220, 220], float32), CallNode(Op(nn.conv2d), [Var(p0, ty=TensorType([1, 2, 222, 222], float32)), Var(p1, ty=TensorType([2, 2, 3, 3], float32))], relay.attrs.Conv2DAttrs(0x2501c70), [TensorType([1, 2, 222, 222], float32), TensorType([2, 2, 3, 3], float32)]), [], {"Primitive": 1}), [CallNode(FunctionNode([Var(p0, ty=TensorType([1, 10, 224, 224], float32)), Var(p1, ty=TensorType([2, 10, 3, 3], float32))], TensorType([1, 2, 222, 222], float32), CallNode(Op(nn.conv2d), [Var(p0, ty=TensorType([1, 10, 224, 224], float32)), Var(p1, ty=TensorType([2, 10, 3, 3], float32))], relay.attrs.Conv2DAttrs(0x25015b0), [TensorType([1, 10, 224, 224], float32), TensorType([2, 10, 3, 3], float32)]), [], {"Primitive": 1}), [Var(x, ty=TensorType([1, 10, 224, 224], float32)), Var(w, ty=TensorType([2, 10, 3, 3], float32))], (nullptr), []), Var(w1, ty=TensorType([2, 2, 3, 3], float32))], (nullptr), [])
```
We can see that this one function body contains two convolution CallNodes: the outer fused conv2d takes the result of the inner fused conv2d as an argument.
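As a side note, a quicker way to inspect this structure is to print the function in Relay text form (a minimal sketch, using the same module as above):

```python
# Print the fused main function as readable Relay text instead of the raw
# AST dump above. Each fused conv2d shows up as its own function carrying
# the "Primitive" attribute.
print(y["main"])
```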
So what happens when we call engine.lower? It walks this function and finds two conv2d ops inside it:
```cpp
if (op_pattern >= kCommReduce) {
  CHECK(!master_op_.defined() || master_op_pattern_ < kCommReduce)
      << "Two complicated op in a primitive function "
      << " master=" << master_op_ << " current=" << op;
}
if (op_pattern >= master_op_pattern_) {
  master_op_ = op;
  master_attrs_ = call_node->attrs;
  master_op_pattern_ = op_pattern;
}
```
So the first time through, master_op_pattern_ still holds its initial value 0; the CHECK passes, and it is then assigned 4 (kOutEWiseFusable). When we meet the second conv2d in the same function, the CHECK on master_op_pattern_ < kCommReduce fails, because 4 is not less than 3.
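For reference, these constants come from TVM's OpPatternKind enum; this is a sketch from memory of include/tvm/relay/op_attr_types.h (exact members may vary across versions):

```cpp
enum OpPatternKind {
  kElemWise = 0,
  kBroadcast = 1,
  kInjective = 2,
  kCommReduce = 3,       // the threshold for "complicated" ops
  kOutEWiseFusable = 4,  // conv2d is registered with this pattern
  // ... (other patterns elided)
  kOpaque = 8
};
```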
However, if we use

```python
graph, lib, params = relay.build(y, 'llvm')
```

we will find that we do not hit this error. Why? Because the mechanism is not the same. During execution, relay.build calls this function:
```cpp
LoweredOutput Codegen(relay::Function func)
```

which in turn calls

```cpp
heads_ = VisitExpr(func->body);
```

and eventually reaches _CompileEngineLower as well. But the function being lowered is not the same as before, because of this dispatch:
```cpp
Function func;
if (op->op.as<OpNode>()) {
  LOG(FATAL) << "Operators should be transformed away; try applying"
             << "the fuse_ops transformation to the expression.";
} else if (op->op.as<GlobalVarNode>()) {
  LOG(FATAL) << "Not implemented";
} else if (op->op.as<FunctionNode>()) {
  func = GetRef<Function>(op->op.as<FunctionNode>());
} else {
  LOG(FATAL) << "TVM runtime does not support calls to " << op->op->type_key();
}
if (!func->IsPrimitive()) {
  LOG(FATAL) << "TVM only support calls to primitive functions "
             << "(i.e functions composed of fusable operator invocations)";
}
```
If we print func here, we get:
```
op FunctionNode([Var(p0, ty=TensorType([1, 2, 222, 222], float32)), Var(p1, ty=TensorType([2, 2, 3, 3], float32))], TensorType([1, 2, 220, 220], float32), CallNode(Op(nn.conv2d), [Var(p0, ty=TensorType([1, 2, 222, 222], float32)), Var(p1, ty=TensorType([2, 2, 3, 3], float32))], relay.attrs.Conv2DAttrs(0x357b270), [TensorType([1, 2, 222, 222], float32), TensorType([2, 2, 3, 3], float32)]), [], {"Primitive": 1})
op FunctionNode([Var(p0, ty=TensorType([1, 10, 224, 224], float32)), Var(p1, ty=TensorType([2, 10, 3, 3], float32))], TensorType([1, 2, 222, 222], float32), CallNode(Op(nn.conv2d), [Var(p0, ty=TensorType([1, 10, 224, 224], float32)), Var(p1, ty=TensorType([2, 10, 3, 3], float32))], relay.attrs.Conv2DAttrs(0x2655e50), [TensorType([1, 10, 224, 224], float32), TensorType([2, 10, 3, 3], float32)]), [], {"Primitive": 1})
```
That is to say, when we enter _CompileEngineLower via relay.build, we lower one single-convolution function at a time, not one big function that contains two convolutions.
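To illustrate the difference, here is a minimal sketch of lowering each primitive function individually, mimicking what GraphRuntimeCodegen effectively does. The helper name lower_primitives is mine, and I am assuming relay.analysis.post_order_visit is available in this TVM version:

```python
from tvm import relay

def lower_primitives(mod, target="llvm"):
    """Hypothetical helper: lower each fused ("Primitive") function on its own."""
    engine = relay.backend.compile_engine.get()
    lowered = []

    def visit(expr):
        # After FuseOps, calls whose callee is a FunctionNode are exactly the
        # fused primitive functions; the raw conv2d calls inside them have an
        # Op as callee and are skipped.
        if isinstance(expr, relay.expr.Call) and isinstance(expr.op, relay.Function):
            lowered.append(engine.lower(expr.op, target))

    relay.analysis.post_order_visit(mod["main"].body, visit)
    return lowered

# Each conv2d is lowered as its own function, so the
# "Two complicated op in a primitive function" CHECK never fires.
funcs = lower_primitives(y)
```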
I hope these investigation results help. I can't be sure whether the engine.lower(y["main"], "llvm") mechanism is designed this way on purpose, with relay.build being the encouraged path, so I couldn't say this is definitely a bug.