Hi, I’m working on an example of translating fused conv + bias add + relu ops (which come from conv + bn + relu after applying FoldScaleAxis and FoldConstant) to the DNNL backend.
I’ve modified dnnl/codegen.cc to handle fused ops, and it now emits the following code correctly:
extern "C" void dnnl_0_(float* dnnl_input0, float* dnnl_input1, float* bias, float* out) {
  float* buf_0 = (float*)std::malloc(4 * 802816);
  dnnl_fused_conv2d_bias_relu(dnnl_input0, dnnl_input1, bias, buf_0, 1, 3, 224, 224, 16, 1, 1, 1, 3, 3, 1, 1);
  std::memcpy(out, buf_0, 4 * 802816);
  std::free(buf_0);
}

extern "C" int dnnl_0_wrapper_(DLTensor* arg0,
                               DLTensor* arg1,
                               DLTensor* arg2,
                               DLTensor* arg3) {
  dnnl_0_(static_cast<float*>(arg0->data),
          static_cast<float*>(arg1->data),
          static_cast<float*>(arg2->data),
          static_cast<float*>(arg3->data));
  return 0;
}
Here, the “bias” parameter comes from the bias add op that originally follows the conv op but is now fused into it.
So I’m generating a new signature inside codegen to handle fused ops. The problem is that the runtime doesn’t know about this new signature, so when I try to run the generated function I get the following error:
TVMError: Check failed: ret == 0 (-1 vs. 0) : [23:36:07] /home/masa/projects/dev/tvm/include/tvm/runtime/packed_func.h:1107:
Check failed: i < num_args (3 vs. 3) : not enough argument passed, 3 passed but request arg[3].
I think I need to modify the runtime code (both the VM and graph runtimes). How should I go about this? Or is there another way to handle fused ops that doesn’t require a runtime change? @zhiics @comaniac