Disable TVM's assumption that different Relay nodes share the same code?

Problem
The code generated for my backend depends on the values inside the weight & bias tensors. This is an issue at both compile time and runtime, because GetFunction queries by “func_name” and maps distinctly different operators to the same code implementation.

Example
My simple sequential network consists of 512-unit dense layers with bias + activation. Several of these 512-unit dense layers all query the same code for “fused_nn_dense_add_sigmoid_kernel0”. Because the weight + bias values of each op need to change the code that is generated AOT, inference returns an incorrect result every time.

Question
Is there any way to disable TVM’s assumption that op nodes share the same code?

In my opinion, the only correct time to apply this optimization is when code has already been generated for each op and one op’s code can be mapped 1:1 onto another’s.

If the code depends on the values, should they be part of the constants? That way, different constants will result in different programs. Once we have better packaging that bundles the constants along with the .so, we should be able to resolve this issue better.

To be clearer, the static weight & bias tensors may be analyzed AOT for values that aid quantization (e.g., the fixed-point position for int8 quantization). In my case, the accuracy of the network depends on these values for each op. This means Dense(unit=512) ops with different W matrices should probably not share the same code, since network accuracy may be affected to an unknown extent.
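
To make that concrete, here is a toy sketch of the kind of AOT analysis I mean (purely illustrative, not my actual backend code): the int8 scale, and hence the fixed-point constants baked into the generated code, is a function of that particular op’s weight values.

    #include <algorithm>
    #include <cmath>
    #include <vector>

    // Toy example: the per-op int8 scale is derived from that op's weight values.
    // Two Dense(unit=512) ops with different W matrices will generally get
    // different scales, so their generated code cannot be shared without
    // affecting accuracy.
    float Int8ScaleFor(const std::vector<float>& weights) {
      float max_abs = 0.0f;
      for (float w : weights) max_abs = std::max(max_abs, std::fabs(w));
      return max_abs / 127.0f;  // symmetric int8 scale
    }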

A talk at the TVM conference made me wonder about this related question.

I think what @tqchen meant was that the same op with different constant values should not share the same generated function code. Based on my understanding, codegen will generate different functions with different names, such as fused_nn_dense_add_sigmoid_kernel_0, fused_nn_dense_add_sigmoid_kernel_1, etc. If, as you mentioned, your ops have different constants but codegen still uses the same function for all of them, then it seems like a bug to be fixed.

@comaniac I see. From printing the LoweredFunc f->name in the CodeGen stage, it appears that (1) CodeGen only generates code once per op type (e.g., dense layers of equal size), even when the constant args (W matrix & bias) differ, and (2) the graph_json does not contain the func_names seen in the CodeGen stage (ending in kernel_0, etc.). We’ll look into it further; we’re adding a backend, so maybe we missed something.

@comaniac @tqchen I think this may be a bug. I’m not too familiar with this part of the code, but it looks like compile_engine.cc L682-686 caches the lowered function for the first Dense(unit=512) op, then finds that cached_func is .defined() for every subsequent Dense(unit=512) op and returns that same implementation for all of them.

    auto it = cache_.find(key);
    if (it != cache_.end()) {
      it->second->use_count += 1;
      if (it->second->cached_func.defined()) return it->second;
      value = it->second;
    }
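
To illustrate what a fix could look like, here is a standalone toy sketch (my own illustration, not TVM’s actual types): a purely structural key collides for two dense(512) ops that differ only in their constants, while a key that also folds in a digest of the constant values keeps separate cache entries and therefore separate generated code.

    #include <cstddef>
    #include <functional>
    #include <string>
    #include <unordered_map>
    #include <vector>

    // Toy stand-in: "Kernel" plays the role of a lowered function.
    struct Kernel { std::string code; };

    // Structural key only (op type + shape): two dense(512) ops whose only
    // difference is the constant W/bias collide here, so the second lookup
    // returns the code generated for the first.
    using StructuralKey = std::string;  // e.g. "fused_nn_dense_add_sigmoid:512"

    // Key that also carries a digest of the constant values, so different
    // constants map to different cache entries.
    struct ValueAwareKey {
      std::string structure;
      std::size_t const_digest;
      bool operator==(const ValueAwareKey& o) const {
        return structure == o.structure && const_digest == o.const_digest;
      }
    };
    struct ValueAwareKeyHash {
      std::size_t operator()(const ValueAwareKey& k) const {
        return std::hash<std::string>()(k.structure) ^ (k.const_digest << 1);
      }
    };

    // Digest one op's weight/bias values; different values give different digests.
    std::size_t DigestConstants(const std::vector<float>& w, const std::vector<float>& b) {
      std::size_t h = 0;
      for (float v : w) h = h * 131 + std::hash<float>()(v);
      for (float v : b) h = h * 131 + std::hash<float>()(v);
      return h;
    }

    std::unordered_map<StructuralKey, Kernel> structural_cache;                // collides
    std::unordered_map<ValueAwareKey, Kernel, ValueAwareKeyHash> value_cache;  // does not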

I would expect lowering to happen for each Dense(unit=512) op, since the constants differ between all of them.
When I run relay.build, graph_runtime_codegen.cc L391-L393 appears to find the same Function when it runs VisitExpr_ for each individual CallNode:

    } else if (op->op.as<FunctionNode>()) {
      func = GetRef<Function>(op->op.as<FunctionNode>());
    } else {

So all the Dense(unit=512) layers end up sharing the same func_name and lowered code.

I changed std::unordered_map<const CCacheKey, CCacheValue> cache_; to std::unordered_map<const CCacheKey*, CCacheValue> cache_; and now CodeGen runs for every op, as I would expect. Probably just a hacky workaround :slight_smile:.
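
For reference, the change was roughly the following; the comments are my reading of why it works.

    // Before: keys are compared/hashed structurally, so two calls that differ
    // only in their constant values collapse into a single cache entry.
    std::unordered_map<const CCacheKey, CCacheValue> cache_;

    // After (hacky workaround): keying on the pointer means each fresh
    // CCacheKey misses the cache, so every call is lowered separately.
    std::unordered_map<const CCacheKey*, CCacheValue> cache_;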

To get heterogeneous execution with my own backend, I have to change module.cc’s GetFunction to search imported modules first. I also have to append _kernel0 to the func_names passed to GetFunction in my custom runtime module, since CodeGen appends this suffix to the func_name of each piece of generated code.
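
In case it helps anyone else, here is a rough sketch of the name mapping in my custom runtime module; the class name, fmap_ table, and WrapKernel are placeholders for my backend’s internals, and the override signature should match whatever your TVM version’s ModuleNode declares.

    // Sketch only: map the func_name the graph runtime asks for onto the
    // symbol CodeGen actually emitted (which has "_kernel0" appended).
    PackedFunc MyBackendModuleNode::GetFunction(
        const std::string& name, const std::shared_ptr<ModuleNode>& sptr_to_self) {
      std::string kernel_name = name + "_kernel0";  // suffix added by CodeGen
      auto it = fmap_.find(kernel_name);            // fmap_: my backend's kernel table
      if (it == fmap_.end()) {
        // Not one of ours: return an empty PackedFunc so the (modified)
        // module.cc lookup keeps searching the other/imported modules.
        return PackedFunc();
      }
      return WrapKernel(it->second);                // placeholder launch wrapper
    }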