Build module error when using the ApplyGraphBest dispatch context on a module with one Relay operator that has two compute & schedule implementations

I’m working on a tensor operator that has multiple compute and schedule implementations. To that end I registered a new Relay operator following the docs, and then used an operator strategy to register each compute & schedule implementation, selected based on operator attributes, again following the docs.
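For reference, the registration looks roughly like the sketch below. Every name in it ("my.my_operator", the "variant" attribute, compute_variant_a/b, schedule_variant_a/b, the task names ending in ".generic") is a placeholder for my actual code, and the sketch assumes the operator itself (type relation, attrs node) has already been registered as in the tutorial:

```python
from tvm import autotvm, te
from tvm.relay.op import op as _op

# Both variants are registered as AutoTVM tasks so that kernel tuning picks them up.
@autotvm.register_topi_compute("my_operator_a.generic")
def compute_variant_a(cfg, data):
    # placeholder body; the real compute builds the variant-A expression
    return te.compute(data.shape, lambda *i: data(*i), name="my_operator_a")

@autotvm.register_topi_schedule("my_operator_a.generic")
def schedule_variant_a(cfg, outs):
    return te.create_schedule([o.op for o in outs])

@autotvm.register_topi_compute("my_operator_b.generic")
def compute_variant_b(cfg, data):
    return te.compute(data.shape, lambda *i: data(*i) + 1, name="my_operator_b")

@autotvm.register_topi_schedule("my_operator_b.generic")
def schedule_variant_b(cfg, outs):
    return te.create_schedule([o.op for o in outs])

# The strategy picks one implementation per call site based on an operator
# attribute ("variant" is a placeholder attribute name).
@_op.register_strategy("my.my_operator")
def my_operator_strategy(attrs, inputs, out_type, target):
    strategy = _op.OpStrategy()
    if attrs.variant == "a":
        strategy.add_implementation(
            lambda attrs, inputs, out_type: [compute_variant_a(inputs[0])],
            lambda attrs, outs, target: schedule_variant_a(outs),
            name="my_operator_a.generic",
        )
    else:
        strategy.add_implementation(
            lambda attrs, inputs, out_type: [compute_variant_b(inputs[0])],
            lambda attrs, outs, target: schedule_variant_b(outs),
            name="my_operator_b.generic",
        )
    return strategy
```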

AutoTVM works fine: both kernel and graph tuning produce credible results. However, a problem manifests when I use ApplyGraphBest on a module that contains two instances of the operator in cascade (tensor_input -> op0 -> op1 -> tensor_output), each using a different compute & schedule implementation. The code fails in the dispatch context, but the failure seems to be triggered from higher up, in the module lowering call:

Traceback (most recent call last):
  File "test_my_operator.py", line 169, in build_func
    graph, lib, module_params = relay.build(mod, target=target, params=params)
  File "/tvm/relay/build_module.py", line 252, in build
    graph_json, mod, params = bld_mod.build(mod, target, target_host, params)
  File "/tvm/relay/build_module.py", line 121, in build
    self._build(mod, target, target_host)
  File "/tvm/_ffi/_ctypes/packed_func.py", line 225, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  [bt] (8) /tvm/build/libtvm.so(tvm::relay::backend::GraphRuntimeCodegen::VisitExpr_(tvm::relay::CallNode const*)+0xec2) [0x7f6513756c72]
  [bt] (7) /tvm/build/libtvm.so(+0xc9fa8d) [0x7f6513729a8d]
  [bt] (6) /tvm/build/libtvm.so(tvm::relay::CompileEngineImpl::LowerInternal(tvm::relay::CCacheKey const&)+0x78c) [0x7f651373946c]
  [bt] (5) /tvm/build/libtvm.so(tvm::relay::ScheduleGetter::Create(tvm::relay::Function const&)+0x574) [0x7f65137319e4]
  [bt] (4) /tvm/build/libtvm.so(tvm::relay::backend::MemoizedExprTranslator<tvm::runtime::Array<tvm::te::Tensor, void> >::VisitExpr(tvm::RelayExpr const&)+0xb3) [0x7f6513735493]
  [bt] (3) /tvm/build/libtvm.so(tvm::relay::ExprFunctor<tvm::runtime::Array<tvm::te::Tensor, void> (tvm::RelayExpr const&)>::VisitExpr(tvm::RelayExpr const&)+0x91) [0x7f6513733a61]
  [bt] (2) /tvm/build/libtvm.so(tvm::relay::ExprFunctor<tvm::runtime::Array<tvm::te::Tensor, void> (tvm::RelayExpr const&)>::InitVTable()::{lambda(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::runtime::Array<tvm::te::Tensor, void> (tvm::RelayExpr const&)>*)#6}::_FUN(tvm::runtime::ObjectRef const&, tvm::relay::ExprFunctor<tvm::runtime::Array<tvm::te::Tensor, void> (tvm::RelayExpr const&)>*)+0x27) [0x7f651372a407]
  [bt] (1) /tvm/build/libtvm.so(tvm::relay::ScheduleGetter::VisitExpr_(tvm::relay::CallNode const*)+0x546) [0x7f65137305d6]
  [bt] (0) /tvm/build/libtvm.so(+0xe3019b) [0x7f65138ba19b]
  File "/tvm/_ffi/_ctypes/packed_func.py", line 78, in cfun
    rv = local_pyfunc(*pyargs)
  File "/tvm/relay/backend/compile_engine.py", line 257, in lower_call
    op, call.attrs, inputs, ret_type, target)
  File "/tvm/relay/backend/compile_engine.py", line 202, in select_implementation
    outs = impl.compute(attrs, inputs, out_type)
  File "/tvm/relay/op/op.py", line 89, in compute
    return _OpImplementationCompute(self, attrs, inputs, out_type)
  File "/tvm/_ffi/_ctypes/packed_func.py", line 225, in __call__
    raise get_last_ffi_error()
  [bt] (3) /tvm/build/libtvm.so(TVMFuncCall+0x65) [0x7f65138be495]
  [bt] (2) /tvm/build/libtvm.so(+0xd64828) [0x7f65137ee828]
  [bt] (1) /tvm/build/libtvm.so(tvm::relay::OpImplementation::Compute(tvm::Attrs const&, tvm::runtime::Array<tvm::te::Tensor, void> const&, tvm::Type const&)+0xb1) [0x7f65137ee5f1]
  [bt] (0) /tvm/build/libtvm.so(+0xe3019b) [0x7f65138ba19b]
  File "/tvm/_ffi/_ctypes/packed_func.py", line 78, in cfun
    rv = local_pyfunc(*pyargs)
  File "my_operator.py", line 87, in compute_impl
    return [compute_fn(data)]
  File "/tvm/autotvm/task/topi_integration.py", line 155, in wrapper
    cfg = DispatchContext.current.query(tgt, workload)
  File "/tvm/autotvm/task/dispatcher.py", line 72, in query
    ret = self._query_inside(target, workload)
  File "/tvm/autotvm/task/dispatcher.py", line 419, in _query_inside
    assert wkl == workload
TVMError: AssertionError
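For reference, the failing build is invoked roughly like this (mod, target and params are the same objects as in the traceback above; the log file name is a placeholder):

```python
from tvm import autotvm, relay

# "graph_opt.log" is a placeholder for my graph-tuning log file name.
with autotvm.apply_graph_best("graph_opt.log"):
    graph, lib, module_params = relay.build(mod, target=target, params=params)
```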

The graph tuning log file has its entries in the right order (op0, then op1), but for some reason the first time the ApplyGraphBest dispatch context's _query_inside(…) is called, it receives the workload of op1 (instead of None), and this triggers the workload mismatch assertion because the dispatcher's record counter still points to the workload of op0.
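My understanding (which may be off) is that ApplyGraphBest replays the graph log in order with an internal counter, so queries have to arrive in the same order as the records. A toy illustration of that, not TVM code, with made-up workload names:

```python
# Toy illustration of counter-based replay: each query must match the record
# the dispatcher expects next, otherwise the workload assertion fails.
records = [("workload_op0", "cfg_op0"), ("workload_op1", "cfg_op1")]
counter = 0

def query(workload):
    global counter
    expected_workload, cfg = records[counter]
    if workload is not None:
        assert expected_workload == workload  # the assertion that fires for me
    counter += 1
    return cfg

print(query("workload_op0"))  # fine: queries arrive in log order
print(query("workload_op1"))  # fine
# query("workload_op1") as the *first* call is what seems to happen in my build,
# and it would fail the assertion exactly like the traceback above.
```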

Any ideas where this might be coming from? It isn’t caused by old/stale entries in the graph log file, since I remove the tuning logs before every run. Could it be a cached entry somewhere in the vicinity of the compile engine or the module builder that gets attached to op1 instead of op0, or vice versa?
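In case it helps with diagnosing, this is how I check the record order stored in the graph tuning log (file name is a placeholder):

```python
from tvm import autotvm

# Print the workloads in the order they are stored in the graph-tuning log;
# this order matches op0 then op1, as expected.
for inp, _ in autotvm.record.load_from_file("graph_opt.log"):
    print(inp.task.workload)
```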