Relationship between tvm.build() and relay.build()

I am new to TVM, and the tutorials are not very systematic.

I found that tvm.build() requires a schedule argument, while relay.build() does not.

  • What is the relationship between these two functions?
  • Is there a way to build a TensorFlow model with tvm.build()?
  • Since tvm.build() has a binds argument, can I build a TensorFlow model with custom binds (see the sketch after this list), or is there an alternative way?
  • I want to modify the op/data layout of a pretrained TensorFlow model; is “adding a custom pass to Relay” the only way?
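
For reference, my understanding of binds is something like the sketch below: it maps a tensor to an explicitly declared buffer for a single operator (the buffer name and alignment here are made up for illustration):

import tvm

n = 1024
A = tvm.placeholder((n,), name="A")
B = tvm.compute((n,), lambda i: A[i] + 1.0, name="B")
s = tvm.create_schedule(B.op)

# Bind A to a buffer with a specific alignment instead of the default one.
Ab = tvm.decl_buffer(A.shape, A.dtype, name="Ab", data_alignment=64)
mod = tvm.build(s, [A, B], target="llvm", binds={A: Ab})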

Thanks a lot if anyone can help me…

tvm.build is for an individual op (or a few fused ops), while relay.build builds the whole model, and it will call into tvm.build internally. So to build a TF model, you need relay.build.
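
To make this concrete, here is a minimal sketch contrasting the two entry points. It uses the same 0.6-era API as the rest of this thread (in newer releases the tensor-expression functions live under tvm.te):

import tvm

# tvm.build: one operator, and an explicit schedule is required.
n = tvm.var("n")
A = tvm.placeholder((n,), name="A")
B = tvm.compute((n,), lambda i: A[i] + 1.0, name="B")
s = tvm.create_schedule(B.op)               # the schedule argument tvm.build needs
op_mod = tvm.build(s, [A, B], target="llvm")

# relay.build: a whole model (e.g. from a frontend importer). There is no
# schedule argument because Relay picks a schedule for each fused op
# internally and hands it to the same build machinery.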

I’m trying to understand the relationship between relay.build() and tvm.build().

@vinx13, thanks for your reply. Does this mean a neural network imported from a given framework, say MXNet (e.g. this tutorial), can only use relay.build() for full-model compilation? Is there a way to optimize further using tvm.build(), or is relay.build() actually doing that under the hood? It would be appreciated if anyone could point out the relevant code. Thanks.

import numpy as np
import tvm
from tvm import relay

# `block` (a Gluon model from the model zoo) and `x` (a preprocessed input
# image) are set up earlier in the tutorial linked above.
shape_dict = {'data': x.shape}
mod, params = relay.frontend.from_mxnet(block, shape_dict)
## we want a probability so add a softmax operator
func = mod["main"]

func = relay.Function(func.params, relay.nn.softmax(func.body), None, func.type_params, func.attrs)
target = 'llvm'
with relay.build_config(opt_level=3):
    graph, lib, params = relay.build(func, target, params=params)

##########################
# Anything we can do here to further optimize individual operators/steps 
#   in the imported resnet using tvm.build()?
########################

...
from tvm.contrib import graph_runtime
ctx = tvm.cpu(0)  # the target above is 'llvm', so run on the CPU context
dtype = 'float32'
m = graph_runtime.create(graph, lib, ctx)
# set inputs
m.set_input('data', tvm.nd.array(x.astype(dtype)))
m.set_input(**params)
# execute
m.run()
# get outputs
tvm_output = m.get_output(0)
top1 = np.argmax(tvm_output.asnumpy()[0])
print('TVM prediction top-1:', top1, synset[top1])
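
For what it’s worth, the lib that comes back can already be inspected to see the per-operator kernels Relay compiled. A small sketch, assuming the 'llvm' target above (get_source() then returns the host-side LLVM IR):

# `lib` contains one compiled kernel per fused op in the model.
print(lib.get_source()[:1000])           # host-side LLVM IR of the fused ops
for dev_mod in lib.imported_modules:     # device modules, if a GPU target was used
    print(dev_mod.type_key)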

It seems to me that relay.build() and tvm.build() are processed through completely different paths in the TVM source repo. I would appreciate it if anyone could correct or confirm this. Thanks in advance.

The relay.build() call in Python is quickly passed to the C++ side and is mainly processed in src/relay/backend/build_module.cc:

  /*!
   * \brief Compile a Relay IR module to runtime module.
   *
   * \param relay_module The Relay IR module.
   * \param params The parameters.
   */
  void BuildRelay(
      IRModule relay_module,
      const std::unordered_map<std::string, tvm::runtime::NDArray>& params) {
    // Relay IRModule -> IRModule optimizations.
    relay_module = Optimize(relay_module, targets_, params);  // <== various graph-level optimizations
    // Get the updated function.
    auto func = Downcast<Function>(relay_module->Lookup("main"));

    // Generate code for the updated function.
    graph_codegen_ = std::unique_ptr<GraphCodegen>(new GraphCodegen());
    graph_codegen_->Init(nullptr, targets_);
    graph_codegen_->Codegen(func);

    ret_.graph_json = graph_codegen_->GetJSON();
    ret_.params = graph_codegen_->GetParams();

    auto lowered_funcs = graph_codegen_->GetLoweredFunc();
    if (lowered_funcs.size() == 0) {
      LOG(WARNING) << "no lowered funcs exist in the compiled module";
    } else {
      ret_.mod = tvm::build(            // <== calls into tvm::build()?
          lowered_funcs,
          target_host_,
          BuildConfig::Current());
    }
   ...
  }
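
Note the three fields being filled in (ret_.graph_json, ret_.mod, ret_.params): they map one-to-one onto what relay.build() returns in Python. A sketch, reusing func, target, and params from the MXNet snippet above:

graph, lib, params = relay.build(func, target, params=params)
print(graph[:200])              # ret_.graph_json: the JSON graph for graph_runtime
print(lib.type_key)             # ret_.mod: the runtime module produced via tvm::build
print(list(params.keys())[:5])  # ret_.params: the (possibly pre-computed) weights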

In contrast, the tvm.build(…) Python call is mainly processed in Python, in python/tvm/driver/build_module.py:

...
    fhost_all = []
    device_modules = []
    for tar, flist in target_flist.items():
        fhost, mdev = _build_for_device(flist, tar, target_host)
        # Save the current lowered functions of the host and the device module.
        fhost_all += fhost
        device_modules.append(mdev)

    # Generate a unified host module.
    mhost = codegen.build_module(fhost_all, str(target_host))

    # Import all modules.
    for mdev in device_modules:
        if mdev:
            mhost.import_module(mdev)
    return mhost
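
You can watch the same two stages from user code. A minimal sketch, again in this thread’s 0.6-era API: tvm.lower stops after producing the lowered function, while tvm.build continues into codegen.build_module as in the driver code above.

import tvm

n = tvm.var("n")
A = tvm.placeholder((n,), name="A")
B = tvm.compute((n,), lambda i: A[i] * 2.0, name="B")
s = tvm.create_schedule(B.op)

print(tvm.lower(s, [A, B], simple_mode=True))  # inspect the lowered IR
mod = tvm.build(s, [A, B], target="llvm")      # lowering + codegen into a runtime module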

If you use graph_runtime or vm, Relay will invoke CompileEngine, which goes into tvm.lower in Python and obtains a list of LoweredFuncs.

See compile_engine.cc:

    if (const auto* f = runtime::Registry::Get("relay.backend.lower")) 

This calls into the Python part.
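
The bridge here is TVM’s global function registry: the Python side registers a packed function under the name "relay.backend.lower", and the Registry::Get call above looks it up from C++. A sketch of the same mechanism, with a made-up function name:

import tvm

# Registering a packed function under a global name (the name below is
# hypothetical, for illustration only) ...
@tvm.register_func("demo.my_lower_hook")
def my_lower_hook(msg):
    return "lowered: " + msg

# ... makes it retrievable by name from either side, just like
# runtime::Registry::Get("relay.backend.lower") in compile_engine.cc.
f = tvm.get_global_func("demo.my_lower_hook")
print(f("hello"))  # -> lowered: hello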
