[Relay] Homogenous Compilation Example


I’m interested in compiling certain layers of a graph for CPU execution and others for GPU. From the Relay API, it seems like this is possible; however, I can’t find any examples showing the proper way to do it. Does anyone know where I might find one?


Are you talking about heterogeneous compilation examples? You can find them here:


That is what I was looking for, although I have a related question you might be able to help with. A lot of useful Relay features rely on annotations that can be conveniently added from Python. However, my main interaction with Relay is through a frontend graph parser that spits out an entire Relay function. If I wanted to annotate certain operations in that function, what’s the intended method? Do I need to write a full IR pass that includes C++ (similar to quantization)?


@jwfromm Yes, annotating at the expression level in the Relay source is a little annoying if you mainly work on a program obtained from the parser, since the graph is usually large. Most of the existing annotations are general-purpose, though. If you have special needs, one option is to write another pass. Even then, you can still leverage the available operators; only the logic that decides how to annotate the graph needs to be implemented. Hopefully this helps.


@jwfromm You can just write a simple pass that annotates the program in Python.

import tvm
from tvm import relay
import tvm.relay.testing
from tvm.relay.expr_functor import ExprMutator

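class ScheduleConv2d(ExprMutator):
    def __init__(self, device):
        # Initialize the base mutator so its memoization state exists.
        super().__init__()
        self.device = device

    def visit_call(self, expr):
        # Rewrite the arguments first, then decide whether to annotate this call.
        visit = super().visit_call(expr)
        if expr.op == tvm.relay.op.get("nn.conv2d"):
            return relay.annotation.on_device(visit, self.device)
        # Every other call is returned unannotated.
        return visit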

def schedule_conv2d_on_gpu(expr):
    sched = ScheduleConv2d(tvm.gpu(0))
    return sched.visit(expr)

resnet, params = relay.testing.resnet.get_workload()
resnet = schedule_conv2d_on_gpu(resnet)
resnet = relay.ir_pass.rewrite_annotated_ops(resnet, 0)


I ran this example, then tried calling relay.build(resnet, ...) with opt_level=3.

I get the following exception:

TVMError: 103 out of 188 expressions are assigned with virtual device types. Either all or none of the expressions are expected to be annotated.

It seems that there is some assertion that all of the expressions should be annotated. If this is the case, what is the fallback device used for?

After modifying the code to annotate all expressions by returning relay.annotation.on_device(visit, tvm.cpu(0)) instead of visit in the else condition, I get the following exception:

AttributeError: '<class 'tvm.container.Map'>' object has no attribte 'values'

Any guidance would be appreciated!
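To make the "all or none" message quoted above concrete, here is a minimal, TVM-free sketch of the invariant it describes. The `Node` class and `check_all_or_none` helper are hypothetical stand-ins written for illustration; this is not how TVM implements the check, only a toy that counts annotated versus total expressions in the same way the message reports them.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a Relay expression (not a TVM class)."""
    name: str
    args: List["Node"] = field(default_factory=list)
    device: Optional[str] = None  # None means "not annotated"


def post_order(node, seen=None):
    """Yield each node once, arguments before the node that uses them."""
    if seen is None:
        seen = set()
    if id(node) in seen:
        return
    seen.add(id(node))
    for arg in node.args:
        yield from post_order(arg, seen)
    yield node


def check_all_or_none(root):
    """Raise if only some of the nodes carry a device; return the annotated count."""
    nodes = list(post_order(root))
    annotated = sum(1 for n in nodes if n.device is not None)
    if 0 < annotated < len(nodes):
        raise ValueError(
            f"{annotated} out of {len(nodes)} expressions are assigned "
            "with virtual device types."
        )
    return annotated


# Only the conv2d call is annotated, mirroring the example that was run.
data = Node("data")
weight = Node("weight")
conv = Node("nn.conv2d", [data, weight], device="gpu")
relu = Node("nn.relu", [conv])

try:
    check_all_or_none(relu)
    message = None
except ValueError as err:
    message = str(err)

print(message)
```

In this toy graph only one of the four nodes has a device, so the check fails. Assigning a device to every node, or to none of them, makes it pass.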


@jonso Please refer to

I added an example there.


@zhiics I have a follow up question to your commit.

I have a graph that has some custom ops that are only supported on CPU, but other operations will be faster on GPU. Naturally, device annotation is a great solution to this problem.

I have pretty much the exact code from your example, except that I annotate my custom ops on CPU and set the fallback_device to GPU. However, during relay.build I get an error saying my custom ops don’t support the cuda schedule. Why is it trying to schedule annotated ops on cuda? Does Relay first try to schedule using the default and only then realize the annotation?

Here’s what my code looks like:

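import tvm
from tvm import relay
from tvm.relay.expr_functor import ExprMutator

net, params = load_my_graph()

class ScheduleCustomOp(ExprMutator):
    def visit_call(self, expr):
        visit = super().visit_call(expr)
        # Pin every custom op to the CPU; leave all other calls unannotated.
        if 'customop' in expr.op.name:
            return relay.annotation.on_device(visit, tvm.cpu())
        return visit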

sched = ScheduleCustomOp()
net = sched.visit(net)
net = relay.ir_pass.infer_type(net)

target = {"gpu": "cuda", "cpu": "llvm"}

with relay.build_config(opt_level=3, fallback_device=tvm.gpu()):
    graph, lib, params = relay.build(net, target, params=params)

Running this code gives the error:

RuntimeError: schedule not registered for 'cuda'

Looking at the annotated relay function, I see that my custom op is annotated as expected:

%100 = customop(%99)
%101 = on_device(%100, meta[relay.attrs.OnDeviceAttrs][3])

Any thoughts on how to solve this problem?


I am not very familiar with writing schedules, but I think you may not have registered a cuda schedule for your custom op, so no code can be generated for it.


If I annotate the op to run on CPU, why would it need a CUDA schedule? Does relay try to schedule on both targets before settling on the annotation target?


It seems like there might be a small bug. I have two custom ops, one right after the other. When I annotate both to be on CPU, the first one is still scheduled on cuda. If I only annotate the first, everything works fine. The second op is also the output node of the graph, so maybe that has something to do with it.


This looks like a bug. You don’t need cuda schedule if the op is assigned to CPU.


It looks to me like the device type of the conv2d weights is not propagated correctly.

Can you try the following to see if it works?

diff --git a/src/relay/pass/device_annotation.cc b/src/relay/pass/device_annotation.cc
index 8eeb493..1645d75 100644
--- a/src/relay/pass/device_annotation.cc
+++ b/src/relay/pass/device_annotation.cc
@@ -491,8 +491,15 @@ class DeviceInfo {

   void FillPropagation(int out_dev_type) {
     for (const auto& it : post_visitor_.post_dfs_order_) {
-        Expr expr = GetRef<Expr>(it.first);
-        if (!it.second) device_map_.Set(expr, out_dev_type);
+      Expr expr = GetRef<Expr>(it.first);
+      if (!it.second) device_map_.Set(expr, out_dev_type);
+      if (const auto* call = expr.as<CallNode>()) {
+        for (const auto& arg : call->args) {
+          if (arg->is_type<VarNode>() || arg->is_type<ConstantNode>()) {
+            device_map_.Set(arg, device_map_[expr]);
+          }
+        }
+      }

If it works, could you please help debug where it goes wrong? Thanks.
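For readers who want to see what the patch above changes without building TVM, here is a small Python sketch of the same propagation idea. It is a toy model under stated assumptions: the `Expr` class, the `kind` strings, and the device names are hypothetical, and I am assuming the second element of each pair in the post-DFS order flags whether the expression already has a device assigned.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(eq=False)  # eq=False keeps identity hashing, so nodes can be dict keys
class Expr:
    """Hypothetical stand-in for a Relay expression (not a TVM class)."""
    kind: str  # "var", "const", or "call"
    name: str
    args: List["Expr"] = field(default_factory=list)


def fill_propagation(
    post_dfs_order: List[Tuple[Expr, bool]],
    device_map: Dict[Expr, str],
    out_dev_type: str,
) -> Dict[Expr, str]:
    """Toy version of the patched loop: fill defaults, then push to var/const args."""
    for expr, has_device in post_dfs_order:
        # Unchanged behavior: expressions without a device get the output device.
        if not has_device:
            device_map[expr] = out_dev_type
        # Added by the patch: variable and constant arguments of a call
        # inherit the device of the call that consumes them.
        if expr.kind == "call":
            for arg in expr.args:
                if arg.kind in ("var", "const"):
                    device_map[arg] = device_map[expr]
    return device_map


data = Expr("var", "data")
weight = Expr("const", "weight")
conv = Expr("call", "nn.conv2d", [data, weight])

# The call is already pinned to the CPU; the output device of the graph is the GPU.
device_map = {conv: "cpu"}
order = [(data, False), (weight, False), (conv, True)]

result = fill_propagation(order, device_map, out_dev_type="gpu")
for expr in (data, weight, conv):
    print(expr.name, result[expr])
```

Without the extra loop, `data` and `weight` would stay on the output device while the call that uses them sits on the CPU; with it, all three end up on the same device in this toy graph.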


Your change did fix some of the issues I was having! Specifically, layers that use constant inputs now work great. I’m still running into some problems with convolution layers though. I’ll look into it a little more and see if I can figure out the difference.